diff --git a/develop/api_doc/.buildinfo b/develop/api_doc/.buildinfo index adac1818d966d3b61ebf0c01b0e9e5821d174cfe..11adacf44c62bc67369cd54d86aefa754acdbf53 100644 --- a/develop/api_doc/.buildinfo +++ b/develop/api_doc/.buildinfo @@ -1,4 +1,4 @@ # Sphinx build info version 1 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. -config: afc32173167fc468034cdf5aae2571ec +config: 90642218475087a09239879d1159d425 tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/develop/doc/.buildinfo b/develop/doc/.buildinfo index adac1818d966d3b61ebf0c01b0e9e5821d174cfe..11adacf44c62bc67369cd54d86aefa754acdbf53 100644 --- a/develop/doc/.buildinfo +++ b/develop/doc/.buildinfo @@ -1,4 +1,4 @@ # Sphinx build info version 1 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. -config: afc32173167fc468034cdf5aae2571ec +config: 90642218475087a09239879d1159d425 tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/develop/doc/_images/control_flow_graph.png b/develop/doc/_images/control_flow_graph.png deleted file mode 100644 index 3579998e58d07abc50bd3332128d4733a391cb3b..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/control_flow_graph.png and /dev/null differ diff --git a/develop/doc/_images/dataflow_equations.png b/develop/doc/_images/dataflow_equations.png deleted file mode 100644 index c10f7f69f4007952e5b0394edaa04efa1cfbb658..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/dataflow_equations.png and /dev/null differ diff --git a/develop/doc/_images/deep_learning.png b/develop/doc/_images/deep_learning.png deleted file mode 100644 index 026becc4d94e01e407dacb2a5314a0e5723334ff..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/deep_learning.png and /dev/null differ diff --git a/develop/doc/_images/fluid-compiler.png b/develop/doc/_images/fluid-compiler.png deleted file 
mode 100644 index 1b0ffed2039c91a3a00bbb719da08c91c3acf7bb..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/fluid-compiler.png and /dev/null differ diff --git a/develop/doc/_images/graph_construction_example_all.png b/develop/doc/_images/graph_construction_example_all.png deleted file mode 100644 index 261611a5721f9aa97874f7e6d897fe48cf667db2..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/graph_construction_example_all.png and /dev/null differ diff --git a/develop/doc/_images/graph_construction_example_forward_backward.png b/develop/doc/_images/graph_construction_example_forward_backward.png deleted file mode 100644 index 4c69687f4a6a181138f3df72ce5e8aa48487b5be..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/graph_construction_example_forward_backward.png and /dev/null differ diff --git a/develop/doc/_images/graph_construction_example_forward_only.png b/develop/doc/_images/graph_construction_example_forward_only.png deleted file mode 100644 index e668c16e0cac73acb4e5dc2b1827557ae77126b4..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/graph_construction_example_forward_only.png and /dev/null differ diff --git a/develop/doc/_images/pprof_1.png b/develop/doc/_images/pprof_1.png deleted file mode 100644 index 8e9edbf377672d0ef40f2fc7bd39e746923550cb..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/pprof_1.png and /dev/null differ diff --git a/develop/doc/_images/pprof_2.png b/develop/doc/_images/pprof_2.png deleted file mode 100644 index 172ba20399ba974d27f4c072425277b69b02520b..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/pprof_2.png and /dev/null differ diff --git a/develop/doc/_images/trainer.png b/develop/doc/_images/trainer.png deleted file mode 100644 index 6537d3d56589ca9f19a77a50a970e4b5275e6ce0..0000000000000000000000000000000000000000 Binary files a/develop/doc/_images/trainer.png and 
/dev/null differ diff --git a/develop/doc/_sources/design/api.md.txt b/develop/doc/_sources/design/api.md.txt deleted file mode 100644 index e6a4638d9100d9b07c3ee6b92b530a17eae1c162..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/api.md.txt +++ /dev/null @@ -1,262 +0,0 @@ -# PaddlePaddle Design Doc - -## Ingredients - -As our design principle is starting from the essence: how could we -allow users to express and solve their problems as neural networks. -Some essential concepts that our API have to provide include: - -1. A *topology* is an expression of *layers*. - -1. A layer could be any kind of computation, including *cost*. - -1. Some layers have parameters, some don't. Most costs don't have - parameters. - -1. In some topologies, layers share parameters. For - example, - [the network for training a ranking model](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850). - -1. At programming time, users specify topologies and possible sharing - of parameters. PaddlePaddle can figure out and create parameters - required (and possibly shared) by one or more topologies. - - -## Starting from Examples - -As a summarization -of -[our disucssion](https://github.com/PaddlePaddle/Paddle/issues/1315), -let us present two examples here: - - -### Example 1. Sharing Parameters between Layers - -We use -the -[3-branch ranking](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850) model -in this example. 
For your convenience, I copy and paste the model's -topology as follows: - -``` -A -> f -\ -Q -> f --> cost -B -> f -/ -``` - -The following program trains the topology including the cost, and then -uses the sub-network in the trained topology in inference: - -```python -def f(x): - e = paddle.layer.embedding(x, parameter_name="embedding") - o = paddle.layer.softmax(e, parameter_name="semantic") - return o - -# Create 3 topologies (subnets); they share parameters because all -# corresponding layers have the same parameter names. -fA = f(paddle.layer.data(input_name="A")) -fB = f(paddle.layer.data(input_name="B")) -fQ = f(paddle.layer.data(input_name="Q")) - -topology = paddle.layer.less_than( - paddle.layer.cross_entropy(fA, fQ), - paddle.layer.cross_entropy(fB, fQ)) - -# Derive parameters required in topology and create them in model. -parameters = paddle.parameters.create(topology) - -# Estimate parameters used in topology from data. -paddle.train(topology, parameters, reader=read_ranking_model_data) - -# Inference using fA (or fB or fQ, as they share their parameters). -[testA, testB, testQ] = read_ranking_model_data() -print "The semantic vector of testA: ", paddle.infer(fA, parameters, testA) -``` - - -### Example 2. Sharing Parameters between "Models" - -We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in -this example. In the following example program, `d0` and `d1` -correspond to the two networks in the following figure: - - - -```python -def G(x): - # over-simplified example as G has only one layer: - return paddle.layer.fc(x, parameter_name="G") - -def D(x): - # again, over-simplified: - return paddle.layer.fc(x, parameter_name="D") - -# Construct the first topology, which contains both D and G. -# By learning this topology, we update parameters of G. -d0 = paddle.layer.should_be_false(D(G(paddle.layer.data()))) - -# Construct a second topology d1, which contains only D. By -# training this topology, we update parameters of D. 
Note -# that d1 share parameters with d0. -d1 = paddle.layer.should_be_true(D(paddle.layer.data())) - -# Create parameters from a list of multiple topologies (models) for -# the chance to share parameters between these topologies. -parameters = paddle.parameters.create([d0, d1]) - -# Iterative training of GAN. -for ...: - train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"}) - train(d1, parameters, reader=read_from_realistic_images) - -# Use d1 for inference: -print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images) -``` - - -### Summarization - - -Above two programs reveal some important design concerns: - -1. Users describe a topology as an expression of layers. Every layer - has a *parameter name*. If the users don't specify it explicitly, it's automatically generated as a unique name. By - specifying the parameter name, users can specify the sharing of - parameters between layers and even between topologies. - -1. `paddle.parameters.create` figures out parameters required by one - or more topologies from parameter names of layers. It creates these - parameters and returns a `ParameterSet` object, which is in essence - a map from *parameter names* to *parameters*. - -1. At training and inference time, `paddle.train` and `paddle.infer` - requires both a topology and the parameter set that holds the parameters of that topology. There are some reasons: - - 1. This prevents users from forgetting to call - `paddle.parameters.create`. - 1. `paddle.train` needs to know which parameter set to update. - 1. Users could load another (pre-trained) parameter set and use it - with a topology in `train.infer`. - -1. By specifying the `immutable_parameters` parameter of - `paddle.train`, we can forbid the update of these parameters. - - -## Reader - -Not all programming frameworks allow users to define I/O functions. -An example is Google MapReduce, which can only read from text, -SSTable, and RecordIO files. 
Hadoop MapReduce allows users to define -readers and writers by deriving from base classes `Reader` and -`Writer`. The former is less flexible but also less error-prone. We -decided to give users the flexibility to define their own readers. - - -There are some open questions here: - -1. **Should a reader return a Python dictionary?** - -1. **How to map multiple outputs from a reader to multiple data layers?** - -1. **How to easily compose some existing readers to read more data and - feed a topology with more data layers?** - - -## Training - -The recommended way to train a model is to call `paddle.train`, -which simply calls `paddle.trainer.Default`, a global variable of -type `paddle.trainer.SGD`. Equivalently, we can do - -```python -opt = paddle.trainer.SGD(..., paddle.updater.Adam(...)) -opt.train(topology, parameters, reader=read, ...) -``` - -### Updater - -Please be aware that a trainer can accept an updater as its data -member, where an updater is a class derived from -`paddle.trainer.Updater`. This is to make it easier to customize -trainers, as discussed -[here](https://github.com/PaddlePaddle/Paddle/issues/1319). - -### Event Handler - -`paddle.train` and `paddle.trainer.XXX.train` take an optional -parameter `event_handler`, which should be either `None` or a function -that handles some events: - -1. BeginTraining -1. EndTraining -1. BeginIteration -1. EndIteration -1. BeginPass -1. EndPass - -where EndPass is sent if and only if the reader yields -`end_pass=True`. - -An example is as follows: - -```python -def event_handler(event): - if isinstance(event, paddle.event.EndIteration): - print paddle.test(...) - -paddle.train(topology, parameters, reader, event_handler) -``` - -If we are writing a PaddlePaddle program in and for IPython/Jupyter, -we can use matplotlib in the event handler to plot a curve of -cost/error versus iterations, as shown -[here](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/). 
- -### Distributed Training - -If users want to do distributed training on a cluster, s/he should -call `paddle.dist_train` and provides access tokens to the cluster as -a parameter. - -For example, if the user has a TLS certificate that allows him to -access a Kubernetes cluster, s/he should be able to call - -```python -paddle.dist_train(model, - trainer=paddle.trainer.SGD(..., - paddle.updater.Adam(...)), - reader=read, - k8s_user="yi", - k8s_token="kube_cluster_tls.pem", - k8s_job="hello", - num_parameter_servers=15) -``` - -The pseudo code of `paddle.dist_train` is as follows: - -```python -def dist_train(topology, parameters, trainer, reader, ...): - if os.getenv("KUBERNETES_SERVICE_HOST") == None: - image_name = k8s_user + '/' + k8s_job - docker_build(image_name) - docker_push() - kube_ctrl_start_job(image_name, k8s_user, k8s_token) - else: - rank = kube_list_containers_in_job_and_return_current_containers_rank() - if rank == 0: - master() - elif rank < 15: - parameter_server() - else: - trainer.train(model, reader=read) -``` - -Please be aware that if a process is running on the Kubernetes -cluster, it will have some environment variables pre-defined. - -If `dist_train` doesn't see these environment variables, it knows -that it's running on users' personal computer, and it should work as a -*launcher*. Otherwise, it knows that it's running on the cluster and -need to figure out its role as either the master, or a trainer, or a -parameter server. diff --git a/develop/doc/_sources/design/auto_gradient_check.md.txt b/develop/doc/_sources/design/auto_gradient_check.md.txt deleted file mode 100644 index 773b7b6a767541f28c27f247c1ad8c9a8a2d0ccf..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/auto_gradient_check.md.txt +++ /dev/null @@ -1,150 +0,0 @@ -## Auto Gradient Check Design - -## Background: -- Generally, it is easy to check whether the forward computation of an Operator is correct or not. 
However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges: - 1. The formula for backpropagation formula should be correct according to the forward computation. - 2. The Implementation of the above shoule be correct in CPP. - 3. It is difficult to prepare an unbiased test data. - -- Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator's result. It has several advantages: - 1. Numerical gradient checker only needs the forward operator. - 2. The user only needs to prepare the input data for forward Operator and not worry about the backward Operator. - -## Mathematical Theory -The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful. - -- [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization) -- [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96) - - -## Numerical Gradient Implementation -### Python Interface -```python -def get_numerical_gradient(op, - input_values, - output_name, - input_to_check, - delta=0.005, - local_scope=None): - """ - Get Numerical Gradient for the input of an operator. - - :param op: C++ operator instance, could be an network. - :param input_values: The input variables. Should be an dictionary, whose key is - variable name, and value is a numpy array. - :param output_name: The final output variable name. - :param input_to_check: The input variable with respect to which the gradient has to be computed. - :param delta: The perturbation value for numerical gradient method. The - smaller the delta, the more accurate the result. But if the delta is too - small, it will suffer from the numerical stability problem. 
- :param local_scope: The local scope used for get_numeric_gradient. - :return: The gradient array in numpy format. - """ -``` - -### Explanation: - -- Why do we need an `output_name` - - An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable. - -- Why do we need `input_to_check` - - One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times each with a different input. - - -### Core Algorithm Implementation - - -```python - # we only compute the gradient of one element a time. - # we use a for loop to compute the gradient of each element. - for i in xrange(tensor_size): - # get one input element using the index i. - original = tensor_to_check.get_float_element(i) - - # add delta to it, run the forward op and then - # get the new value of the result tensor. - x_pos = original + delta - tensor_to_check.set_float_element(i, x_pos) - y_pos = get_output() - - # Subtract delta from this element, run the op again - # and get the new value of the result tensor. - x_neg = original - delta - tensor_to_check.set_float_element(i, x_neg) - y_neg = get_output() - - # restore old value - tensor_to_check.set_float_element(i, original) - - # compute the gradient of this element and store - # it into a numpy array. - gradient_flat[i] = (y_pos - y_neg) / delta / 2 - - # reshape the gradient result to the shape of the source tensor. - return gradient_flat.reshape(tensor_to_check.get_dims()) -``` - -## Auto Gradient Check Framework - -Each Operator Kernel has three kinds of Gradient: - -1. Numerical gradient -2. CPU kernel gradient -3. 
GPU kernel gradient (if supported by the device) - -The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps: - -1. Calculate the numerical gradient -2. Calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient. -3. Calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient. (if supported) - -#### Python Interface - -```python - def check_grad(self, - forward_op, - input_vars, - inputs_to_check, - output_name, - no_grad_set=None, - only_cpu=False, - max_relative_error=0.005): - """ - :param forward_op: used to create backward_op - :param input_vars: numpy value of input variable. The following - computation will use these variables. - :param inputs_to_check: the input variable with respect to which the - gradient will be computed. - :param output_name: The final output variable name. - :param max_relative_error: The relative tolerance parameter. - :param no_grad_set: used to create backward ops - :param only_cpu: only compute and check gradient on cpu kernel. - :return: - """ -``` - -### How to check if two numpy arrays are close enough? -if `abs_numerical_grad` is nearly zero, then use absolute error for numerical_grad. - -```python -numerical_grad = ... -operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor()) - -abs_numerical_grad = numpy.abs(numerical_grad) -# if abs_numerical_grad is nearly zero, then use abs error for -# numeric_grad, instead of relative error. -abs_numerical_grad[abs_numerical_grad < 1e-3] = 1 - -diff_mat = numpy.abs(abs_numerical_grad - operator_grad) / abs_numerical_grad -max_diff = numpy.max(diff_mat) -``` - - -#### Notes: -The Input data for auto gradient checker should be reasonable to avoid numerical stability problem. 
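
The central-difference scheme and the relative-error comparison described above can be sketched with plain NumPy. This is a minimal illustration, not the PaddlePaddle implementation: `numerical_gradient`, `max_relative_error`, and the test function `f` are hypothetical names, a Python callable stands in for the forward Operator, and the numerator is assumed to compare the signed gradients.

```python
import numpy as np


def numerical_gradient(f, x, delta=0.005):
    """Central-difference gradient of a scalar-valued f at x (hypothetical helper)."""
    x = np.array(x, dtype=np.float64)   # contiguous copy, so the caller's array is untouched
    flat = x.reshape(-1)                # view into x: writing to flat perturbs x in place
    grad_flat = np.zeros(flat.size)
    for i in range(flat.size):
        original = flat[i]
        flat[i] = original + delta      # perturb one element upward and run "forward"
        y_pos = f(x)
        flat[i] = original - delta      # perturb the same element downward
        y_neg = f(x)
        flat[i] = original              # restore the old value
        grad_flat[i] = (y_pos - y_neg) / (2.0 * delta)
    return grad_flat.reshape(x.shape)


def max_relative_error(numerical_grad, analytic_grad, zero_threshold=1e-3):
    """Relative error, falling back to absolute error where the reference is near zero."""
    denom = np.abs(numerical_grad)
    denom[denom < zero_threshold] = 1.0
    return np.max(np.abs(numerical_grad - analytic_grad) / denom)


def f(x):
    # Stand-in for a forward op with a scalar output: sum of squares.
    return np.sum(x ** 2)


x0 = np.array([[1.0, -2.0], [0.0, 3.0]])
num_grad = numerical_gradient(f, x0)
ana_grad = 2.0 * x0                     # hand-derived gradient playing the role of the backward op
err = max_relative_error(num_grad, ana_grad)
```

A check in the spirit of `check_grad` would then assert that `err` stays below the chosen `max_relative_error` tolerance.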
- - -#### References: - -- [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization) -- [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96) diff --git a/develop/doc/_sources/design/backward.md.txt b/develop/doc/_sources/design/backward.md.txt deleted file mode 100644 index 20fda7a98f514a3f1c1c2d0ba7447ec954b21d5a..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/backward.md.txt +++ /dev/null @@ -1,158 +0,0 @@ -# Backward Building - -## Motivation - -At present, most neural network models are trained with the backpropagation algorithm (known as **BP**). Technically, BP calculates the gradient of the loss function, then propagates it back through the network following the chain rule. However, when configuring the model structure, users do not need to define the backward part. So the framework requires a mechanism that can complete the model's backward part automatically according to the given forward part. - -When implementing a specific `op`, the developer is also asked to implement its backward version, called `grad_op`. A `grad_op` takes the gradients of its corresponding `op`'s outputs and calculates the gradients of the `op`'s inputs. During the building of a model's backward part, the framework creates each forward `op`'s `grad_op`, and then strings them together in the reverse order of the forward part. In this way, gradients spread from the end to the beginning of the model, in other words, from the loss to the parameters. - -## Challenges - -The motivation of backward building is apparent. However, implementing it correctly is not so easy. In the **Fluid** design, a deep learning model is described by `Program`, `Block`, `Op` and `Variable`. The `Block` itself can be nested. 
It means that the `op`s and `variable`s are scattered across different blocks rather than all be gathered in a single graph. Our backward building algorithm shall visit blocks in recursive order and be able to insert `grad_op`s and new created `variable`s into the right place. - -## Usage - -Although the whole algorithm is comprised of many functions, only one is exposed as API: - -```python -def append_backward(loss, parameter_list=None, no_grad_set=None): - """ - Append backward part to main_program - - Args: - loss(Variable): The variable generated by the cost function. - parameter_list(list): Parameters that need to be updated by optimizers. - If None, it means all parameters need to be updated. - - no_grad_set(set): Variables that have no gradients in Block 0. - If None, the set will be generated inside the function and - contains all variables with `step_gradient=True` from all blocks. - - Return: - (list[Variable]): list of (parameters, gradients) pair. - """ -``` - -By invoking this API, the framework appends backward part of the program where the `loss` is. It takes three arguments. `loss` means the final loss value. It must be a scalar and is usually the output of the loss layer. It is also where the gradient generated and backpropagation starts. `parameter_list` marks all parameters needs updating. If it's `None`, all parameter will be updated by optimizers. `no_grad_set` marks variables without gradient. if all outputs of some `grad_op` are in `no_grad_set`, the `grad_op` will not be run. - -This API will be invoked automatically before optimizer building. -As a result, in most cases, users do not need to invoke the API by themselves to append backward part. - -## Implementation - -The implementation of backward building algorithm is in `backward.py` file. The whole algorithm can be divided into two independent parts: creating `grad_op`s and creating new variables. 
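
As a rough illustration of the idea behind `append_backward`, the sketch below builds a backward pass for a flat list of scalar ops by walking them in reverse order, summing gradients when a variable feeds more than one op, and skipping variables listed in a `no_grad_set`. It is a toy in plain Python under stated assumptions: `Op`, `mul`, `add`, `run_forward`, and `run_backward` are hypothetical names, there are no nested blocks, and nothing here is the actual `backward.py` code.

```python
from collections import namedtuple

# A forward op: output name, input names, a forward function, and a grad
# function mapping (input values, output gradient) to one gradient per input.
Op = namedtuple("Op", ["out", "ins", "forward", "grad"])


def mul(out, a, b):
    return Op(out, (a, b), lambda x, y: x * y,
              lambda vals, g: (g * vals[1], g * vals[0]))


def add(out, a, b):
    return Op(out, (a, b), lambda x, y: x + y, lambda vals, g: (g, g))


def run_forward(ops, values):
    values = dict(values)
    for op in ops:
        values[op.out] = op.forward(*(values[n] for n in op.ins))
    return values


def run_backward(ops, values, loss, no_grad_set=frozenset()):
    grads = {loss: 1.0}                      # d(loss)/d(loss) = 1
    for op in reversed(ops):                 # reverse order of the forward part
        if op.out not in grads:
            continue                         # the loss does not depend on this op
        in_vals = [values[n] for n in op.ins]
        for name, g in zip(op.ins, op.grad(in_vals, grads[op.out])):
            if name in no_grad_set:
                continue                     # gradient not needed for this variable
            grads[name] = grads.get(name, 0.0) + g   # sum over all consumers
    return grads


# loss = x * w + w * w, so d(loss)/dw = x + 2 * w
ops = [mul("h", "x", "w"), mul("r", "w", "w"), add("loss", "h", "r")]
values = run_forward(ops, {"x": 3.0, "w": 2.0})
grads = run_backward(ops, values, "loss", no_grad_set={"x"})
```

Because `w` is read by two ops (and twice by one of them), its gradient is accumulated from three contributions, which is the same concern that the real algorithm handles by adding a `sum_op`.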
- -### Creating `grad_op`s - -The creating of `grad_op`s is implemented by: - -```python -def _append_backward_ops_(target, - block, - target_block, - no_grad_dict, - grad_to_var): - """ - Create all grad ops, and insert them into given block - - Args: - target(Variable): the target variable of forward pass - block(Block): the block where forward ops are - target_block(Block): the block which is going to hold new generated grad ops - no_grad_dict(dict): - key(int) block index - val(set) a set of varibale names. These varibales have no gradient - grad_to_var(dict)(output argument): - key(str): grad variable name - val(str): corresponding forward variable name - """ -``` - -Given a `block`, the function will traverses all `op`s in this block in reverse order, gets corresponding `grad_op` from the C++ core via `core.get_grad_op_desc()`, then append it to `target_block`. - -However, some specific `op`(e.g. `while_op`, `if_else_op`) can hold its own sub-block. For these sub-blocks contains `op`s as well, the `grad_op` creating should be recursive. - -During the reverse traversal, we check each `op` whether it has an attribute named `sub_block`. If so, it means there is a sub-block and we need to deal with it first. After creating a new block whose father is the one in `op`'s attribute, we invoke `_append_backward_ops_()` recursively, assigning the new block to parameter `target_block` and the one in `op`'s attribute to `block`. The *pseudo-code* shows this process: - -``` -******* pseudo-code ******** -for op in reversed(block.ops): - if op has an attribute named 'sub_block': - Get the sub-block(`s_block`) from op's attribute. - Create a new block(`grad_s_block`), whose father is `s_block`. - Invoke _append_backward_ops_(), with `block=s_block` and `target_block=grad_s_block` - - Invoke `core.get_grad_op_desc()` to get op's grad_op. 
- Insert name correspondings between variables and their gradients of the grad_op to grad_to_var - Assign grad_s_block to grad_op as it's 'sub_block' attribute. - Append grad_op to current target_block. -``` - -The first invoking of `_append_backward_ops_()` is initiated by `append_backward()`, in which parameters `block` and `target_block` are all assigned with root block(the block with index 0). - -### Corner Cases of `grad_op` Creating - -In the previous section, we show the regular process of `grad_op` creating. However, in some corner cases, the conventional algorithm is not enough to get the correct result and appending handling is required. These additional processes run after the algorithm mentioned above and do some special adjusts on its output `grad_op`s. - -#### Shared Variables - -If a variable is read by more than one `op` in the forward pass, its gradient is likely to be written by more than one `grad_op`s in the next backward pass. To make the gradient result being the sum of all `grad_op`s' outputs instead of the last running one, we assign each output with a temporary variable and then add a `sum_op` to add them up. - -For the debug convenience, if the final gradient name is `w@GRAD`, it's corresponding temporary variables will be named as `w@GRAD@RENAME@0`, `w@GRAD@RENAME@1`... - -See function `_addup_repetitive_outputs_` in `backward.py` for implementation details. - -#### No Gradient Variables - -In our framework, variables can be marked as *no_gradient*, it means that the gradient of this variable is unnecessary and can be considered as zero in model training. Apparently, when all the outputs of some `grad_op` are marked as *no_gradient*, the `grad_op` itself can be skipped in backward pass. - -Another situation is all the gradient inputs of some `grad_op` are marked as *no_gradient*, which means all of them can be considered as zeros. 
For `grad_op`s are in essence the propagation of gradients, all the outputs are definitely zeros when all gradient inputs are zeros. Therefore the `grad_op` can also be skipped. - -It should be noted that all these zero gradients still need to be creating and initialized by something, otherwise following `grad_op`s who take these gradients as inputs take the risk of using uninitialized memory. In our code, we employ `fill_zeros_like_op` to initialize them as all zeros. - -This features are implemented in function `_remove_no_grad_branch_`. It checks new created `grad_op`s one-by-one, removes who can be skipped and inserts `fill_zeros_like_op` when its necessary. We can get the `no_grad_set` from the `_append_backward_ops_` argument `no_grad_dict` or generate it on the fly by scanning all variables' `no_gradient` attribute(True or False). - -### Creating Backward Variables - -Up to now, we have completed all creating and adjusting jobs of `grad_op`s. However, backward variables have not been created. Now they are only represented by `grad_op`'s input and output arguments. The backward variable creating job will be done by: - -```python -def _append_backward_vars_(block, - start_op_idx, - grad_to_var, - grad_info_map): - """ - Create new variables required by backward pass. - - Args: - block(Block): the block where new variables will be created - start_op_idx(int): Only variables required by ops in block.ops[start_op_idx : ] will be created - grad_to_var(dict): - key(str): grad variable name - val(str): corresponding forward variable name - In most cases, this dict is generated by _append_backward_ops_() - grad_info_map(dict)(output argument): - key(str): forward variable name - val(tuple): a tuple of (str, int), str is the corresponding grad name, int is the block index - """ -``` - -Given a `block`, this function traverses all the `grad_op`s in it(The argument `start_op_idx` indicates where the grad_op sequence starts.) and creates all the uncreated outputs. 
The *pseudo-code* shows this process: - -``` -for op in block.ops[start_op_idx : ]: - - if op has an attribute named 'sub_block': - Get the sub-block(`s_block`) from op's attribute. - Invoke _append_backward_vars_(), with `block=s_block` - - for var_name in op.all_output_names(): - if block.has_var_recursive(var_name) or var_name is the name of empty variable: - continue - create a new variable named 'var_name' in block - if grad_to_var.has_key(var_name): - set grad_info_map[grad_to_var[var_name]] as a tuple of (var_name. block) - - do op's var type inference - do op's shape inference -``` diff --git a/develop/doc/_sources/design/block.md.txt b/develop/doc/_sources/design/block.md.txt deleted file mode 100644 index 907a2def557fd472ac4d679c73447bd9107d1190..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/block.md.txt +++ /dev/null @@ -1,336 +0,0 @@ -# Design Doc: Block and Scope - -## The Representation of Computation - -Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation: - -- Caffe, Torch, and Paddle: sequences of layers. -- TensorFlow, Caffe2, Mxnet: graph of operators. -- PaddlePaddle: nested blocks, like C++ and Java programs. - -## Block in Programming Languages and Deep Learning - -In programming languages, a block is a pair of curly braces that includes local variables definitions and a sequence of instructions or operators. - -Blocks work with control flow structures like `if`, `else`, and `for`, which have equivalents in deep learning: - -| programming languages | PaddlePaddle | -|-----------------------|-----------------------| -| for, while loop | RNN, WhileOp | -| if, if-else, switch | IfElseOp, SwitchOp | -| sequential execution | a sequence of layers | - -A key difference is that a C++ program describes a one pass computation, whereas a deep learning program describes both the forward and backward passes. 
- -## Stack Frames and the Scope Hierarchy - -The existence of the backward pass makes the execution of a block of PaddlePaddle different from traditional programs: - -| programming languages | PaddlePaddle | -|-----------------------|---------------------------------| -| stack | scope hierarchy | -| stack frame | scope | -| push at entering block| push at entering block | -| pop at leaving block | destroy when minibatch completes| - -1. In traditional programs: - - - When the execution enters the left curly brace of a block, the runtime pushes a frame into the stack, where it realizes local variables. - - After the execution leaves the right curly brace, the runtime pops the frame. - - The maximum number of frames in the stack is the maximum depth of nested blocks. - -1. In PaddlePaddle - - - When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables. - - PaddlePaddle doesn't pop a scope after the execution of the block because variables therein are used by the backward pass. So it has a stack forest known as a *scope hierarchy*. - - The height of the highest tree is the maximum depth of nested blocks. - - After the processing of a minibatch, PaddlePaddle destroys the scope hierarchy. - -## Use Blocks in C++ and PaddlePaddle Programs - -Let us consolidate the discussion by presenting some examples. 
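
As a first illustration, the scope hierarchy described in the previous section can be sketched in a few lines of plain Python. This is only a sketch under assumptions: the class `Scope` and its methods `new_scope`, `find_var`, and `drop_kids` are hypothetical names chosen for this example, and plain floats stand in for variables.

```python
class Scope:
    """A node in the scope hierarchy; lookups fall back to the parent scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}
        self.kids = []

    def new_scope(self):
        # Entering a block adds a child scope. It is NOT popped when the
        # block finishes, because the backward pass still needs its variables.
        kid = Scope(parent=self)
        self.kids.append(kid)
        return kid

    def find_var(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None

    def drop_kids(self):
        # Called once the minibatch has been processed.
        self.kids = []


root = Scope()
root.vars["W"] = 0.314

step_scopes = []
for x in [10, 20, 30]:                 # e.g. one scope per executed block
    s = root.new_scope()
    s.vars["a"] = s.find_var("W") * x  # "W" is resolved in the parent scope
    step_scopes.append(s)

kept = len(root.kids)                  # all three scopes are still alive here
root.drop_kids()                       # destroy the hierarchy after the minibatch
```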
- -### Blocks with `if-else` and `IfElseOp` - -The following C++ program shows how blocks are used with the `if-else` structure: - -```c++ -namespace pd = paddle; - -int x = 10; -int y = 1; -int z = 10; -bool cond = false; -int o1, o2; -if (cond) { - int d = x + y; - o1 = d; - o2 = pd::layer::softmax(d); -} else { - int d = pd::layer::fc(z); - o1 = d; - o2 = d+1; -} - -``` - -An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](./if_else_op.md) is as follows: - -```python -import paddle as pd - -x = minibatch([10, 20, 30]) # shape=[None, 1] -y = var(1) # shape=[1], value=1 -z = minibatch([10, 20, 30]) # shape=[None, 1] -cond = larger_than(x, 15) # [false, true, true] - -ie = pd.ifelse() -with ie.true_block(): - d = pd.layer.add_scalar(x, y) - ie.output(d, pd.layer.softmax(d)) -with ie.false_block(): - d = pd.layer.fc(z) - ie.output(d, d+1) -o1, o2 = ie(cond) -``` - -In both examples, the true branch computes `x+y` and `softmax(x+y)`, and the false branch computes `fc(z)` and `fc(z)+1`. - -The difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle program are mini-batches of instances.
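The mini-batch semantics can be sketched in plain Python. This is only an illustration under stated assumptions: `if_else`, `true_fn`, and `false_fn` are hypothetical names, not PaddlePaddle API, and a mini-batch is modelled as a Python list. Each instance is routed to exactly one branch according to its condition, and the results are merged back in the original order.

```python
def if_else(cond, batch, true_fn, false_fn):
    """Apply true_fn or false_fn per instance, chosen by cond (hypothetical helper)."""
    if len(cond) != len(batch):
        raise ValueError("cond and batch must have the same length")
    # Each instance goes through exactly one branch; order is preserved.
    return [true_fn(v) if c else false_fn(v) for c, v in zip(cond, batch)]


x = [10, 20, 30]
cond = [v > 15 for v in x]  # one boolean per instance
out = if_else(cond, x, lambda v: v + 1, lambda v: v * 2)
```

Unlike the scalar C++ program, both branches may run for the same mini-batch, each on its own subset of instances.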
- - -### Blocks with `for` and `RNNOp` - -The following RNN model in PaddlePaddle, from the [RNN design doc](./rnn.md): - -```python -x = sequence([10, 20, 30]) # shape=[None, 1] -m = var(0) # shape=[1] -W = var(0.314, param=true) # shape=[1] -U = var(0.375, param=true) # shape=[1] - -rnn = pd.rnn() -with rnn.step(): - h = rnn.memory(init = m) - h_prev = rnn.previous_memory(h) - a = layer.fc(W, x) - b = layer.fc(U, h_prev) - s = pd.add(a, b) - act = pd.sigmoid(s) - rnn.update_memory(h, act) - rnn.output(a, b) -o1, o2 = rnn() -``` -has the following equivalent C++ program: - -```c++ -float x[] = {10, 20, 30}; -float m = 0; -float W = 0.314; -float U = 0.375; - -const int n = sizeof(x) / sizeof(x[0]); -float mem[n + 1]; -float o1[n + 1]; -float o2[n + 1]; -for (int i = 1; i <= n; ++i) { - float xi = x[i-1]; - if (i == 1) mem[0] = m; - float a = W * xi; // fc_out - float b = U * mem[i-1]; // hidden_out - float s = a + b; // sum - float act = sigmoid(s); - mem[i] = act; - o1[i] = act; - o2[i] = b; -} -``` - -## Compilation and Execution - -Like TensorFlow, a PaddlePaddle program is written in Python. The first part describes a neural network as a protobuf message, and the rest executes the message for training or inference. - -The generation of this protobuf message is similar to how a compiler generates a binary executable file. The execution of the message is similar to how the OS executes the binary file.
- -## The "Binary Executable File Format" - -The definition of the protobuf message is as follows: - -```protobuf -message BlockDesc { - repeated VarDesc vars = 1; - repeated OpDesc ops = 2; -} -``` - -The step net in above RNN example would look like - -``` -BlockDesc { - vars = { - VarDesc {...} // x - VarDesc {...} // h - VarDesc {...} // fc_out - VarDesc {...} // hidden_out - VarDesc {...} // sum - VarDesc {...} // act - } - ops = { - OpDesc {...} // matmul - OpDesc {...} // add_two - OpDesc {...} // sigmoid - } -}; -``` - -Also, the RNN operator in above example is serialized into a protobuf message of type `OpDesc` and would look like: - -``` -OpDesc { - inputs = {0} // the index of x in vars of BlockDesc above - outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above - attrs { - "states" : {1} // the index of h - "step_net" : - } -}; -``` - -This `OpDesc` value is in the `ops` field of the `BlockDesc` value representing the global block. - - -## The Compilation of Blocks - -During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message which describes Variable) and OpDesc (the Protobuf message which describes Operator). - -VarDesc in a block should have its name scope to avoid local variables affecting parent block's name scope. -Child block's name scopes should inherit the parent's so that OpDesc in child block can reference a VarDesc that is stored in the parent block. For example: - -```python -a = pd.Variable(shape=[20, 20]) -b = pd.fc(a, params=["fc.w", "fc.b"]) - -rnn = pd.create_rnn() -with rnn.stepnet(): - x = a.as_step_input() - # reuse fc's parameter - fc_without_b = pd.get_variable("fc.w") - rnn.output(fc_without_b) - -out = rnn() -``` -The method `pd.get_variable` can help retrieve a Variable by the name. The Variable may be stored in a parent block, but might be retrieved in a child block, so block should have a variable scope that supports inheritance. 
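A variable scope that supports inheritance can be sketched in a few lines of Python. This is an illustration only: `VarScope`, `new_var`, and `find_var` are hypothetical names, not PaddlePaddle API, and variable descriptions are modelled as plain dictionaries. Definitions are always local to a scope, while lookups fall back to the parent.

```python
class VarScope:
    """Name-to-variable table with an optional parent (hypothetical sketch)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def new_var(self, name, desc):
        # Definitions are always local, so a child never alters its parent.
        self.vars[name] = desc
        return desc

    def find_var(self, name, recursive=True):
        # Look locally first, then walk up the chain of parents.
        if name in self.vars:
            return self.vars[name]
        if recursive and self.parent is not None:
            return self.parent.find_var(name, recursive)
        return None


global_scope = VarScope()
global_scope.new_var("fc.w", {"shape": [20, 20]})
step_scope = VarScope(parent=global_scope)
step_scope.new_var("x", {"shape": [20]})
```

Here `step_scope.find_var("fc.w")` resolves through the parent, while `global_scope.find_var("x")` returns `None` because child definitions stay invisible to the parent.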
- -In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc. - -To store the definition of variables and operators, we define a C++ class `SymbolTable`, like the one used in compilers. - -`SymbolTable` can do the following: - -- store the definitions (some names and attributes) of variables and operators, -- verify if a variable was declared, -- make it possible to implement type checking (offer Protobuf message pointers to `InferShape` handlers). - - -```c++ -// Information in SymbolTable is enough to trace the dependency graph. So maybe -// the Eval() interface takes a SymbolTable is enough. -class SymbolTable { - public: - SymbolTable(SymbolTable* parent) : parent_(parent) {} - - OpDesc* NewOp(const string& name=""); - - // TODO determine whether name is generated by python or C++. - // Currently assume that a unique name will be generated by C++ if the - // argument name is left default. - VarDesc* Var(const string& name=""); - - // find a VarDesc by name, if recursive is true, find parent's SymbolTable - // recursively. - // this interface is introduced to support InferShape, find protobuf messages - // of variables and operators, pass pointers into InferShape. - // - // NOTE maybe some C++ classes such as VarDescBuilder and OpDescBuilder should - // be proposed and embedded into pybind to enable python operation on C++ pointers. - VarDesc* FindVar(const string& name, bool recursive=true); - - OpDesc* FindOp(const string& name); - - BlockDesc Compile() const; - - private: - SymbolTable* parent_; - - map ops_; - map vars_; -}; -``` - -After all the description of variables and operators is added into SymbolTable, -the block has enough information to run. - -The `Block` class takes a `BlockDesc` as input, and provides `Run` and `InferShape` functions. 
- - -```c++ -namespace { - -class Block : public OperatorBase { -public: - Block(const BlockDesc& desc) : desc_(desc) {} - - void InferShape(const framework::Scope& scope) const override { - if (!symbols_ready_) { - CreateVariables(scope); - CreateOperators(); - } - // should run InferShape first. - for (auto& op : runtime_table_.ops()) { - op->InferShape(scope); - } - } - - void Run(const framework::Scope& scope, - const platform::Place& place) const override { - PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first."); - for (auto& op : runtime_table_.ops()) { - op->Run(scope, place); - } - } - - void CreateVariables(const framework::Scope& scope); - void CreateOperators(); - - // some other necessary interfaces of NetOp are listed below - // ... - -private: - BlockDesc desc_; - bool symbols_ready_{false}; -}; - -} // namespace -``` - -## The Execution of Blocks - -Block inherits from OperatorBase, which has a Run method. -Block's Run method will run its operators sequentially. - -There is another important interface called `Eval`, which takes some arguments called targets, generates a minimal graph that treats the targets as end points, and creates a new Block from it. After `Run`, `Eval` will get the latest values and return the targets. - -The definition of Eval is as follows: - -```c++ -// clean a block description by targets using the corresponding dependency graph. -// return a new BlockDesc with minimal number of operators. -// NOTE: The return type is not a Block but the block's description so that this can be distributed -// to a cluster.
-BlockDesc Prune(const BlockDesc& desc, vector targets); - -void Block::Eval(const vector& targets, - const framework::Scope& scope, - const platform::DeviceContext& dev_ctx) { - BlockDesc min_desc = Prune(desc_, targets); - Block min_block(min_desc); - min_block.Run(scope, dev_ctx); -} -``` diff --git a/develop/doc/_sources/design/build_system/README.md.txt b/develop/doc/_sources/design/build_system/README.md.txt deleted file mode 100644 index bf0e4dddc1b640ecbce489f65820aaf8a4b3b1e7..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/build_system/README.md.txt +++ /dev/null @@ -1,152 +0,0 @@ -A few months ago when we were trying to replace CMake with Bazel, @emailweixu suggested that we rewrite those handy Bazel functions using CMake. Now it seems that it's the right time to get this done, as we are facing problems from the porting of Majel and the development of new the parameter server using Go and C++. - -Here are some initial thoughts. Your comments are welcome! - -### Required CMake Function - -I think we need only the following few CMake functions to make a project description mean and clean: - -| C++ | CUDA C++ | Go | -|---|---|---| -| cc_library | nv_library | go_library | -| cc_binary | nv_binary | go_binary | -| cc_test | nv_test | go_test | - -- The `_library` functions generate .a files from source code. -- The `_binary` functions generate executable binary files. -- The `_test` functions generate executable unit test files. They work like `_binary` but links `-lgtest` and `-lgtest_main`. - -The difference between `nv_` functions and `cc_` functions is that the former use `nvcc` instead of the system-default C++ compiler. - -Both `nv_` and `cc_` functions enables C++11 (-std=c++11). - -Also, - -- to describe external dependencies, we need `external_library`. -- to build shared libraries, we need `shared_library`. - -### An Example Project - -Suppose that we have aforementioned functions defined in our `/cmake` directory. 
The following example `CMakeLists.txt` describes a project including the following source files: - -- tensor.h -- tensor.cc -- tensor_test.cc -- ops.h -- ops.cu -- ops_test.cu -- api.go -- api_test.go - -Suppose that ops.cu depends on CUDNN. - -```cmake -# cc_library parses tensor.cc and figures out that the target also depends -# on tensor.h. -cc_library(tensor - SRCS - tensor.cc) - -# The dependency on target tensor implies that if any of -# tensor{.h,.cc,_test.cc} is changed, tensor_test needs to be re-built. -cc_test(tensor_test - SRCS - tensor_test.cc - DEPS - tensor) - -# I don't have a clear idea what parameters external_library needs to -# have. @gangliao as a CMake expert would have better ideas. -external_library(cudnn - ....) - -# Suppose that ops.cu depends on external target CUDNN. Also, ops.cu -# includes global functions that take Tensor as their parameters, so -# ops depends on tensor. This implies that if any of tensor.{h,cc}, -# ops.{h,cu} is changed, ops needs to be re-built. -nv_library(ops - SRCS - ops.cu - DEPS - tensor - cudnn) # cudnn is defined above. - -nv_test(ops_test - SRCS - ops_test.cu - DEPS - ops) - -# Because api.go defines a Go wrapper to ops and tensor, it depends on -# both. This implies that if any of tensor.{h,cc}, ops.{h,cu}, or -# api.go is changed, api needs to be re-built. -go_library(api - SRCS - api.go - DEPS - tensor # Because ops depends on tensor, this line is optional. - ops) - -go_test(api_test - SRCS - api_test.go - DEPS - api) - - -# This builds libapi.so. shared_library might use CMake target -# api_shared so as to distinguish it from the above target api. -shared_library(api - DEPS - api) - -``` - -### Implementation - -As the example `CMakeLists.txt` above executes, each function invocation adds "nodes" to a dependency graph. The graph is also used to generate CMake commands including `add_executable`, `add_dependencies`, `target_link_libraries`, and `add_test`.
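The dependency graph idea can be sketched in Python. This is only an illustration: `DepGraph`, `add_target`, and `rebuild_set` are hypothetical names, and the sketch does not generate any CMake commands. It only shows how recording each invocation as a node lets the build answer which targets must be re-built after a change.

```python
from collections import defaultdict


class DepGraph:
    """Records targets and answers 'what must be rebuilt?' (hypothetical sketch)."""

    def __init__(self):
        # Maps a target to the set of targets that depend on it.
        self.dependents = defaultdict(set)

    def add_target(self, name, deps=()):
        self.dependents[name]  # register the node even if nothing depends on it
        for dep in deps:
            self.dependents[dep].add(name)

    def rebuild_set(self, changed):
        # Everything reachable from `changed` through dependent edges.
        seen, stack = set(), [changed]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.dependents[node])
        return seen


g = DepGraph()
g.add_target("tensor")
g.add_target("tensor_test", deps=["tensor"])
g.add_target("cudnn")
g.add_target("ops", deps=["tensor", "cudnn"])
g.add_target("ops_test", deps=["ops"])
g.add_target("api", deps=["tensor", "ops"])
g.add_target("api_test", deps=["api"])
```

With the example project, `g.rebuild_set("ops")` contains `ops`, `ops_test`, `api`, and `api_test`, while `tensor` and `cudnn` are left alone.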
- -### Using Package Manager For Go - -Building Go binaries and libraries requires satisfying their dependencies. Generally -we can run `go get ./...` to download and compile all external dependencies. The -problems are: - -1. `go get` will always get the latest code from the default branch of the - remote repo, so changes in dependencies might break the build. This is very - different from what we already have in `cmake/external`, which downloads a - specific version or commit id of the dependency. -1. Some locations cannot access external dependencies through the internet, as mentioned - in https://github.com/PaddlePaddle/Paddle/issues/2605. Package management - tools can package the dependencies as a "vendor" package, which can be mirrored - on many cloud file hosting services, so users who want to compile paddle by themselves can - download this "vendor" package from a mirror site. - -#### Choose A Suitable Tool - -As mentioned by @wangkuiyi, [this page](https://github.com/golang/go/wiki/PackageManagementTools) -lists dozens of Go package managers. We choose the tool using the following principles: - -- Most "active" projects with more stars, more pull requests or commits -- Widely used project - -After comparing all these projects, we shall choose between the most popular -tools: Godep and Glide. - -Here's a brief comparison between Godep and Glide: -https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are -also many complaints about using `Godep`. A new "official" package -management tool has also been started at https://github.com/golang/dep to resolve -such problems, but it's currently at Alpha stage. So the best choice now is -Glide. - -#### Manage Go Packages - -- Dependencies: `go/glide.yaml` will store the dependencies and their versions which - are directly imported by paddle. `go/glide.lock` will store all dependencies recursively - with their commit id.
Builds will "lock" to these packages if we don't `glide up` - them -- Vendor package: `go/vendor` directory will generated when running `cmake` command. `cmake` - will download the code corresponding to `go/glide.lock`. If we put a vendor folder - under `go/`, cmake will just check the commit id to the packages under the folder, - if commit id matches, there will be no download at all. diff --git a/develop/doc/_sources/design/cluster_train/README.md.txt b/develop/doc/_sources/design/cluster_train/README.md.txt deleted file mode 100644 index 177a5f5d54bd924fab34795219ce1f7b270c8e25..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/cluster_train/README.md.txt +++ /dev/null @@ -1,182 +0,0 @@ -# Design Doc: Distributed Training - -## Objective - -In [this slides](https://www.slideshare.net/cxwangyi/paddlepaddle-a-complete-solution-for-businesses), we explained that we'd like PaddlePaddle running on general-purpose clusters like those managed by Kubernetes, so to address demands for AI from both Internet and non-Internet industries. - -This poses technical challenges to PaddlePaddle: - -1. Support fault-recovery. -1. Support both offline and online training. -1. [Serverless computing](https://en.wikipedia.org/wiki/Serverless_computing) of distributed training. - - -## Training Job - -A training job will be created once user asks Paddle cloud to train a model. The training job is made up of different processes that collaboratively consume data and produce a trained model. There are three kinds of processes: - -1. the *master server process*, which dispatches tasks to -1. one or more *trainer processes*, which run distributed training and synchronize gradients/models via -1. one or more *parameter server processes*, where each holds a shard of the global model, and receive the uploaded gradients from every *trainer process*, so they can run the optimize functions to update their parameters. 
- -Their relation is illustrated in the following graph: - - - -By coordinating these processes, PaddlePaddle supports both Synchronous Stochastic Gradient Descent (sync SGD) and Asynchronous Stochastic Gradient Descent (async SGD) for training user-defined neural network topologies. - -When training with sync SGD, parameter servers wait for all trainers to finish uploading their gradients and then send the updated parameters to the trainers; training cannot proceed until the trainers have received the updated parameters. This creates a synchronization point between trainers. When training with async SGD, each trainer uploads gradients and downloads new parameters individually, without synchronizing with other trainers. Async SGD is faster in terms of time per pass, but has noisier gradients because trainers are likely to work with a stale model. - -### Master Server Process - -The master server process will: - -- Partition a dataset into [tasks](#task) and dispatch tasks to trainers. -- Keep track of training progress on the dataset with [task queue](#task-queue). A training job iterates over the whole dataset for a full pass before moving on to the next pass. - - -#### Task - -A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size. - -#### Task Queue - -The master server has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master server. Each master server process has three task queues. - - - -- The todo queue holds tasks to be dispatched. When a job starts, the master server fills in the todo queue with all tasks. -- The pending queue holds tasks that are currently being trained by trainers. -- The done queue holds tasks that have already been trained. - -The life cycle of a single task is illustrated below: - - - -1.
When a new pass of training starts, all tasks will be placed in the todo queue. -1. Upon trainer requests for new task, the master server will dispatch a task from todo queue to it, put the task in the pending queue and wait for completion. -1. The trainer will work on its task and tell the master server once the task is completed and ask for new task. The master server will dispatch a new task to that trainer. -1. If a task fails for any reason in trainer, or takes longer than a specific period of time, the master server will move the task back to the todo queue. The timeout count for that task will increase by one. If the timeout count is above a threshold, the task is likely to cause a trainer to crash, then it will be discarded. -1. The master server will move completed task to the done queue. When the todo queue is empty, the master server will start a new pass by moving all tasks in the done queue to todo queue and reset the timeout counter of all tasks to zero. - -### Trainer Process - -The trainer process will: - -- Request tasks from the master. -- Work on the tasks -- Upload gradient to parameter servers, and update local model by downloading new parameters from parameter servers. - -### Parameter Server Process - -Parameter server processes hold the parameters collaboratively. The parameters are partitioned on different parameter servers. - -The parameter server will: - -- Receive gradient from the trainers, update its parameters, and give the trainers the latest parameters. -- Periodically save its parameters to distributed file system by overriding the previous save. - -### Optimization Algorithms - -The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm: - -- Synchronous Stochastic Gradient Descent (sync-SGD) - - Parameter server will wait for all trainer finish n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. 
Every trainer will wait for the new parameters before starting n+1-th mini-batch. - -- Asynchronous Stochastic Gradient Descent (async-SGD) - - There will no synchronization between different trainers, and parameter server updates its parameter as soon as it receives new gradient: - - - Each trainer uploads its accumulated gradient every n mini-batches. - - Every m mini-batches, the trainer downloads new parameters from parameter server. - - n and m do not have to be equal. - -## Fault Tolerant - -The training job will pause if the master server processes is dead, or any of the parameter server process is dead. They will be started by [Kubernetes](https://kubernetes.io/) and recover in few minutes. Please refer to [fault recovery](#fault-recovery). - -The training job will continue to make progress if there is at least one training process running. The strategy depends on the type of optimization algorithm: - -- sync-SGD - - TODO - -- async-SGD - - Since async-SGD does not require synchronization between mini-batches, the system will by definition make process if at least one trainer is running. - -## Fault Recovery - -PaddlePaddle uses [etcd](https://github.com/coreos/etcd) to keep track of the states of processes. Because etcd is a distributed reliable key-value store, the restarted process can recover its states from etcd. The model parameters are periodically saved into distributed file system, so a restarted parameter server can recover its parameters from the saved file. - -Now we will introduce how each process recovers from a failure, the graph below shows how etcd is used: - - - -### Master Server Process - -When the master is started by the Kubernetes, it executes the following steps at startup: - -1. Grabs a unique *master* lock in etcd, which prevents concurrent master instantiations. -1. Recovers the task queues from etcd if they already exist, otherwise, the master will create them. -1. 
Writes its IP address to */master/addr* so that trainers can discover it. -1. Listens for trainers' task requests, dispatches a task upon request, and updates the task queue using an etcd transaction to ensure the lock is held during the update. - -When the master server process is dead for any reason, Kubernetes will restart it. It will be online again with all states recovered from etcd within a few minutes. - -### Trainer Process - -When the trainer is started by Kubernetes, it executes the following steps at startup: - -1. Watches the available parameter server prefix keys `/ps/` on etcd and waits until the count of parameter servers reaches the desired count */ps_desired*. -1. Finds and watches */master/addr* to get the master's address. -1. Requests tasks from the master to start training. - -When a trainer fails, Kubernetes will try to restart it. The recovered trainer will fetch tasks from the master and go on training. - -### Parameter Server Process - -When the parameter server is started by Kubernetes, it executes the following steps at startup: - -1. Reads the desired total number of parameter servers from etcd `/ps_desired` -1. Searches through etcd keys `/ps/` (`/ps/0`, `/ps/1`, ...) to find the first non-existent key whose index is smaller than the total number of parameter servers. Sets the key using a transaction to avoid concurrent writes. The parameter server's index is inferred from the key name. - - The desired number of parameter servers is 3: - - - - The third parameter server joined: - - - -1. The parameter server can load parameters if there are already saved parameters in the save path (inferred from its index). -1. Now the parameter server is ready for the trainers' requests. - -If the parameter server's etcd lease expires, the parameter server will kill itself.
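The index-claiming step of the parameter server startup can be sketched as follows. This is a sketch under loud assumptions: `FakeKV` is an in-memory stand-in for etcd, `put_if_absent` models a "create only if the key does not exist" transaction, and `claim_index` is a hypothetical helper. None of these names come from etcd or PaddlePaddle, and leases are not modelled.

```python
class FakeKV:
    """In-memory stand-in for etcd (assumption: only put-if-absent is modelled)."""

    def __init__(self):
        self.data = {}

    def put_if_absent(self, key, value):
        # Models a transaction: succeed only if the key does not exist yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def claim_index(kv, desired, addr):
    """Claim the first free /ps/<i> with i < desired; return i, or None if full."""
    for i in range(desired):
        if kv.put_if_absent("/ps/%d" % i, addr):
            return i
    return None


kv = FakeKV()
# Four servers start while only three are desired; the last one finds no slot.
claimed = [claim_index(kv, 3, "10.0.0.%d" % n) for n in range(4)]
```

Because each write succeeds only when the key is absent, two servers racing for the same slot cannot both obtain the same index.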
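- - -## Parameter Server Checkpointing -See [here](./checkpointing.md) - -## Storing and dispatching training data -See [here](./data_dispatch.md) - - -## Dynamic Scaling - -### Trainer Scaling - -TODO - -### Parameter Server Scaling - -Not planned for v1. - -## Training Dataset Format - -TODO - -## User Interface - -TODO diff --git a/develop/doc/_sources/design/cluster_train/checkpointing.md.txt b/develop/doc/_sources/design/cluster_train/checkpointing.md.txt deleted file mode 100644 index c87ef2c7d2636208866d05456d5d44316d0bb200..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/cluster_train/checkpointing.md.txt +++ /dev/null @@ -1,44 +0,0 @@ -## Model Parameter Checkpointing -Checkpointing model data effectively guards against single-point or simultaneous multi-point failures of parameter servers. Checkpointing periodically saves to disk a complete image of the model data held in parameter server memory, so that training can be restarted from an intermediate state. In a training job that cannot be interrupted and has no backup, disaster recovery can be achieved by periodically saving a snapshot of each parameter server's data to a ***distributed storage service***, for example saving the latest snapshot every 10 minutes and deleting earlier ones. When a single point of failure occurs, the training job can be resumed simply by recovering the failed node, or by migrating it to another node and starting it there. - - - -### The design of snapshot saving is as follows: - -Notes: - -* After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves snapshots there. -* ***Note: each parameter server saves its checkpoints independently. For now we do not consider having multiple parameter servers synchronously save a global checkpoint for a specific point in time, because doing so still cannot guarantee that randomness is eliminated.*** - -Checkpoint saving procedure: - -1. When the "every 10 minutes" condition is met, the parameter server acquires the `read_lock` on the parameters memory and starts a new thread to save the checkpoint. If a checkpoint-saving thread is already running, the request is ignored. Because updating the parameters requires the `write_lock` on the parameters memory, the parameter server pauses parameter updates and waits while the snapshot is being written. -2. The parameter server generates a UUID and writes the snapshot data to a new file (named with this UUID) in the specified directory. After the snapshot is written, it computes the MD5 sum of the file, then writes the following JSON to `/checkpoints/[pserver_id]` in etcd: `{"uuid": [UUID], "md5": "MD5 sum", "timestamp": xxxx}`. -3. Delete the snapshot files in the directory whose names are not the current UUID. -4. Release the lock on the parameters memory and stop the checkpoint-saving thread. - -Users should note that, in a real environment, a running training job may saturate the network bandwidth between trainers and parameter servers. If a parameter server also has to access the distributed storage over the network to save a snapshot at that time, network congestion may occur and cause periodic stalls. - -### Recovering from a Snapshot - -When a parameter server starts for the first time, or is restarted by Kubernetes after a failure at any time, it needs to roll back to the last checkpoint: - - 1. Read the node `/checkpoints/[pserver_id]` from etcd to get the UUID of the latest checkpoint file - 1. Load the checkpoint snapshot file named with that UUID from disk and load the parameters in it - 1. If either of the two steps above fails, initialize the parameters using the initialization method defined by the startup arguments - 1.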
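Start serving requests - -## TODO List -### Speculative Execution / Accelerated Execution (TODO) -In a heterogeneous cluster, some trainers may run so slowly that they slow down the whole cluster (such as Trainer 1 in the figure). In this case the master is responsible for starting a new trainer (Accelerate Trainer 2) on the same training data block. Whichever trainer finishes training the block first wins, and the slower one is killed. - -### Dynamic Scaling Up / Down -For now only dynamically scaling up the number of trainers is considered, which reduces system complexity. - -## Terminology -* model: all the parameters obtained after deep learning training; this neural network can be used to make predictions on new data -* parameters: the parameters of a neural network, including weights w and biases b. A neural network model consists of a large number of parameters -* shard: one of the parts obtained by splitting a whole into multiple parts. -* model shard: the parameters of a neural network are split into multiple parts, and each shard is stored on one of the parameter servers -* parameter block: multiple parameter blocks make up a model shard -* single point of failure: at any moment at most one server is failed. Because the probability of two machines in the cluster failing at the same time is extremely low ((mean failure rate * mean time to repair)^2), disaster recovery for two or more simultaneous failures is considered only for special online systems. diff --git a/develop/doc/_sources/design/cluster_train/data_dispatch.md.txt b/develop/doc/_sources/design/cluster_train/data_dispatch.md.txt deleted file mode 100644 index 1f5d22ff5e6abcb576d16cbe7391da1967a1ab8e..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/cluster_train/data_dispatch.md.txt +++ /dev/null @@ -1,160 +0,0 @@ -## Storage and Dispatching of Training Data - -### Concepts - -### Workflow -Training datasets in production are usually very large and are stored on distributed storage such as Hadoop HDFS, Ceph, or AWS S3. These distributed storage services usually split the data into multiple shards stored across multiple nodes. This makes it possible to run many kinds of data computing tasks in the cloud, including: - -* data preprocessing jobs -* Paddle training jobs -* online model prediction services -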
- -
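- -The figure above shows the data flow of an application (face recognition) in a real production environment. Log data from production is stored both as a real-time stream (Kafka) and as offline data (HDFS). Several distributed data processing jobs run in the cluster, such as stream processing (online data process) and offline batch processing (offline data process), to preprocess the data and provide it to Paddle as training data. Users can also upload labeled data to the distributed storage to supplement the training data. The models produced by deep learning training running on Paddle are then served to the online face recognition application. - -### Training Data Storage -We choose [CephFS](http://docs.ceph.com/docs/master/cephfs/) as the storage system. - -- Whether from the perspective of the [PFSClient](../file_manager/README.md) or of a job running in a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/), users access their own data uniformly through `/pfs/$DATACENTER/home/$USER`. -- Public datasets are stored under `/pfs/$DATACENTER/common` - - mounted read-only -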
- -
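- -### File Preprocessing - - -Before training starts, the dataset must be converted into [RecordIO](https://github.com/PaddlePaddle/Paddle/issues/1947), the storage format used by PaddlePaddle distributed training. We provide two ways to convert: - -1. Users convert the data locally and then upload it -1. Users upload the data and then run the conversion program on the cluster - -The converted files are named in the following format: - -```text -name_prefix-aaaaa-of-bbbbb -``` - -"aaaaa" and "bbbbb" are both five-digit numbers. Each file is one shard of the dataset; "aaaaa" is the index of the shard and "bbbbb" is the largest shard index. - -For example, the ImageNet dataset might be split into 1000 shards with the following file names: -```text -imagenet-00000-of-00999 -imagenet-00001-of-00999 -... -imagenet-00999-of-00999 -``` - -#### Conversion Library - -Whether the conversion happens locally or in the cloud, we provide a Python conversion library with the following interface: -```python -def convert(output_path, reader, num_shards, name_prefix) -``` - -- `output_path`: directory in which output files will be saved. -- `reader`: a [data reader](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md#data-reader-interface), from which the convert program will read data instances. -- `num_shards`: the number of shards that the dataset will be partitioned into. -- `name_prefix`: the name prefix of generated files. - -The `reader` yields one data instance at a time. An instance can be a single value, or multiple values represented as a tuple: - -```python -yield 1 # a single value -yield numpy.random.uniform(-1, 1, size=28*28) # a single value -yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values -``` - -Each value can be an integer, a floating-point number, a string, a list of these, or a numpy.ndarray. Values of any other type are serialized into strings with Pickle. - -### Example Programs - -#### Using the Conversion Library - -The `reader` produced by the following `reader_creator` yields one data instance at a time. Each data instance contains two values, a numpy.ndarray and an integer: -```python -def reader_creator(): - def reader(): - for i in range(1000): - yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values - return reader -``` - -Pass the `reader` produced by `reader_creator` to the `convert` function to perform the conversion: -```python -convert("./", reader_creator(), 100, "random_images") -``` - -The command above generates 100 files in the current directory: -```text -random_images-00000-of-00099 -random_images-00001-of-00099 -...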
-random_images-00099-of-00099 -``` - -#### Training - - -PaddlePaddle provides a dedicated [data reader creator](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md#python-data-reader-design-doc) that produces the data reader for the given `RecordIO` files. **The reader is used in the same way locally and in the cloud**: - -```python -# ... -reader = paddle.reader.creator.RecordIO("/pfs/datacenter_name/home/user_name/random_images-*-of-*") -batch_reader = paddle.batch(reader, 128) -trainer.train(batch_reader, ...) -``` - -The data instances yielded by the reader in the code above are identical to those yielded by the reader used when the dataset was generated. - -### Uploading Training Files - -The following command uploads local data to the storage cluster. - -```bash -paddle pfs cp filename /pfs/$DATACENTER/home/$USER/folder/ -``` - -For example, to upload the random_images dataset converted in the earlier example to `/home/` in the cloud, use the following command: - -```bash -paddle pfs cp random_images-*-of-* /pfs/$DATACENTER/home/$USER/folder/ -``` - -The configuration for `$DATACENTER` must be written to a config file, for example: - -``` -# config file -[datacenter_1] -username=user -usercert=user.pem -userkey=user-key.pem -endpoint=datacenter1.paddlepaddle.org - -[datacenter_2] -username=user -usercert=user.pem -userkey=user-key.pem -endpoint=datacenter2.paddlepaddle.org -``` -## TODO -### File Access Permissions -Control user permissions - -- Users can share their own data with others - -### File Access Method -Access data remotely through an API directly instead of mounting - -For example: - -``` -f = open('/pfs/datacenter_name/home/user_name/test1.dat') -``` - - -### Support user-defined data preprocessing jobs diff --git a/develop/doc/_sources/design/cluster_train/large_model_dist_train.md.txt b/develop/doc/_sources/design/cluster_train/large_model_dist_train.md.txt deleted file mode 100644 index 0c4b5bc24c854b7062d509249bea9c50d42bd5f1..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/cluster_train/large_model_dist_train.md.txt +++ /dev/null @@ -1,101 +0,0 @@ -# Analysis of large model distributed training in Paddle - -***NOTE: These are only notes on how we implemented this scheme in V1, not a new design.*** - -## What is it - -We often encounter cases where the embedding layer parameters (sparse) are so large that we cannot store them in the trainer's memory when
training. So we need to put them to several servers, and fetch them row by row instead of fetch all of the parameters. - -## How to use - -Specify command-line argument like `--loadsave_parameters_in_pserver=true --ports_num_for_sparse=1 --use_old_updater=1` when starting the paddle trainer. And also add something like `--ports_num_for_sparse=1 --pserver_num_threads=5` when starting pserver processes. - -Accrodingly, configure your embedding layers like: - -```python -SPARSE_REMOTE=True - -w1 = data_layer(name="w1", size=dict_size) -emb1 = embedding_layer(input=w1, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE)) -w2 = data_layer(name="w2", size=dict_size) -emb2 = embedding_layer(input=w2, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE)) -... -``` - -## Implementation details - -```c++ -enum MatType { - MAT_NORMAL, - MAT_NORMAL_SHARED, - MAT_VALUE_SHARED, - MAT_SPARSE_ROW_IDS, - MAT_SPARSE_ROW_AUTO_GROW, - MAT_CACHE_ROW, - MAT_SPARSE_ROW, - MAT_SPARSE_ROW_PREFETCH, - MAT_SPARSE_ROW_PREFETCH_FULL_SIZE, -}; -``` - -`MAT_SPARSE_ROW_PREFETCH` is what we use when configured to fetch only row of matrix when training. - -In `trainer_internal.cpp:L93 trainOneBatch`: - -```c++ - if (config_->getOptConfig().use_sparse_remote_updater()) { - REGISTER_TIMER("prefetch"); - gradientMachine_->prefetch(inArgs); - parameterUpdater_->getParametersRemote(); - } -``` - -When doing actual network forward and backward, at the beginning of each batch, the trainer will try to download one row of data from pserver. - -In `trainer/RemoteParameterUpdater.cpp`: `parameterUpdater_->getParametersRemote();`: - -```c++ -if (fullSize) { - ... 
-} else {
-  getParams = [&] {
-    parameterClient_->getParameterSparse(
-        /* recvParameterType= */ PARAMETER_VALUE, sendBackParameterType);
-  };
-  applyL1 = [](Parameter& para, real decayRate) {
-    para.getMat(PARAMETER_VALUE)->applyL1(/*lr=*/1.0f, decayRate);
-  };
-}
-```
-
-Calling `parameterClient_->getParameterSparse` makes a remote call to the pserver's `getParameterSparse`:
-
-```c++
-void ParameterServer2::getParameterSparse(const SendParameterRequest& request,
-                                          std::vector<Buffer>& inputBuffers,
-                                          SendParameterResponse* response,
-                                          std::vector<Buffer>* outputBuffers) {
-  (void)inputBuffers;
-  auto& buffer = *readWriteBuffer_;
-  size_t numReals = 0;
-  for (const auto& block : request.blocks()) {
-    numReals += getParameterConfig(block).dims(1);
-  }
-  buffer.resize(numReals);
-
-  VLOG(3) << "pserver: getParameterSparse, numReals=" << numReals;
-
-  ReadLockGuard guard(parameterMutex_);
-  size_t offset = 0;
-  for (const auto& block : request.blocks()) {
-    size_t width = getParameterConfig(block).dims(1);
-    Buffer buf = {buffer.data() + offset, width};
-    int type = request.send_back_parameter_type();
-    sendBackParameterSparse(block, type, response, &buf, width, outputBuffers);
-    offset += width;
-  }
-}
-```
-
-`getParameterConfig(block).dims(1)` returns the width of the current "parameter block" (a shard of the parameter object),
-then the `getParameterSparse` remote call returns only one row of data to the client.
diff --git a/develop/doc/_sources/design/cluster_train/master_server.md.txt b/develop/doc/_sources/design/cluster_train/master_server.md.txt
deleted file mode 100644
index 4bf3c506f101361875043f8bfd97972b8c981a22..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/cluster_train/master_server.md.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-# Design Doc: Master Server
-
-For an overview of the master server's role, please refer to [distributed training design doc](./README.md). In this design doc we will discuss the master server in more detail.
The master will be implemented in [Go](https://golang.org/).
-
-## Dataset
-
-
-
-A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks, and each chunk consists of some records.
-
-## Task Queue
-
-As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
-
-### Task Queue Creation
-
-1. Each trainer will make an RPC call (using Go's [rpc](https://golang.org/pkg/net/rpc/) package) to the master server, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master server the same dataset, only the first RPC call will be honored.
-
-    The RPC interface is:
-    ```go
-    func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
-    }
-    ```
-1. The master server will scan through each RecordIO file to generate the *chunk index* and learn how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk, and the index of the chunk within the file is an integer starting from 0, representing the n-th chunk within the file.
-
-    The definition of the chunk is:
-    ```go
-    type Chunk struct {
-      Idx   int            // index of the chunk within the file
-      Path  string
-      Index recordio.Index // chunk index
-    }
-    ```
-1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no elements.
-
-    The definition of the task is:
-    ```go
-    type Task struct {
-      Index  int
-      Chunks []Chunk
-    }
-    ```
-
-    The elements in the task queues are of type `TaskEntry`, containing a timeout counter (described in [task retry logic](#task-retry-logic)), and a task:
-    ```go
-    type TaskEntry struct {
-      NumTimeout int
-      Task       Task
-    }
-    ```
-
-    The definition of task queues is:
-    ```go
-    type TaskQueues struct {
-      Todo    []TaskEntry
-      Pending map[int]TaskEntry // map from task index to task entry
-      Done    []TaskEntry
-    }
-    ```
-
-### Task Queue Persistence
-
-The task queues need to be persisted on [etcd](https://github.com/coreos/etcd) for fault recovery. Since the task queues only change once a task is completed or timed out, which is not very frequent, we can afford to synchronize with etcd every time the task queues change.
-
-We will serialize the task queues data structure with [gob encoding](https://golang.org/pkg/encoding/gob/), compress it with gzip, and save it into etcd synchronously under key `/task_queues`.
-
-### Task Dispatch
-
-The trainer will make an RPC call to the master to get a new task when:
-
-- the trainer first starts, or
-- the trainer finishes a task.
-
-The RPC interface is:
-```go
-func (m *RPCServer) GetTask(finished *Task, result *Task) error {
-}
-```
-Argument `finished` will be `nil` when the trainer has just started.
-
-During the RPC call the master will do the following:
-
-- Make a copy of the task queues, and update the copy to reflect the finished tasks and the new pending tasks.
-- Synchronize the copy of task queues with etcd using a transaction conditioned on holding the master lock.
-- If the synchronization succeeds, replace the task queues with the copy and report the new tasks to the trainer; if it fails, discard the copy and report the error to the trainer.
-
-### Task Retry Logic
-
-When a task is dispatched to the trainer, the master will schedule a function for execution after the timeout duration (based on the moving average of task completion time).
If the task entry is still in the pending queue, its timeout counter will increase by one, and the task will be moved to the todo queue. If the timeout counter is above the threshold, the master will log the error and discard the task.
-
-Please note that since a timed-out task could be completed after it has been dispatched for retry, it is possible for a task to be processed multiple times. We do not try to prevent this from happening since it's fine to train on the same task multiple times due to the stochastic nature of the stochastic gradient descent algorithm.
diff --git a/develop/doc/_sources/design/cluster_train/pserver_client.md.txt b/develop/doc/_sources/design/cluster_train/pserver_client.md.txt
deleted file mode 100644
index 474b8c572cd92fc87e9f7f3f2b19d12cccd158de..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/cluster_train/pserver_client.md.txt
+++ /dev/null
@@ -1,171 +0,0 @@
-# Design Doc: The Client Library of Parameter Server
-
-For an overview of the trainer's role, please refer to [distributed training design doc](README.md). In this design doc, we will discuss the parameter server's client library, which will manage communication with parameter servers. The library will be implemented in [Go](https://golang.org/) and made available as a static or dynamic library with a C header file.
-
-## Parameter Partition
-
-Each parameter will be partitioned into parameter blocks to make the parameters evenly distributed on parameter servers. The partition is done automatically by the client library. The *sparse parameter* requires slightly different treatment:
-
-### Sparse Parameter
-
-The sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation; it has the same representation as a dense vector.
-
-Because a sparse parameter is updated sparsely, the trainer will have to partition the sparse parameter.
Because the parameter server will merge all sparse parameter shards into the same file when saving the parameter, the shards need a special naming convention:
-
-If a sparse parameter is partitioned into n shards, they should be named as:
-
-```text
-name:sparse-0
-name:sparse-1
-...
-name:sparse-n-1
-```
-
-The library is unaware of the partition, and treats each parameter independently. Only when saving parameters will the parameter servers merge the sparse parameters according to the naming convention.
-
-## Model Optimization Using Gradients
-
-There are two ways to perform model optimization using gradients:
-
-- On Client
-
-  The client does multiple steps of forward and backward update. In each step, the gradients are calculated and a new model is generated. After some steps, the client will calculate the difference between the newest model and the old model at step 0. The difference will be sent to the parameter servers. Parameter servers will just update parameters using the difference without any optimization using gradients (such as Adam and L1 regularization).
-
-- On Parameter Server
-
-  The client will send accumulated gradients to parameter servers, and the parameter server will do the optimization using gradients.
-
-## L1 and L2 Regularization
-
-PaddlePaddle allows L1 or L2 regularizations to be specified per parameter, so when the trainer initializes the parameter it needs to include a parameter configuration when L1 or L2 regularization is necessary.
-
-## Parameter Initialization
-
-The parameters on parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization; the other trainers will wait for the completion of initialization and get the parameters from the parameter servers.
-
-### Trainer Selection
-
-To select the trainer for initialization, every trainer will try to get a distributed lock; whoever owns the lock will do the initialization.
As illustrated below:
-
-
-
-### Trainer Selection Process
-
-The trainer selection process is encapsulated in the C API function:
-```c
-int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
-```
-The selected trainer's call to `paddle_begin_init_params` will return 1, and the other trainers' calls to `paddle_begin_init_params` will return 0. `paddle_get_params` will be blocked until initialization is completed. As illustrated below:
-
-
-
-## C Interface
-
-```c
-typedef enum {
-  PADDLE_ELEMENT_TYPE_INT32 = 0,
-  PADDLE_ELEMENT_TYPE_UINT32 = 1,
-  PADDLE_ELEMENT_TYPE_INT64 = 2,
-  PADDLE_ELEMENT_TYPE_UINT64 = 3,
-  PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
-  PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
-} paddle_element_type;
-
-typedef struct {
-  char* name;
-  paddle_element_type element_type;
-  unsigned char* content;
-  int content_len;
-} paddle_parameter, paddle_gradient;
-
-typedef int paddle_pserver_client;
-
-/**
- * @brief creates a pserver client that talks to etcd for coordination.
- */
-paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr);
-
-/**
- * @brief creates a pserver client given pserver addresses.
- *
- * @param pserver_addrs comma-separated pserver addresses.
- * @param selected if current pserver client is selected to initialize all parameter servers.
- */
-paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected);
-void paddle_pserver_client_release(paddle_pserver_client c);
-
-/**
- * @brief paddle_begin_init_params begins to initialize parameters on
- * parameter servers.
- *
- * paddle_begin_init_params will be called from multiple trainers,
- * only one trainer will be selected to initialize the parameters on
- * parameter servers. Other trainers need to get the initialized
- * parameters from parameter servers using @paddle_get_params.
- *
- * @return 1 if the trainer is selected to initialize parameter
- * servers, otherwise 0.
- */
-int paddle_begin_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_init_param initializes the parameter on parameter
- * servers.
- *
- * @param param the parameter to initialize.
- * @param param_config_proto the configuration for the parameter.
- * @param config_len the length of param_config_proto
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_params). Or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
-
-/**
- * @brief paddle_finish_init_params tells parameter servers that the
- * client has sent all parameters to them for initialization.
- *
- * @return 0 if successful, otherwise -1. On failure, the trainer
- * needs to restart the entire initialization process (starting from
- * @paddle_begin_init_params). Or simply exit the program and wait for
- * the cluster management system to restart the trainer.
- */
-int paddle_finish_init_params(paddle_pserver_client client);
-
-/**
- * @brief paddle_send_grads sends gradients to parameter servers for
- * updating parameters.
- *
- * @param grads the array of gradients to send.
- * @param len the length of the gradient array.
- * @param learning_rate the learning rate for the gradients.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len);
-
-/**
- * @brief paddle_get_params gets parameters from parameter servers.
- *
- * paddle_get_params will block until parameters are initialized on
- * the parameter servers.
- *
- * @param dst the destination array of parameter pointers to save to.
- * The parameter pointer must be pre-populated with the required parameter name,
- * and the content of the parameter must be pre-allocated with the size of the
- * required parameter on the pserver.
- * @param len the length of the names array and the paddle_parameter
- * array.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len);
-
-/**
- * @brief paddle_save_model tells parameter servers to save the parameters
- * to the given path
- *
- * @param path the path to save parameters.
- * @return 0 if successful, otherwise -1.
- */
-int paddle_save_model(paddle_pserver_client client, const char* path);
-```
diff --git a/develop/doc/_sources/design/cluster_train/remote_parameter_updater.md.txt b/develop/doc/_sources/design/cluster_train/remote_parameter_updater.md.txt
deleted file mode 100644
index 6e8e5938455b869e0f3367794c41250340b37f77..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/cluster_train/remote_parameter_updater.md.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-# Design Doc: Remote Parameter Updater for Cluster Train
-
-For an overview of distributed training, please refer to [distributed training design doc](README.md). In this design doc, we will discuss the parameter updater that will use the parameter server client ([The Client Library of Parameter Server Design Doc](pserver_client.md)) to manage and update parameters.
-
-## Parameter Updater
-
-The Parameter Updater is used by the trainer to manage and update parameters. There are mainly two kinds of parameter updater: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.
-
-### Remote Parameter Updater
-
-The Remote Parameter Updater manages parameters through remote parameter servers, using the client that communicates with the pserver ([The Client Library of Parameter Server Design Doc](pserver_client.md)).
-
-In the PaddlePaddle Python V2 API, the trainer is implemented in Python, and the trainer will hold an instance of the parameter updater and call its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG.
-
-#### Sparse Remote Parameter Updater
-
-Since we will only implement dense parameter management now, the mechanism for sparse parameters will be discussed in the next stage.
-
-### Interface Design
-
-TBD
diff --git a/develop/doc/_sources/design/cluster_train/save_model.md.txt b/develop/doc/_sources/design/cluster_train/save_model.md.txt
deleted file mode 100644
index b755185c81ad617b9c85c47de0f5f65d2201c658..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/cluster_train/save_model.md.txt
+++ /dev/null
@@ -1,111 +0,0 @@
-# Design Doc: Save Model
-
-## Overview
-
-The model is the output of the training process. There are two
-ways in which a user can obtain a model:
-
-- Save model triggered by user code: user code asks PaddlePaddle to
-  save a model.
-- Convert model from the checkpoint: the model is converted from the
-  pservers' periodic checkpoint. In this way, the user can cancel a
-  job at any time, and still have a relatively fresh model (we
-  checkpoint around every 5 minutes).
-
-### Trainer Saving Model vs. Pservers Saving Model
-
-Both trainers and pservers have access to the model. So the model can
-be saved from a trainer or pservers. We need to decide where the model
-is saved from.
-
-#### Dense Update vs. Sparse Update
-
-There are two types of model update methods: dense update and sparse
-update (when the model parameter is configured to be sparse).
-
-- Dense update
-
-  Every trainer has its own full copy of the model.
Every model
-  update will update the entire model.
-
-- Sparse update
-
-  The training input is sparse, and the trainer does not have the
-  entire model. It will only download the sub-model related
-  to the input. When updating the model, only the sub-model related to
-  the training input is updated.
-
-
-#### Pservers Saving Model
-
-The benefit of letting pservers save the model is that they have the
-entire model all the time. However, since pservers are on different
-nodes, it requires a merging process to merge model shards into the same
-model. This requires the pservers to write models to a distributed
-filesystem, making the checkpoint shards visible to the merge program.
-
-#### Trainer Saving Model
-
-The benefit of letting one trainer save the model is that it does not
-require a distributed filesystem. It also reuses the same save model
-logic as local training, except that when doing sparse update, the
-trainer needs to download the entire model during the saving process.
-
-#### Conclusion
-
-Given that trainer saving model does not require a distributed filesystem,
-and is an intuitive extension to trainer saving model when training
-locally, we decide to let the trainer save the model when doing
-distributed training.
-
-
-### Convert Model from Checkpoint
-
-TODO
-
-
-## Timeline
-
-We will first implement the trainer saving the model. Converting the latest
-snapshot to a model will be a TODO for the future.
-
-
-## Trainer Save Model
-
-### Trainer Election
-
-One trainer will be elected as the one to save the model. When using
-etcd, the trainer ID is a randomly generated UUID; the trainer will
-contact the master server requesting to save the model, and find out
-whether it is elected. When the master server is not used, unique
-trainer IDs will be given by the administrator, and the trainer whose ID
-is "0" is elected to save the model.
-
-### Model Save Path
-
-Each trainer will be given the directory to save the model.
The
-elected trainer will save the model to
-`given-directory/trainerID`. Since the trainer ID is unique, this
-would prevent concurrent saves to the same file when multiple trainers
-are elected to save the model because of a split-brain problem.
-
-### What Happens When Model Is Saving
-
-Saving a model takes some time, so we need to define what happens
-while the save is taking place.
-
-When doing dense update, the trainer uses the local model. Pservers
-do not need to pause model update.
-
-When doing sparse update, the trainer needs to download the entire
-model while saving. To get the most accurate model, the model update
-needs to be paused before the download starts and resumed after the
-download finishes. Otherwise, the trainer gets a model that is
-"polluted": some part of the model is old, some part of the model is
-new.
-
-It's unclear whether the "polluted" model will be inferior, due to the
-stochastic nature of deep learning, and pausing the model update will
-add more complexity to the system. Since supporting sparse update is a
-TODO item, we defer to the future the evaluation of whether to pause
-the model update during model saving.
diff --git a/develop/doc/_sources/design/cluster_train/submit-job.md.txt b/develop/doc/_sources/design/cluster_train/submit-job.md.txt
deleted file mode 100644
index 8377d5489dc64bd2fdc5bb4f7bc737e7b489000d..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/cluster_train/submit-job.md.txt
+++ /dev/null
@@ -1,127 +0,0 @@
-# Submit a Distributed Training Job
-
-The user can submit a distributed training job with Python code, rather than with a command-line interface.
-
-## Runtime Environment On Kubernetes
-
-For a distributed training job, there are two Docker images, called the *runtime Docker image* and the *base Docker image*. The runtime Docker image is the Docker image that gets scheduled by Kubernetes to run during training. The base Docker image is for building the runtime Docker image.
-
-### Base Docker Image
-
-Usually, the base Docker image is the PaddlePaddle production Docker image, which includes the paddle binary files and Python package. Users can also specify any image hosted on any Docker registry to which they have access.
-
-### Runtime Docker Image
-
-The trainer package that the user uploads and some Python dependencies are packaged into a runtime Docker image based on the base Docker image.
-
-- Handle Python Dependencies
-
-  You need to provide a requirements.txt file in your `trainer-package` folder. Example:
-
-  ```txt
-  pillow
-  protobuf==3.1.0
-  ```
-  See more [details](https://pip.readthedocs.io/en/1.1/requirements.html) about requirements files. An example project looks like:
-  ```bash
-    paddle_example
-      |-quick_start
-        |-trainer.py
-        |-dataset.py
-        |-requirements.txt
-  ```
-
-## Submit Distributed Training Job With Python Code
-
-
-- `paddle.job.dist_train()` will call the Job Server API `/v1/packages` to upload the trainer package and save it on CephFS, and then call `/v1/trainer/job` to submit the PaddlePaddle distributed job.
-- `/v1/trainer/job` will start a build job to prepare the runtime Docker image. When the build job is finished, Job Server will submit the PaddlePaddle distributed job to Kubernetes.
-- *NOTE*: For the first version, we will not prepare the runtime Docker image. Instead, the package is uploaded to Paddle Cloud, and Paddle Cloud will mount the package in a temporary folder into the base Docker image. We will not support custom Python dependencies in the first version either.
-
-You can call `paddle.job.dist_train` and provide the distributed training configuration as the parameters:
-```python
-paddle.job.dist_train(
-    trainer=dist_trainer(),
-    paddle_job=PaddleJob(
-      job_name = "paddle-cloud",
-      entry_point = "python %s"%__file__,
-      trainer_package = "/example/word2vec",
-      image = "yancey1989/paddle-job",
-      trainers = 10,
-      pservers = 3,
-      trainer_cpu = 1,
-      trainer_gpu = 1,
-      trainer_mem = "10G",
-      pserver_cpu = 1,
-      pserver_mem = "2G"
-    ))
-```
-
-The parameter `trainer` of `paddle.job.dist_train` is a function and you can implement it as follows:
-```python
-def dist_trainer():
-    def trainer_creator():
-        trainer = paddle.v2.trainer.SGD(...)
-        trainer.train(...)
-    return trainer_creator
-```
-
-The pseudo code of `paddle.job.dist_train` is as follows:
-```python
-import os
-
-def dist_train(trainer, paddle_job):
-    # if the code is running on cloud, RUNNING_ON_CLOUD is set to YES
-    if os.getenv("RUNNING_ON_CLOUD", "NO") == "NO":
-        # submit the paddle job
-        paddle_job.submit()
-    else:
-        # start the training
-        trainer()
-```
-### PaddleJob Parameters
-parameter | type | explanation
- --- | --- | ---
-job_name | str | the unique name for the training job
-entry_point | str | entry point for starting up the trainer process
-trainer_package | str | path of the trainer package file, to which the user has access
-image|str|the [base image](#base-docker-image) for building the [runtime image](#runtime-docker-image)
-pservers|int| Parameter Server process count
-trainers|int| Trainer process count
-pserver_cpu|int| CPU count for each Parameter Server process
-pserver_mem|str| memory allocated for each Parameter Server process, a plain integer using one of these suffixes: E, P, T, G, M, K
-trainer_cpu|int| CPU count for each Trainer process
-trainer_mem|str| memory allocated for each Trainer process, a plain integer using one of these suffixes: E, P, T, G, M, K
-trainer_gpu|int| GPU count for each Trainer process; if you only want CPU, do not set this parameter
-
-### Deploy
Parameter Server, Trainer and Master Process
-  - Deploy PaddlePaddle Parameter Server processes as a Kubernetes ReplicaSet.
-  - Deploy PaddlePaddle Trainer processes as a Kubernetes Job.
-  - Deploy PaddlePaddle Master processes as a Kubernetes ReplicaSet.
-
-## Job Server
-
-- RESTful API
-
-  Job server provides a RESTful HTTP API for receiving the trainer package and displaying
-  PaddlePaddle job related information.
-  - `POST /v1/package` receives the trainer package and saves it on CephFS
-  - `POST /v1/trainer/job` submits a trainer job
-  - `GET /v1/jobs/` lists all jobs
-  - `GET /v1/jobs/` returns the status of a job
-  - `DELETE /v1/jobs/` deletes a job
-  - `GET /v1/version` returns the job server version
-
-- Build Runtime Docker Image on Kubernetes
-
-  `paddle.job.dist_train` will upload the trainer package to Job Server, save it on the distributed filesystem, and then start up a job for building the runtime Docker image that gets scheduled by Kubernetes to run during training.
-
-  There are some benefits to building the runtime Docker image on JobServer:
-  - On Paddle Cloud, users will run the trainer code in a Jupyter Notebook, which is a Kubernetes Pod. If we wanted to execute `docker build` in the Pod, we would have to mount the host's `docker.sock` into the Pod, so the user's code would connect to the host's Docker Engine directly, which is not safe.
-  - Users only need to upload the training package files, and do not need to install Docker Engine or a Docker registry as dependencies.
-  - If we want to switch to another image type, such as RKT, users do not need to care about it.
-
-- Deploy Parameter Server, Trainer and Master Processes
-
-  `POST /v1/trainer/job` receives the distributed training parameters, and deploys the job as follows:
-  - Deploy PaddlePaddle Parameter Server processes as a Kubernetes ReplicaSet.
-  - Deploy PaddlePaddle Trainer processes as a Kubernetes Job.
-  - Deploy PaddlePaddle Master processes as a Kubernetes ReplicaSet.
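The local-versus-cloud dispatch in the `paddle.job.dist_train` pseudo code earlier in this document can be sketched in a self-contained form. This is only an illustration under stated assumptions: `FakeJob` is a hypothetical stand-in for `PaddleJob`, and the `environ` argument is added here so the `RUNNING_ON_CLOUD` check can be exercised without touching the real environment. Neither is part of the PaddlePaddle API.

```python
import os


class FakeJob:
    """Hypothetical stand-in for PaddleJob; it only records that submit() ran."""

    def __init__(self):
        self.submitted = False

    def submit(self):
        self.submitted = True


def dist_train(trainer, paddle_job, environ=None):
    """Submit the job when run locally, or train when already on the cloud."""
    env = os.environ if environ is None else environ
    if env.get("RUNNING_ON_CLOUD", "NO") == "NO":
        paddle_job.submit()  # local machine: hand the job to the job server
        return "submitted"
    trainer()  # cloud side: run the training function
    return "trained"


def dist_trainer(log):
    """Mirror the closure pattern from the document: return the function to run."""
    def trainer_creator():
        # A real trainer would build an SGD trainer and call train() here.
        log.append("train")
    return trainer_creator


log = []
job = FakeJob()
local_result = dist_train(dist_trainer(log), job, environ={})
cloud_result = dist_train(dist_trainer(log), FakeJob(),
                          environ={"RUNNING_ON_CLOUD": "YES"})
```

The same user script therefore serves as both the submission client and the training entry point; only the environment variable decides which branch runs.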
diff --git a/develop/doc/_sources/design/concurrent_programming.md.txt b/develop/doc/_sources/design/concurrent_programming.md.txt
deleted file mode 100644
index f022e67fd3a048cd7e53c91d9a1fd0506487b665..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/concurrent_programming.md.txt
+++ /dev/null
@@ -1,163 +0,0 @@
-# Design Doc: Concurrent Programming with Fluid
-
-With PaddlePaddle Fluid, users describe a program rather than a model. The program is a [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto) protobuf message. TensorFlow/MxNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, but not the program that trains/uses the model.
-
-Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create a concurrent/parallel AI application. An interesting question is **how does a `ProgramDesc` represent a concurrent program?**
-
-The answer relies on the fact that a `ProgramDesc` is similar to an abstract syntax tree (AST) that describes a program. So users just write a concurrent program as they would with any concurrent programming language, e.g., [Go](https://golang.org).
-
-## An Analogy
-
-The following table compares concepts in Fluid and Go
-
-| Go | Fluid |
-|----|-------|
-|user-defined functions | [layers](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid) |
-| control-flow and built-in functions | [intrinsics/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators) |
-| goroutines, channels | [class ThreadPool](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework/thread_pool.h) |
-| runtime | [class Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h) |
-
-## An Example Concurrent Program
-
-To review all of the above concepts in an example, let us take a simple program and write its distributed version.
-
-Suppose that we want to parallelize a naive Fluid program (written in Go and calling Fluid's Go binding) that multiplies two tensors.
-
-```go
-import "fluid"
-
-func paddlepaddle() {
-  X = fluid.read(...)
-  W = fluid.Tensor(...)
-  Y = fluid.mult(X, W)
-}
-```
-
-Please be aware that Fluid's Go binding provides the default `main` function, which calls the `paddlepaddle` function, which, in this case, is defined in the above program and creates the following `ProgramDesc` message.
-
-```protobuf
-message ProgramDesc {
-  block[0] = Block {
-    vars = [X, W, Y],
-    ops = [
-      read(output = X)
-      assign(input = ..., output = W)
-      mult(input = {X, W}, output = Y)
-    ],
-  }
-}
-```
-
-Then, the default `main` function calls `fluid.run()`, which creates an instance of the [`class Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h) and calls `Executor.Run(block[0])`, where `block[0]` is the first and only block defined in the above `ProgramDesc` message.
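As a rough illustration of the flow just described, the sketch below models a block as an ordered list of ops and an executor that runs them one after another on the current thread. It is plain Python rather than Fluid's Go binding, and the `Op`, `Block`, and `Executor` classes are hypothetical stand-ins, not the real `ProgramDesc` or `class Executor` API. Scalars stand in for tensors to keep it small.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Op:
    """Hypothetical op: reads named inputs from a scope and writes one output."""
    inputs: List[str]
    output: str
    fn: Callable[..., object]


@dataclass
class Block:
    """Hypothetical stand-in for a block: an ordered list of ops."""
    ops: List[Op] = field(default_factory=list)


class Executor:
    """Toy executor: runs the ops of one block sequentially on the current thread."""

    def run(self, block: Block) -> Dict[str, object]:
        scope: Dict[str, object] = {}
        for op in block.ops:
            # Look up each input by name, then store the result under the output name.
            args = [scope[name] for name in op.inputs]
            scope[op.output] = op.fn(*args)
        return scope


# block[0] from the example: read X, assign W, then Y = mult(X, W).
block0 = Block(ops=[
    Op(inputs=[], output="X", fn=lambda: 3.0),
    Op(inputs=[], output="W", fn=lambda: 2.0),
    Op(inputs=["X", "W"], output="Y", fn=lambda x, w: x * w),
])

scope = Executor().run(block0)
```

Because the ops run strictly in list order, `mult` can rely on `X` and `W` already being present in the scope when it executes.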
-
-The default `main` function is defined as follows:
-
-```go
-func main() {
-  paddlepaddle()
-  fluid.run()
-}
-```
-
-## The Concurrent Version
-
-By parallelizing the above program, we could support a very big tensor X by splitting it into small pieces {x_1, x_2, ...} and sending each piece to a worker process/node for parallel multiplication.
-
-In this case, we can write a transpiler that takes a `ProgramDesc` message that represents the above example program and outputs two `ProgramDesc` messages, one for running on the master process/node, and the other one for worker processes/nodes.
-
-### The Master Program
-
-The master program could look like the following:
-
-```protobuf
-message ProgramDesc {
-  block[0] = Block {
-    vars = [X, L, Y],
-    ops = [
-      read(output = X)
-      kube_get_workers_addrs(output = L)
-      Y = tensor_array(len(L))
-      parallel_for(input = X, output = Y,
-                   attrs = {L, block_id(1)}) # referring to block 1
-    ]
-  }
-
-  block[1] = Block {
-    parent = 0,
-    vars = [x, y, index],
-    ops = [
-      slice(input = [X, index], output = x) # index is initialized by parallel_for
-      send(input = x, attrs = L[index])
-      recv(outputs = y, attrs = L[index])
-      assign(input = y, output = Y[index])
-    ]
-  }
-}
-```
-
-The equivalent Fluid program (calling the Go binding) is:
-
-```go
-func main() {  //// block 0
-  X = fluid.read(...)
-  L = fluid.k8s.get_worker_addrs()
-  Y = fluid.tensor_array(len(L))
-  fluid.parallel_for(X, L,
-                     func(index int) {  //// block 1
-                       x = X[index]
-                       fluid.send(L[index], x)
-                       y = fluid.recv(L[index])
-                       Y[index] = y
-                     })
-}
-```
-
-An explanation of the above program:
-
-- `fluid.k8s` is a package that provides access to the Kubernetes API.
-- `fluid.k8s.get_worker_addrs` returns the list of IPs and ports of all pods of the current job except for the current one (the master pod).
-- `fluid.tensor_array` creates a [tensor array](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor_array.h).
`fluid.parallel_for` creates a `ParallelFor` intrinsic, which, when executed,
-
-  1. creates `len(L)` scopes, each for the concurrent running of the sub-block (block 1 in this case), and initializes a variable named "index" in the scope to an integer value in the range `[0, len(L)-1]`, and
-  2. creates `len(L)` threads by calling into the `ThreadPool` singleton, each thread
-     1. creates an Executor instance, and
-     2. calls `Executor.Run(block)`, where `block` is block 1 as explained above.
-- Please be aware that block 1 is a sub-block of block 0, so ops in block 1 could refer to variables defined in block 0.
-
-### The Worker Program
-
-The worker program looks like
-
-```go
-func main() {
-  W = Tensor(...)
-  x = fluid.listen_and_do(
-        fluid.k8s.self_addr(),
-        func(input Tensor) {
-          output = fluid.mult(input, W)
-        })
-}
-```
-
-where
-
-- `fluid.listen_and_do` creates a `ListenAndDo` intrinsic, which, when executed,
-  1. listens on the current pod's IP address, as returned by `fluid.k8s.self_addr()`,
-  2. once a connection is established,
-     1. creates a scope of two parameters, "input" and "output",
-     2. reads a [Fluid variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h) and saves it into "input",
-     3. creates an Executor instance and calls `Executor.Run(block)`, where the block is generated by running the lambda specified as the second parameter of `fluid.listen_and_do`.
-
-## Summarization
-
-From the above example, we see that:
-
-1. Fluid enables the imperative programming paradigm by:
-   1. letting users describe a program, but not a model (a sequence of layers, or a graph of operators), and
-   2. calling the `fluid.run` function that runs the program implicitly.
-1. The program is described as a `ProgramDesc` protobuf message.
-2. Function `Executor.Run` takes a block, instead of a `ProgramDesc`, as its parameter.
-3. `fluid.run` calls `Executor.Run` to run the first block in the `ProgramDesc` message.
-4.
`Executor.Run`'s implementation is extremely simple -- it neither plans the execution nor creates threads; instead, it runs on the current thread and executes intrinsics/operators' `Run` method sequentially as they appear in the `Block.ops` array. -5. Intrinsics/operators' `Run` method might create threads. For example, the `ListenAndDo` operator creates a thread to handle each incoming request. -6. Threads are not necessarily OS threads; instead, they could be [green threads](https://en.wikipedia.org/wiki/Green_threads) managed by ThreadPool. Multiple green threads might run on the same OS thread. An example of green threads is Go's [goroutines](https://tour.golang.org/concurrency/1). diff --git a/develop/doc/_sources/design/cpp_data_feeding.md.txt b/develop/doc/_sources/design/cpp_data_feeding.md.txt deleted file mode 100644 index 40205350f99722f0b71bfa6f390fe9d01d831966..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/cpp_data_feeding.md.txt +++ /dev/null @@ -1,79 +0,0 @@ -# C++ Data Feeding - -In training with the Paddle V2 API, data feeding depends wholly on Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required. - -In this document we show the fundamental design of the C++ data feeding process, which includes data reading, shuffling and batching. - -## Reader - -A new concept named 'Reader' is introduced. `Reader` is a family of derived classes that can be held by our `Variable` and are used to read or process file data. - - -### `ReaderBase` - -`ReaderBase` is the abstract base class of all readers. It defines the interface shared by all readers. - -```cpp -class ReaderBase { - public: - explicit ReaderBase(const std::vector<DDim>& shapes) : shapes_(shapes) { - PADDLE_ENFORCE(!shapes_.empty()); - } - // Read the next batch of data.
(A 'batch' can be only one instance) - virtual void ReadNext(std::vector<LoDTensor>* out) = 0; - // Show whether the next batch exists. - virtual bool HasNext() const = 0; - - // Reinitialize the reader and read the file from the beginning. - virtual void ReInit() = 0; - - // Get the shape of one piece of read-in data. - DDim shape(size_t idx) const; - // Get the shapes of all read-in data. - std::vector<DDim> shapes() const { return shapes_; } - // Set the shapes of read-in data. - void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; } - - virtual ~ReaderBase() {} - - protected: - std::vector<DDim> shapes_; -}; -``` - -### `FileReader` and `DecoratedReader` - -These two classes are derived from `ReaderBase` and are in turn the base classes of the specific readers. That is to say, in our design, there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format and yields only one instance of data at a time, e.g., a RecordIO reader or a jpg reader. A decorated reader takes another reader (either a file reader or a decorated reader) as its 'underlying reader'. It gets data from its underlying reader, processes it (shuffling or batching), then yields the processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers. - -All the readers share exactly the same interface defined in `ReaderBase`, so they can be decorated more than once: we can **shuffle** a reader's outputs and then **batch** the shuffled outputs. The interface consistency also allows related ops to use readers without knowing exactly what they are. - - -### `ReaderHolder` - -Different readers belong to different class types. This leads to a problem: how can we drop them into `Variable`s and fetch them out by a unified method?
For example, if a Variable holds a `BatchReader`, we can not get it by the following code: - -```cpp -var->Get<ReaderBase>("batch_reader"); -``` - -we have to write: - -```cpp -var->Get<BatchReader>("batch_reader"); -``` - -This means that every time we get a reader from a variable, we must know the reader's exact type, which is nearly impossible. - -To solve this problem, we introduce `ReaderHolder` as a wrapper. It acts as an empty decorator of `ReaderBase`, which erases the reader's type. With `ReaderHolder` we are able to fetch all types of readers by `var->Get<ReaderHolder>("...")` and regard the obtained object as a reader. - -## Related Operators - -To create and invoke readers, some new ops are introduced: - -### `CreateReaderOp` - -Each reader has its creating op. File readers' creating ops have no input and yield the created file reader as their output. Decorated readers' creating ops take the underlying readers as inputs and then yield new decorated readers. - -### `ReadOp` - -A reader is only a Variable. It cannot trigger the reading process by itself. So we add the `ReadOp` to execute it. A `ReadOp` takes a reader Variable as its input. Each time it runs, it invokes the reader's `ReadNext()` function and gets a new batch of data (or only one instance of data, if we use a file reader directly). The output data of a reader are in the form of `std::vector<LoDTensor>`, so the `ReadOp` also needs to split the vector and move LoDTensors to their respective output Variables. diff --git a/develop/doc/_sources/design/csp.md.txt b/develop/doc/_sources/design/csp.md.txt deleted file mode 100644 index 10d936860fab7e09241e968a63526c7d86d3e568..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/csp.md.txt +++ /dev/null @@ -1,224 +0,0 @@ -# Design Doc: CSP in PaddlePaddle Fluid - -## Motivation - -Concurrent programming is important for deep learning. A few example applications are: - -1. The main thread keeps reading the next mini-batch while another thread uses the GPU for computing. -2.
The main thread performs the computation while another thread uploads the local gradients from each trainer to the parameter server. - -Most DL systems, including TensorFlow, Caffe2, and MxNet, can asynchronously execute operators in a graph. However, Fluid doesn't have the concept of a graph at all, as the design goal of Fluid is that of a programming language. - -## Concurrent Programming Models - -There were many concurrent programming models, implemented in various forms: - -| concurrent programming model | implementation | -|-----|-----| -| mutex | types and functions in standard libraries | -| semaphore | types and functions in standard libraries | -| communicating sequential processes (CSP) | Go programming language | -| actor model | Erlang programming language | -| message passing | MPI | -| bulk synchronous parallel (BSP) | Pregel distributed programming framework | - -Since Fluid was designed to be a programming language, we would like to implement CSP in Fluid. - -### CSP v.s. Actor Model - -A well-known implementation of Actor Model is the Erlang programming language. In Actor Model, *processes* could send messages to another process and receive messages from another process given the process IDs. We can find the three ingredients, process with ID, send, and recv, in MPI too. Indeed, we can rewrite Erlang programs in Python + MPI with possibly fewer lines of code. Our concern with Actor Model is that it doesn't seem reasonable to implement process management in a programming language's runtime library; instead, it should be the operating systems' responsibility to manage processes and libraries like MPI for send/recv. - -## CSP in Fluid - -Fluid has two fundamental control-flows: *if-else* and *while*. If we are to implement CSP, we need the following: - -1. a new data type: *channel* and operators *send* and *recv*, -1. *goroutine* or thread, and -1. a new control-flow: select. - -We also need Python wrappers for the above components. 
- -The type *channel* is conceptually a blocking queue. In Go, it is implemented as a [blocking circular queue](https://github.com/golang/go/blob/68ce117cf17b8debf5754bfd476345779b5b6616/src/runtime/chan.go#L31-L50), which supports send and recv. - -The `select` operation existed in OS kernels long before the Go language. All Unix kernels implement system calls *poll* and *select*. They monitor multiple file descriptors to see if I/O is possible on any of them. This takes O(N) time. Since Linux 2.6, a new system call, *epoll*, can do the same in O(1) time. In BSD systems, there is a similar system call *kqueue*. Go's Linux implementation uses epoll. - -It might be a good idea to implement Fluid's select using epoll too. In this design doc, we start from the O(N) way so that we can focus on the Python binding and the syntax. - -### Type Channel - -Fluid supports many data types: - -1. Tensor, -1. Row-sparse Tensor, -1. LoD Tensor, -1. Tensor array, etc. - -Each data type is registered in the [`framework.proto`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L117-L127) as an enum value. To add a new type channel, we need to add a new type enum. - -To expose a C++ type to Python, we need to edit the [`pybind.cc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc) file. [Here](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc#L120-L164) is an example of how we expose the C++ class LoDTensor. - -## Syntax Design - -### Create Channel - -In Go, we create a channel by specifying the element type and buffer size: - -```go -ch := make(chan int) // a channel without buffer -ch1 := make(chan int, 100) // a channel that can buffer 100 ints.
-``` - -In Fluid, we should be able to do the same: - -```python -ch = fluid.make_channel(dtype=INT) -ch1 = fluid.make_channel(dtype=INT, 100) -``` - -In addition to that, we want channels that can hold more complex element types, e.g., Tensors of float16: - -```python -ch = fluid.make_channel(dtype=Tensor, etype=float16) -``` - -or Tensors of Tensors of float16 etc. - -The point here is that we need a consistent way to compose types, like in C++ we can have `Tensor<Tensor<...<float16>...> >`. - -### Send and Recv - -Go's CSP implementation depends on the data type *channel*. There are two types of channels: - -1. The unblocked channel, or buffered channel, is a blocking queue with a non-zero sized buffer. Sending to a buffered channel blocks if the buffer is full, and receiving blocks if the buffer is empty. -1. The blocked channel, or unbuffered channel, is a blocking queue with no buffer. Both sending and receiving block with unbuffered channels. - -There are four types of actions with a channel: - -1. Create a channel - - ```go - ch := make(chan int) // this is an unbuffered channel - ch := make(chan int, 100) // this is a buffered channel of 100 ints. - ``` - -1. Send - - ```go - ch <- 111 - ``` - -1. Recv - - ```go - y, ok := <-ch - ``` - -1. Close - - ```go - close(ch) - ``` - - Please be aware that a closed channel is not a nil channel, which is `var ch chan int`. - -There are some [axioms with channels](https://dave.cheney.net/2014/03/19/channel-axioms): - -1. A send to a nil channel blocks forever - -1. A receive from a nil channel blocks forever - -1. A send to a closed channel panics - -1. A receive from a closed channel returns the residual values and then zeros.
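
The two closed-channel axioms above can be modelled in a few lines of Python. This is only a sketch of the semantics, not Fluid's or Go's implementation: the `Channel` class, `ClosedChannelError`, and the `capacity` and `zero` parameters are hypothetical names introduced for illustration, and the toy covers buffered channels only (`capacity >= 1`).

```python
import threading
from collections import deque


class ClosedChannelError(Exception):
    """Raised on a send to a closed channel (Go would panic here)."""


class Channel:
    """Toy buffered channel; a hypothetical model, not Fluid's implementation."""

    def __init__(self, capacity, zero=0):
        self._buf = deque()
        self._capacity = capacity  # assumed >= 1 (buffered channel only)
        self._zero = zero          # the "zero value" returned after close
        self._closed = False
        self._cond = threading.Condition()

    def send(self, value):
        with self._cond:
            # Block while the buffer is full, unless the channel gets closed.
            while len(self._buf) >= self._capacity and not self._closed:
                self._cond.wait()
            if self._closed:
                raise ClosedChannelError("send on closed channel")
            self._buf.append(value)
            self._cond.notify_all()

    def recv(self):
        """Return (value, ok); ok is False once the channel is closed and drained."""
        with self._cond:
            # Block while the buffer is empty, unless the channel gets closed.
            while not self._buf and not self._closed:
                self._cond.wait()
            if self._buf:
                value = self._buf.popleft()
                self._cond.notify_all()
                return value, True
            return self._zero, False

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()


ch = Channel(capacity=2)
ch.send(1)
ch.send(2)
ch.close()
# The residual values come out first, then zero values with ok == False.
results = [ch.recv() for _ in range(3)]
```

After `close`, the two buffered values are still delivered, every later `recv` yields the zero value with `ok` set to `False`, and a further `send` raises instead of blocking.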
- -In Fluid, we have [buffered channels](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/details/buffered_channel.h) and [unbuffered channels](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/details/unbuffered_channel.h) - -The following program illustrates the Python syntax for accessing Fluid buffers. - -```python -import fluid - -buffer_size = 10 -ch = fluid.make_channel(dtype=INT, buffer_size) - -# Now write three elements to the channel -with fluid.while(steps=buffer_size): - fluid.send(ch, step) - -fluid.close_channel(ch) - -with fluid.while(steps=buffer_size): - fluid.print(fluid.recv(ch)) -``` - -The following example shows that to avoid the always-blocking behavior of unbuffered channels, we need to use Fluid's goroutines. - -```python -import fluid - -ch = fluid.make_channel(dtype=INT) - -with fluid.go(): - fluid.send(ch) - -y = fluid.recv(ch) - -fluid.close_channel(ch) -``` - -### Select - -In Go, the `select` statement lets a goroutine wait on multiple communication operations. A `select` blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready. - -```go - -ch1 := make(chan int) -ch2 := make(chan int, 100) - -x := 0 - -for { - select { - case ch1 <- x: - x := x + 1 - case y <- ch2: - fmt.Println("Received on channel") - default: - fmt.Println("Default") - } - } - -``` - -In Fluid, we should be able to do the same: - -```python -ch1 = fluid.make_chan(dtype=INT) -ch2 = fluid.make_chan(dtype=INT, 100) - -sel = fluid.select() - -with sel.case(ch1, 'w', X): - fluid.layers.increment(X) - -with sel.case(ch2, 'r', Y): - fluid.print("Received on Channel") - -with sel.default(): - fluid.print("Default") - -``` - -In the above code snippet, `X` and `Y` are variables. Now let us look at each of these statements one by one. - -- `sel.case(ch1, 'w', X)` : This specifies that we are writing to `ch1` and we want to write the integer in variable `X` to the channel. 
The character `w` is used here to make the syntax familiar to write syntax in Python I/O. - -- `sel.case(ch2, 'r', Y)` : This specifies that we would like to read the result from `ch2` into variable `Y`. The character `r` is used here to make the syntax familiar to read syntax in Python I/O. - -- `sel.default()` : This is equivalent to the default in Go `select`. If none of the channels are ready for read or write, then the fluid code in the default block will be executed. - -## Example Programs - -### 1. RPC between Trainers and Parameter Servers - -### 2. Concurrent Minibatch Loading diff --git a/develop/doc/_sources/design/dist_refactor/distributed_architecture.md.txt b/develop/doc/_sources/design/dist_refactor/distributed_architecture.md.txt deleted file mode 100644 index 9368c5780dc922953f38bf0f86d9f797a4a8a6fe..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/dist_refactor/distributed_architecture.md.txt +++ /dev/null @@ -1,197 +0,0 @@ -# Design Doc: Distributed Training Architecture - -## Abstract - -PaddlePaddle version 0.10.0 uses the "trainer-parameter server" architecture. We run multiple instances of trainers (where each trainer runs the same model) and parameter servers for distributed training. This architecture serves well, but has few limitations: - -1. There is a need to write special code that handles tasks which should only be run on a single trainer. E.g., initializing the model, saving the model etc. - -2. Model parallelism is hard: It would need all the if-else branches conditioned on the trainer ID to partition the model onto the trainers, and eventually manually writing out the inter-model-shard communication code to communicate between different trainers. - -3. The user can not directly specify the parameter update rule: This would need to modify the parameter server code and compile a new binary. This makes things more complicated for researchers: A lot of extra effort is required to make this work. 
Besides, the training job submission program may not allow running arbitrary binaries. - -This design doc discusses PaddlePaddle's new distributed training architecture that addresses the above mentioned limitations. - -## Analysis - -The assumption is that the user writes the trainer program in either Python or C++. - -### Limitation 1 - -There are two basic functionalities in the trainer program: - -1. The training logic such as loading / saving the model and printing out the logs. -2. The neural network definition such as the definition of the data layer, the fully connected layer, the cost function and the - optimizer. - -When we train using PaddlePaddle v0.10.0 in a distributed fashion, multiple instances of the same Python code are run on different nodes, hence both: the -training logic as well as the neural network computation logic, is replicated. - -The tasks that only need to be run once belong to the training logic. Hence if we only replicate the neural network computation part, and do **not** -replicate the training logic, the limitation mentioned above can be avoided. - -### Limitation 2 - -Model parallelism means that a single model is partitioned into different components and each node runs one of the component separately. This comes at the extra cost of managing the -inter-model-shard communication between nodes. - -PaddlePaddle should ideally be able to modify the neural network computation and figure out the support for model parallelism automatically. However, the -computation is only specified in Python code which sits outside of PaddlePaddle, hence PaddlePaddle can not support the feature in this setup. - -Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. 
The compiler optimizes the IR as follows: - - - -PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component: - - - -The IR for PaddlePaddle after refactoring is called a `Block`, it specifies the computation dependency graph and the variables used in the computation. - -### Limitation 3 - -The user can not directly specify the parameter update rule for the parameter server in the Python module, since the parameter server does not use the same computation definition as the trainer. Instead, the update rule is baked inside the parameter server. The user can not specify the update rule explicitly. - -This could be fixed by making the parameter server also run an IR, which can be different to the trainer side -For a detailed explanation, refer to this document - -[Design Doc: Parameter Server](./parameter_server.md) - -## Distributed Training Architecture - -The revamped distributed training architecture can address the above discussed limitations. Below is the illustration of how it does so: - - - -The major components are: *Python API*, *Distribute Transpiler* and *Remote Executor*. - -### Python API - -Python API is the Python library that user's Python code invokes, to read the data, build the neural network topology, and start training, etc. - -```Python -images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32') -label = fluid.layers.data(name='label', shape=[1], dtype='int64') -... 
-predict = fluid.layers.fc(input=conv_pool_2, size=10, act="softmax") -cost = fluid.layers.cross_entropy(input=predict, label=label) -avg_cost = fluid.layers.mean(x=cost) -optimizer = fluid.optimizer.Adam(learning_rate=0.01) -optimizer.minimize(avg_cost) - -train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=500), - batch_size=BATCH_SIZE) - -place = fluid.CPUPlace() -exe = fluid.Executor(place) - -for pass_id in range(10): - for data in train_reader(): - loss, acc = exe.run(trainer_prog, - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -The code above is a typical local training program, the "Training Program" is built using helper functions such as -`fluid.layer.fc`. The training is done by calling `Executor.run` -iteratively. - -For more details, the implementation of IR is [Program](../program.md), and `ProgramDesc` is the protobuf type. - -[Executor](../executor.md) simply runs the `ProgramDesc`. For local training you generally use -`Executor` to run the program locally. For any kind of distributed training, you can use -`RemoteExecutor` to specify desired distributed training method with some optional arguments. - -### Distributed Transpiler - -The Distributed Transpiler automatically converts the IR (in protobuf format) to partitioned IRs. Then -the Remote Executor dispatches the new IRs to Remote Executors across the cluster. -Below are the steps that are followed : - -1. User only need to change `Executor` to `RemoteExecutor` to change local program to distributed program. -1. `RemoteExecutor` calls `Distributed Transpiler` to "transpile" user's program to several IRs representing a - distributed training program: - 1. Parse configurations from `RemoteExecutor`. - 1. Determine the type of distributed program, can be DataParallelism, ModelParallelism or Streaming. - 1. Partition the `ProgramDesc` according to type and add `send` / `recv` OP pair on the boundaries. 
Take - the DataParallelism type for example: it removes the optimization operators and adds a `send` OP to the - "trainer" role, then adds the optimization operators to the parameter server role within the `recv` OP. -1. Dispatch the partitioned graph to different `RemoteExecutor`s in the cluster. -1. `RemoteExecutor` on each node runs the received `ProgramDesc` until the end. - - -### RemoteExecutor - -As shown in the graph, `RemoteExecutor.run` sends the IR to the cluster for execution. -You can also use the parameter `fetch_list` to interactively fetch variables back to the local process for -log printing. - -The Python `RemoteExecutor` is derived from the `Executor` class. - -```python -exe = RemoteExecutor( - feed=feeder.feed(data), - fetch_list=[avg_cost], - job_desc=JobDesc( - jobname, - num_trainer, - num_pserver, - cpu_per_trainer, - gpu_per_trainer, - mem_per_trainer, - cpu_per_pserver, - mem_per_pserver - )) -for data in train_reader(): - loss, acc = exe.run(trainer_prog, - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -The `JobDesc` object describes the resource specification of the distributed job to run in the -cluster environment. - - - -`RemoteExecutor.run` sends the `ProgramDesc` and -[TrainingJob](https://github.com/PaddlePaddle/cloud/blob/develop/doc/autoscale/README.md#training-job-resource) -to a server in the cluster which executes `RemoteExecutor.listen`. This server is responsible -for starting the final Kubernetes Jobs to run the different roles of `ProgramDesc` from `ConfigMap`. - - -### Placement Algorithm - -Our first implementation will only support "trainer-parameter server" placement: the parameters, initializers, and optimizers are all placed on the PaddlePaddle runtimes with the parameter server role. Everything else will be placed on the PaddlePaddle runtimes with the trainer role. This has the same functionality as the "trainer-parameter server" architecture of PaddlePaddle v0.10.0, but is more generic and flexible.
- -In the future, a more general placement algorithm should be implemented, which makes placements according to the input IR, and a model of device computation time and device communication time. Model parallelism requires the generic placement algorithm. - - -### Local Training Architecture - -The local training architecture will be the same as the distributed training architecture; the difference is that everything runs locally, and there is just one PaddlePaddle runtime: - - - - -### Training Data - -In PaddlePaddle v0.10.0, training data is typically read -with [data reader](../reader/README.md) from Python. This approach is -no longer efficient in distributed training: since the Python -process no longer runs on the same node as the trainer processes, -the Python reader will need to read from the distributed filesystem -(assuming it has access) and send the data to the trainers, doubling the -network traffic. - -When doing distributed training, the user can still use the Python data -reader: the training data are sent with `Executor.run`. However, this should -be used for debugging purposes only. Users are encouraged to use -the read data OPs.
- - -## References: - -[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf) - -[2] [TensorFlow: A System for Large-Scale Machine Learning](https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf) diff --git a/develop/doc/_sources/design/dist_refactor/multi_cpu.md.txt b/develop/doc/_sources/design/dist_refactor/multi_cpu.md.txt deleted file mode 100644 index a8d8ee0422acc84835170a44eb83f9b5f0c6bb40..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/dist_refactor/multi_cpu.md.txt +++ /dev/null @@ -1,43 +0,0 @@ -# Design Doc: Execute the Program with Multi CPU - -## Abstract - -This Design Doc proposes an approach to run the user-defined Op graph -on multiple CPUs: we will use an auto transpiler to convert the user-defined -Op graph to a multi-CPU Op graph, and use the `ParallelDo` Op to run the graph. - -## Transpiler - - - -After conversion: - - - -## Implement - -- `Multi-CPU Transpiler` will convert the graph to a multi-CPU graph - which would be executed with multiple threads. -- `BlockingCounter` will `Init/Decrement` an atomic counter, and block in `Wait` - until the atomic counter becomes `0`: - ```cpp - BlockingCounter bc(thread_count); - for (int i = 0; i < thread_count; ++i) { - thread_pool->Start([&bc] {bc.DecrementCount(); }); - } - bc.Wait(); - ``` -- `ParallelDo` Operator - - Initialize a thread pool which is a Singleton. - - Use a block id as the input, and run the specified Block in independent scopes - with multiple threads. - - Initialize a `BlockingCounter` instance and wait until all threads are done. -- `Split` Operator will split the Input Tensor into a TensorArray. -- `Merge` merges all the gradients calculated in different threads - with a `mean/sum/max/min...` method, and then runs the Optimizer Op to optimize `W`.
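
The `BlockingCounter` and `ParallelDo` items above can be sketched in Python to show the wait-until-zero pattern. This is an illustration under stated assumptions, not the C++ implementation: `parallel_do`, `block`, and `num_threads` are hypothetical names, and a per-call `ThreadPoolExecutor` stands in for the singleton thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockingCounter:
    """Counter that lets one thread wait until it has been decremented to zero."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement_count(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()


def parallel_do(block, num_threads):
    """Run block(index) on num_threads threads and wait for all of them.

    Hypothetical stand-in for the ParallelDo operator: each call to block
    plays the role of running the sub-block in its own scope.
    """
    bc = BlockingCounter(num_threads)
    results = [None] * num_threads

    def worker(index):
        try:
            results[index] = block(index)
        finally:
            # Always decrement, so wait() cannot hang if block raises.
            bc.decrement_count()

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for i in range(num_threads):
            pool.submit(worker, i)
        bc.wait()  # returns only after every worker has decremented
    return results


squares = parallel_do(lambda i: i * i, 4)
```

Decrementing inside `finally` is a deliberate choice in this sketch: a failing worker still releases the waiter, at the cost of leaving `None` in its result slot.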
- -## TODO - -- Improve the optimizer stage with multi-threads, since we could - assign the parameters to the different threads and execute - optimizer with multi-threads. diff --git a/develop/doc/_sources/design/dist_refactor/parameter_server.md.txt b/develop/doc/_sources/design/dist_refactor/parameter_server.md.txt deleted file mode 100644 index 805dd13048d41b995d2a01cda52b2ea33e4bbe1d..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/dist_refactor/parameter_server.md.txt +++ /dev/null @@ -1,96 +0,0 @@ -# Design Doc: Parameter Server - -## Abstract - -We propose an approach to implement the parameter server. In this -approach, there is no fundamental difference between the trainer and -the parameter server: they both run subgraphs, but subgraphs of -different purposes. - -## Background - -The previous implementations of the parameter server do not run a -fluid sub-program. Parameter initialization, optimizer computation, network -communication and checkpointing are implemented twice on both the -trainer as well as the parameter server. - -It would be great if we can write code once and use them on both: the -trainer and the parameter server, since this reduces code duplication and -improves extensibility. Given that after the current refactoring, we are -representing everything as a computation graph on the -trainer. Representing everything as a computation graph on the parameter -server becomes a natural extension. - -## Design - -### Distributed Transpiler - -The *Distributed Transpiler* converts the user-defined fluid program -into sub-programs to be scheduled on different nodes with the following -steps: - -1. OP placement: the OPs will be placed on different nodes according - to a heuristic that minimizes the estimated total computation - time. Currently we will use a simple heuristic that puts parameter - variable on parameter server workers and everything else on trainer - workers. -1. 
Add communication OPs to enable the communication between nodes. - -We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*. - -Below is an example of converting the user defined graph to the -subgraphs for the trainer and the parameter server: - - - -After converting: - - - -1. The parameter variable W and its optimizer program are placed on the parameter server. -1. Operators are added to the program. - - *Send* sends data to the connected *Recv* operator. The - scheduler on the receive node will only schedule *Recv* operator - to run when the *Send* operator has ran (the *Send* OP will mark - the *Recv* OP runnable automatically). - - *Enqueue* enqueues the input variable, it can block until space - become available in the queue. - - *Dequeue* outputs configurable numbers of tensors from the - queue. It will block until the queue has the required number of - tensors. - - -### Benefits - -- Model parallelism becomes easier to implement: it is an extension to - the trainer - parameter server approach. We can have several "Transpilers" - to achieve different goals. -- User-defined optimizer is easier to add - user can now express it as - a sub-program. -- No more duplication logic inside the trainer and the parameter - server mentioned in the background section. - -### Challenges - -- It is important to balance the parameter shards on multiple - parameter servers. If a single parameter is very big (for example: some - word-embedding, fully connected, softmax layer), we need to - automatically partition the single parameter onto different - parameter servers when possible (only element-wise optimizer depends - on the parameter variable). -- In the "Async SGD" figure, the "W" variable on the parameter server - could be read and written concurrently. See - [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more - details about concurrent program in Fluid. 
- -### Discussion - -- Can the Enqueue OP be implemented under our current tensor design - (put the input tensor into the queue tensor)? -- *Dequeue* OP will have variable numbers of output (depending on the - `min_count` attribute), does our current design support it? (similar - question for the *Add* OP) - - -### References: -[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf) diff --git a/develop/doc/_sources/design/error_clip.md.txt b/develop/doc/_sources/design/error_clip.md.txt deleted file mode 100644 index 58aa73b8cd38d01e2426278a3479714e4fb6a3b0..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/error_clip.md.txt +++ /dev/null @@ -1,92 +0,0 @@ -# Error Clip - -## Overview - -Error clip is widely used in model training to prevent gradient exploding. It takes some specific rules to adjust variables' gradients and prevent them from being too large. With it, values of a gradient will be checked before they are taken by the next `grad_op` and be shrunk if necessary. -## Usage - -Users are allowed to assign different error clip methods or attributes to different `Variable`s. Users can specify it as a parameter of `Variable`'s constructor: - -```python -var = framework.Variable(..., error_clip=myErrorClip, ...) -``` - -The default value of `error_clip` is `None`, which means no error clip is employed. When it's not `None`, it should take an object of `BaseErrorClipAttr`'s derived class. So far, `BaseErrorClipAttr` has only one derived class: `ErrorClipByValue`, whose constructor is: - -```python -ErrorClipByValue(max, min=None) -``` - -`max` and `min` represent the maximal and minimal clip threshold respectively. In backward pass, all values of `var`'s gradient greater than `max` or less than `min` will be clipped to `max` and `min` respectively. 
When `min` is None, the minimal threshold is set to `-max` automatically. - -So we can enable the error clip with threshold `[-5.0, 5.0]` for variable `var` by: - -```python -var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...) -``` - -## Implementation - -The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*. - -```python -class BaseErrorClipAttr(object): - def append_clip_op(self, block, grad_name): - raise NotImplementedError() - - -class ErrorClipByValue(BaseErrorClipAttr): - def __init__(self, max, min=None): - max = float(max) - if min is None: - min = -max - else: - min = float(min) - self.max = max - self.min = min - - def append_clip_op(self, block, grad_name): - clip_op_desc = block.desc.append_op() - clip_op_desc.set_type("clip") - clip_op_desc.set_input("X", [grad_name]) - clip_op_desc.set_output("Out", [grad_name]) - clip_op_desc.set_attr("min", self.min) - clip_op_desc.set_attr("max", self.max) -``` - -The `BaseErrorClipAttr` has one main member function: `append_clip_op(self, block, grad_name)`. - -This function is used to create a `clip_op` and append it to the end of the given `block`. Because different error clip algorithms require different `clip_op`s, the function is defined as virtual in the base class. All derived classes must implement their own versions of this function. - -These `clip_op`s should be inserted after `grad_op`s whose output gradients need to be clipped. It is equivalent to appending some `clip_op`s to the end of the target block every time a new `grad_op` is added. - -```python -for op_desc in grad_op_descs: - new_op_desc = target_block.desc.append_op() - new_op_desc.copy_from(op_desc) - callback(block=target_block, context=grad_to_var) -``` - -Here we employ a callback function to complete this kind of job. In the `_append_backward_ops_` function, each time after a `grad_op` is added to the `target_block`, a callback function is invoked.
The logic of `clip_op` appending can be implemented inside the callback function. - -The callback function for `clip_op` appending is defined in *clip.py*: - -```python -def error_clip_callback(block, context): - # the context is a grad_to_var map - grad_to_var = context - op_desc = block.desc.op(block.desc.op_size() - 1) - for grad_n in filter(lambda n: grad_to_var.has_key(n), - op_desc.output_arg_names()): - fwd_var = block.var_recursive(grad_to_var[grad_n]) - error_clip = getattr(fwd_var, "error_clip", None) - if not (error_clip is None or isinstance(error_clip, - BaseErrorClipAttr)): - raise TypeError( - "Variable's error_clip should be an instance of BaseErrorClipAttr or None." - ) - if error_clip is not None: - error_clip.append_clip_op(block, grad_n) -``` - -This function takes a `block` and a `context`(which is actually a grad\_to\_var map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op` and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an attribute of `error_clip`, `error_clip_callback` will call the `error_clip`'s `append_clip_op` function to append the required `clip_op` into the `block`. diff --git a/develop/doc/_sources/design/evaluator.md.txt b/develop/doc/_sources/design/evaluator.md.txt deleted file mode 100644 index 11cc129d56905a9ee666da92fbe6f8559c6d325a..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/evaluator.md.txt +++ /dev/null @@ -1,58 +0,0 @@ -## Evaluator Design - -### Problem Statement - -During training or inference, we provide an evaluation function to measure the model performance, for example, accuracy, precision, etc. In the operator based framework design, the data passes through the network pipeline batch by batch. As a result, inside the operator, we only calculate the metrics for one minibatch. 
Thus, we need to provide a mechanism to calculate the metrics over every N passes or batches, as the user specifies. - -### Evaluator Design -Currently, every operation is expressed in the graph. We divide the evaluator process into three steps. - -1. Initialize the metric state and add it into the block. - -2. Calculate the concerned metrics for every mini-batch. A single evaluator operator is only responsible for calculating the necessary statistics for one mini-batch. For example, the accuracy operator only calculates the accuracy of one mini-batch of data each time it runs. - - -3. Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. When it comes to distributed training/Multi-GPU training, aggregate the value from different devices. - -### Implementation -This design is shown in the Python API. -Each metric operator needs to calculate the metric statistic and return the batch-aware states. The Python side is responsible for accumulating the states for each pass. - - -```python -class Evaluator(object): - """ - Evaluator Base class. - """ - def __init__(self, name, **kwargs): - """ - Different evaluators may have different metric states. E.g., Accuracy needs two variables, total and right sample counts. - Auc needs four variables, `true_positives`, - `true_negatives`, `false_positives` and `false_negatives`. So every evaluator should create its needed variables and append to main_program - - The initialization of Evaluator should be responsible for: - create metric states and append to the main_program - """ - pass - - def _update_ops(self, input, label, **kwargs): - """ - Add mini-batch evaluator calculate operators to the main_program. - Add increment operator to accumulate the metric states. - """ - - - def reset(self, executor, reset_program=None): - """ - Reset metric states at the beginning of each pass/user specified batch number. - Execute the reset_program to reset the states.
- """ - - - def eval(self, executor, eval_program=None): - """ - Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. - Execute the eval_program and return the result. - """ - return eval_result -``` diff --git a/develop/doc/_sources/design/executor.md.txt b/develop/doc/_sources/design/executor.md.txt deleted file mode 100644 index 2d4b371cc56db82ce5747da6db07f05aa7f7e6c1..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/executor.md.txt +++ /dev/null @@ -1,29 +0,0 @@ -# Executor Design Doc - -## Motivation -In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message -[`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). - -The executor runs the `ProgramDesc` like an interpreter. `ProgramDesc` contains the intrinsics (operators in this case) and variables which will be used, executor explicitly executes the stored precompiled code. - -## Overview - -An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators in the block. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instances, which is persistent throughout different runs. - -## Executor - -The `Executor` explicitly executes all the intrinsics (operators here) in the `block_id`th block of a `ProgramDesc`. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence one-by-one. 
-It is very similar to how a push stack frame works when entering a block, following which it cleans up all the temporary variables when a mini-batch is finished. It does not, however, have the stack frame pop process. - -### The interface -```c++ - Executor(places); -``` -An executor does not own any computing resources; a user can only construct an executor using the specified places. - -### Running an Executor - -``` - void Run(ProgramDesc, Scope, block_id, create_local_scope); -``` -An `Executor` only provides a unified way to execute `ProgramDesc`. `ProgramDesc` is the target that will be executed, the `Scope` specifies the variable container, the `block_id` indicates the entrance block and `create_local_scope` is a boolean that states whether it will destroy the temporary variables after the execution is finished. diff --git a/develop/doc/_sources/design/file_manager/README.md.txt b/develop/doc/_sources/design/file_manager/README.md.txt deleted file mode 100644 index 3df10d801e568834729f902aace483d033340e2d..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/file_manager/README.md.txt +++ /dev/null @@ -1,87 +0,0 @@ -# FileManager Design Doc -## Goals -This document describes the design of FileManager, a system that lets users upload their own training data for distributed training. - -Its main features are: - -- Common command-line commands for managing files and directories -- Resumable upload and download of large files - -## Terminology -- PFS: short for `Paddlepaddle cloud File System`, an abstraction of a user's file storage space, as opposed to the local filesystem. It is currently built on CephFS. -- [CephFS](http://docs.ceph.com/docs/master/cephfs/): a POSIX-compliant file system. -- Chunk: the logical unit into which a file is split. - -## Modules -### Architecture diagram - - -### PFSClient -- Features: see the detailed design at [link](./pfs/pfsclient.md) - - Provides commands for users to manage files - - Must run across platforms - -- Mutual authentication - PFSClient and Ingress authenticate each other via [tls](#tls), so users must first register at `cloud.paddlepaddle.org`, apply for user space, and download the system-generated CA (certificate authority), Key, and CRT (CA signed certificate) to their local machine before they can use PFSClient. - -### [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) -- Features: - Provides a layer-7 reverse proxy and load balancing based on sticky sessions. - -- How the user identity is passed through - 
Ingress needs to pass the PFSClient's identity information on to PFSServer; for how to configure this, see [link](http://www.integralist.co.uk/posts/clientcertauth.html#3) - -### PFSServer -PFSServer exposes a RESTful API that receives and handles file-management requests from PFSClient and returns the results to PFSClient. - -RESTful API - -- /api/v1/files - - `GET /api/v1/files`: Get metadata of files or directories. - - `POST /api/v1/files`: Create files or directories. - - `PATCH /api/v1/files`: Update files or directories. - - `DELETE /api/v1/files`: Delete files or directories. - -- /api/v1/file/chunks - - `GET /api/v1/storage/file/chunks`: Get the metadata of a file's chunks. - -- /api/v1/storage/files - - `GET /api/v1/storage/files`: Download files or directories. - - `POST /api/v1/storage/files`: Upload files or directories. - -- /api/v1/storage/file/chunks - - `GET /api/v1/storage/file/chunks`: Download chunk data. - - `POST /api/v1/storage/file/chunks`: Upload chunk data. - -## File Transfer Optimization - -### Chunked file transfer -User files can be large, so uploading them to the cloud or downloading them locally may take a long time, and the network may become unstable during the transfer. To address these problems we introduce the concept of a Chunk, which consists of its offset within the file, the data, the data length, and a checksum. File uploads and downloads are both implemented as operations on Chunks. Because a Chunk is small (256K by default), each transfer completes quickly and is less likely to fail. After transferring the last Chunk, PFSClient must check that the MD5 of the destination file matches that of the source file. - -A typical Chunk looks like this: - -``` -type Chunk struct { - fileOffset int64 - checksum uint32 - len uint32 - data []byte -} -``` - -### Creating a sparse file -When the destination file does not exist or its size differs from that of the source file, [Fallocate](https://Go.org/pkg/syscall/#Fallocate) can be used to create a sparse file, after which multiple Chunks can be written concurrently. - -### Overwriting the parts that differ -The key to file transfer is that PFSClient compares the checksums of the source and destination files' Chunks; where they differ, PFSClient downloads or uploads the Chunk. This way, the parts that were already transferred successfully do not need to be transferred again. - -## User Workflow -See [link](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/data_dispatch.md) - -## Scaffolding Generation -Use [swagger](https://github.com/swagger-api/swagger-codegen) to generate the scaffolding of PFSClient and PFSServer, so that we can focus more of our effort on the logic itself. - -## References -- [TLS complete guide](https://github.com/k8sp/tls/blob/master/tls.md) -- [aws.s3](http://docs.aws.amazon.com/cli/latest/reference/s3/) -- [linux man document](https://linux.die.net/man/) diff --git
a/develop/doc/_sources/design/file_manager/pfs/pfsclient.md.txt b/develop/doc/_sources/design/file_manager/pfs/pfsclient.md.txt deleted file mode 100644 index 56bc70c54bbc92b78d66e04fb495b1300cf8ebe0..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/file_manager/pfs/pfsclient.md.txt +++ /dev/null @@ -1,129 +0,0 @@ -# PFSClient - -## Description -The `pfs` command is a command-line interface for managing your files on PaddlePaddle Cloud. - -## Synopsis -``` -paddle [options] pfs [parameters] -``` - -## Options -``` ---profile (string) - Use a specific profile from your credential file. - ---help (string) - Display more information about command - ---version - Output version information and exit - ---debug - Show detailed debugging log - ---only-show-errors (boolean) - Only errors and warnings are displayed. All other output is suppressed. -``` - -## Path Arguments -When using a command, we need to specify path arguments. There are two path argument types: `localpath` and `pfspath`. - -A `pfspath` begins with `/pfs`, e.g. `/pfs/$DATACENTER/home/$USER/folder`. - -[Here](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/data_dispatch.md#上传训练文件) is how to configure datacenters. - -## Order of Path Arguments -Commonly, if there are two path arguments, the first is the source, and the second is the destination. - -## Subcommands -- rm - remove files or directories - -``` -Synopsis: - rm [-r] [-v] ... - -Options: - -r - Remove directories and their contents recursively - -v - Cause rm to be verbose, showing files after they are removed. - -Examples: - paddle pfs rm /pfs/$DATACENTER/home/$USER/file - paddle pfs rm -r /pfs/$DATACENTER/home/$USER/folder -``` -- mv - move (rename) files - -``` -Synopsis: - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - -Options: - -f - Do not prompt for confirmation before overwriting the destination path.
(The -f option overrides previous -n options.) - -n - Do not overwrite an existing file. (The -n option overrides previous -f options.) - -v - Cause mv to be verbose, showing files after they are moved. - -Examples: - paddle pfs mv ./text1.txt /pfs/$DATACENTER/home/$USER/text1.txt -``` -- cp - copy files or directories - -``` -Synopsis: - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - -Options: - -r - Copy directories recursively - -f - Do not prompt for confirmation before overwriting the destination path. (The -f option overrides previous -n options.) - -n - Do not overwrite an existing file. (The -n option overrides previous -f options.) - -v - Cause cp to be verbose, showing files after they are copied. - --preserve--links - Reserve links when copy links - -Examples: - paddle pfs cp ./file /pfs/$DATACENTER/home/$USER/file - paddle pfs cp /pfs/$DATACENTER/home/$USER/file ./file -``` -- ls- list files - -``` -Synopsis: - ls [-r] ... - -Options: - -R - List directory(ies) recursively - -Examples: - paddle pfs ls /pfs/$DATACENTER/home/$USER/file - paddle pfs ls /pfs/$DATACENTER/home/$USER/folder -``` - -- mkdir - mkdir directory(ies) -Create intermediate directory(ies) as required. - -``` -Synopsis: - mkdir ... - -Examples: - paddle pfs mkdir /pfs/$DATACENTER/home/$USER/folder -``` diff --git a/develop/doc/_sources/design/float16.md.txt b/develop/doc/_sources/design/float16.md.txt deleted file mode 100644 index 1ea95ed6b5d6792171569b6ff76d09be92fcb13e..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/float16.md.txt +++ /dev/null @@ -1,105 +0,0 @@ -# Design Doc: float16 - -## Why float16 -Half precision (float16) is a binary floating-point format that occupies 16 bits in memory. 
float16 is half the size of traditional 32-bit single precision format (float) and has lower precision and smaller range. - -When high precision computation is not required, using float16 data type could potentially - -- reduce storage space, memory bandwidth, and power usages; -- increase the chance of data fitting into a smaller cache of lower latency; -- provide arithmetic speed up if supported by hardware. - -## Survey of current float16 support -A brief survey of float16 support on different compilers, hardwares, and libraries can be found below. Interested readers can refer to [link1](https://github.com/PaddlePaddle/Paddle/issues/4853) and [link2](https://github.com/Xreki/Xreki.github.io/blob/master/multi_data_types_in_dl_framework/ppt/float16_and_quantized_type.md) for more info. - -The goal of float16 is to serve as a key for the executor to find and run the correct version of compute method specialized for float16 in operator kernel. It should be compatible with various natively supported float16 implementations including `__half` for cuda, `float16_t` for ARM, and `Eigen::half` for Eigen to make writing customized float16 kernels easier. - -### Compiler -- nvcc supports `__half` data type after CUDA 7.5. -- `__fp16` or `float16_t` is supported as storage type for gcc >= 6.1 and clang >= 3.4. -- `__fp16` or `float16_t` is supported as arithmetic type for gcc >= 7.1 and clang >= 3.9. - -### Hardware -- `__half` is supported on GPU with compute capability >= 5.3. -- `__fp16` is supported as storage type for ARMv7-A, ARMv8-A, and above. -- `__fp16` is supported as arithmetic type after ARMv8.2-A (currently, the only microarchitecture implementing ARMv8.2-A is ARM Cortex-A75, which is announced in May 2017. There seems to be no application processors currently available on market that adopts this architecture. It is reported that Qualcomm Snapdragon 845 uses Cortex-A75 design and will be available in mobile devices in early 2018). 
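The size and precision trade-off described under *Why float16* can be checked from Python. The following is a minimal sketch, not part of the proposed design: it relies on the standard library's `struct` module, whose `e` format code packs IEEE 754 binary16 values, and the helper names `float_to_half_bits`, `half_bits_to_float`, and `round_trip` are hypothetical, chosen only for illustration.

```python
import struct


def float_to_half_bits(value):
    """Round a Python float to binary16 and return its 16-bit pattern."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def half_bits_to_float(bits):
    """Interpret a 16-bit pattern as binary16 and widen it to a Python float."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


def round_trip(value):
    """Value that survives after being stored in 16 bits."""
    return half_bits_to_float(float_to_half_bits(value))


storage_bytes = len(struct.pack("<e", 1.0))   # 2 bytes, versus 4 for float32
one_bits = float_to_half_bits(1.0)            # 0x3C00
largest_half = half_bits_to_float(0x7BFF)     # 65504.0, the largest finite half
lossy = round_trip(0.1)                       # close to 0.1, but not equal to it
```

Round-tripping `0.1` through 16 bits returns a nearby but different value, which is the precision cost the text refers to; the small maximum of 65504 is the range cost.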
- -### Libraries -- [Eigen](https://github.com/RLovelett/eigen) >= 3.3 supports float16 calculation on both GPU and CPU using the `Eigen::half` class. It is mostly useful for Nvidia GPUs because of the overloaded arithmetic operators using cuda intrinsics. It falls back to using software emulation on CPU for calculation and there is no special treatment to ARM processors. -- [ARM compute library](https://github.com/ARM-software/ComputeLibrary) >= 17.02.01 supports NEON FP16 kernels (requires ARMv8.2-A CPU). - -### CUDA version issue -There are currently three versions of CUDA that supports `__half` data type, namely, CUDA 7.5, 8.0, and 9.0. -CUDA 7.5 and 8.0 define `__half` as a simple struct that has a `uint16_t` data (see [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/9212ab5a3ddbe48f30ef373f9c1fb546804c7a8c/include/isaac/external/CUDA/cuda_fp16.h)) as follows: -``` -typedef struct __align__(2) { - unsigned short x; -} __half; - -typedef __half half; -``` -This struct does not define any overloaded arithmetic operators. So you have to directly use `__hadd` instead of `+` to correctly add two half types: -``` -__global__ void Add() { - half a, b, c; - c = __hadd(a, b); // correct - c = a + b; // compiler error: no operator "+" matches these operands -} -``` -CUDA 9.0 provides a major update to the half data type. The related code can be found in the updated [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.h) and the newly added [`cuda_fp16.hpp`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.hpp). 
- -Essentially, CUDA 9.0 renames the original `__half` type in 7.5 and 8.0 as `__half_raw`, and defines a new `__half` class type that has constructors, conversion operators, and overloaded arithmetic operators, as follows: -``` -typedef struct __CUDA_ALIGN__(2) { - unsigned short x; -} __half_raw; - - -struct __CUDA_ALIGN__(2) __half { -protected: - unsigned short __x; -public: - // constructors and conversion operators from/to - // __half_raw and other built-in data types -}; - -typedef __half half; - -__device__ __forceinline__ -__half operator+(const __half &lh, const __half &rh) { - return __hadd(lh, rh); -} - -// Other overloaded operators -``` -This new design makes `c = a + b` work correctly for CUDA half data type. - -## Implementation -The float16 class holds a 16-bit `uint16_t` value internally. -``` -struct float16 { - uint16_t x; -}; -``` - -float16 supports the following features: - - constructors / assignment operators that take input from primitive data types including bool, integers of various length, float, and double. - - constructors / assignment operators that take input from `__half` on cuda, `float16_t` on ARM, and `Eigen::half` on Eigen. - - conversion operators to primitive data types and half precision data types on cuda, ARM and Eigen. - - overloaded arithmetic operators for cuda, arm, and non-arm cpu, respectively. These operators will take advantage of the cuda and ARM intrinsics on the corresponding hardware. - -To support the above features, two fundamental conversion functions are provided: -``` -float16 float_to_half_rn(float f); // convert to half precision in round-to-nearest-even mode -float half_to_float(float16 h); -``` -which convert between float32 and float16. These two functions use different conversion routines depending on the current hardware. CUDA/ARM intrinsics will be used when the corresponding hardware is available.
If the hardware or compiler level does not support float32 to float16 conversion, software emulation will be performed to do the conversion. - -## To do -After float16 class is available, some of the future items are below: - -- Update pybind/tensor_py.h to bind c++ float16 with numpy float16. - -- Modify `GetKernelType()` method in `framework/operator.h` to make it compatible with float16. - -- Create a type-casting operator that can convert the data type in tensor between float16 and other types. diff --git a/develop/doc/_sources/design/fluid.md.txt b/develop/doc/_sources/design/fluid.md.txt deleted file mode 100644 index f78fa8c1914124f33b9730f918c8887ced4f8d9d..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/fluid.md.txt +++ /dev/null @@ -1,114 +0,0 @@ -# Design Doc: PaddlePaddle Fluid - -## Why Fluid - -When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system at the time was Caffe. However, when PaddlePaddle was open-sourced in 2016, many other choices were available. There was a challenge -- what is the need for open sourcing yet another deep learning framework? - -Fluid is the answer. Fluid is similar to PyTorch and TensorFlow Eager Execution, which describes the "process" of training or inference using the concept of a model. In fact in PyTorch, TensorFlow Eager Execution and Fluid, there is no concept of a model at all. The details are covered in the sections below. Fluid is currently more extreme in the above mentioned idea than PyTorch and Eager Execution, and we are trying to push Fluid towards the directions of a compiler and a new programming language for deep learning. - -## The Evolution of Deep Learning Systems - -Deep learning infrastructure is one of the fastest evolving technologies. Within four years, there have already been three generations of technologies invented. 
- -| Existed since | model as sequence of layers | model as graph of operators | No model | -|--|--|--|--| -| 2013 | Caffe, Theano, Torch, PaddlePaddle | | | -| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | | -| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid | - -From the above table, we see that the deep learning technology is evolving towards getting rid of the concept of a model. To understand the reasons behind this direction, a comparison of the *programming paradigms* or the ways to program deep learning applications using these systems, would be helpful. The following section goes over these. - -## Deep Learning Programming Paradigms - -With the systems listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following: - -```python -x = layer.data("image") -l = layer.data("label") -f = layer.fc(x, W) -s = layer.softmax(f) -c = layer.mse(l, s) - -for i in xrange(1000): # train for 1000 iterations - m = read_minibatch() - forward({input=x, data=m}, minimize=c) - backward(...) - -print W # print the trained model parameters. -``` - -The above program includes two parts: - -1. The first part describes the model, and -2. The second part describes the training process (or inference process) for the model. - -This paradigm has a well-known problem that limits the productivity of programmers. If the programmer made a mistake in configuring the model, the error messages wouldn't show up until the second part is executed and `forward` and `backward` propagations are performed. This makes it difficult for the programmer to debug and locate a mistake that is located blocks away from the actual error prompt. - -This problem of being hard to debug and re-iterate fast on a program is the primary reason that programmers, in general, prefer PyTorch over the older systems. Using PyTorch, we would write the above program as following: - -```python -W = tensor(...) 
- -for i in xrange(1000): # train for 1000 iterations - m = read_minibatch() - x = m["image"] - l = m["label"] - f = layer.fc(x, W) - s = layer.softmax(f) - c = layer.mse(l, s) - backward() - -print W # print the trained model parameters. -``` - -We can see that the main difference is that the model configuration part (the first step) moves into the training loop. This change allows mistakes in model configuration to be reported where they actually appear in the program. It also represents the model, or rather its forward pass, better by keeping the configuration process in the training loop. - -## Describe Arbitrary Models for the Future - -Describing the process instead of the model also gives Fluid the flexibility to define non-standard models that haven't been invented yet. - -As we write out the program for the process, we can write an RNN as a loop, instead of an RNN as a layer or as an operator. A PyTorch example would look like the following: - -```python -for i in xrange(1000): - m = read_minibatch() - x = m["sentence"] - for t in xrange(len(x)): - h[t] = the_step(x[t]) -``` - -With Fluid, the training loop and the RNN in the above program are not really Python loops, but just a "loop structure" provided by Fluid and implemented in C++, as follows: - -```python -train_loop = layers.While(cond) -with train_loop.block(): - m = read_minibatch() - x = m["sentence"] - rnn = layers.While(...) - with rnn.block(): - h[t] = the_step(input[t]) -``` - -An actual Fluid example is described [here](https://github.com/PaddlePaddle/Paddle/blob/bde090a97564b9c61a6aaa38b72ccc4889d102d9/python/paddle/fluid/tests/unittests/test_while_op.py#L50-L58). - -From the example, the Fluid programs look very similar to their PyTorch equivalent programs, except that Fluid's loop structure, wrapped with Python's `with` statement, could run much faster than just a Python loop.
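To illustrate why a `with` block is not a Python loop, here is a toy sketch under stated assumptions. `ToyProgram`, `while_block`, and `run` are hypothetical names and not Fluid's API, and the loop takes a fixed iteration count where the real `While` takes a condition variable. The point is only that the Python body executes once, recording operators into nested blocks, and that a separate interpreter later performs the iteration.

```python
from contextlib import contextmanager


class ToyProgram(object):
    """Hypothetical recorder: calls append op descriptions instead of computing."""

    def __init__(self):
        self.root = []             # ops of the outermost block
        self._stack = [self.root]  # innermost open block is last

    def op(self, name):
        # Record a plain operator in the block that is currently open.
        self._stack[-1].append({"type": name, "block": None})

    @contextmanager
    def while_block(self, times):
        # Record a loop operator that owns a nested block, then open that block.
        body = []
        self._stack[-1].append({"type": "while", "times": times, "block": body})
        self._stack.append(body)
        try:
            yield
        finally:
            self._stack.pop()


def run(block, trace):
    """Interpret a recorded block, appending executed op names to `trace`."""
    for op in block:
        if op["type"] == "while":
            for _ in range(op["times"]):
                run(op["block"], trace)
        else:
            trace.append(op["type"])
    return trace


prog = ToyProgram()
with prog.while_block(times=2):        # stands in for the training loop
    prog.op("read_minibatch")
    with prog.while_block(times=3):    # stands in for the RNN loop
        prog.op("the_step")

trace = run(prog.root, [])
```

After recording, `prog.root` holds a single `while` operator with a nested block; the repetition only happens inside `run`, which in Fluid's case would be native code rather than the Python interpreter.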
- -We have more examples of the [`if-then-else`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md) structure of Fluid. - -## Turing Completeness - -In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine. For a programming language, if it provides if-then-else and loop, it is Turing complete. From the above examples, Fluid seems to be Turing complete; however, it is worth noting that there is a slight difference between the `if-then-else` of Fluid and that of a programming language. The difference is that the former runs both of its branches and splits the input mini-batch into two -- one for the True condition and another for the False condition. Whether this is equivalent to the `if-then-else` that makes programming languages Turing complete has not been researched in depth. Based on a conversation with [Yuang Yu](https://research.google.com/pubs/104812.html), it seems to be the case, but this needs to be looked into in depth. - -## The Execution of a Fluid Program - -There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). - -There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program. - -Fluid is moving towards the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).
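The batch-splitting `if-then-else` described under *Turing Completeness* can be sketched in plain Python. This is an illustrative sketch only: `batched_if_else` and its parameters are hypothetical names, lists stand in for tensors, and nothing here reflects how Fluid's operator is actually implemented.

```python
def batched_if_else(batch, cond, true_fn, false_fn):
    """Split a mini-batch by `cond`, run each branch on its share, merge in order."""
    true_idx = [i for i, x in enumerate(batch) if cond(x)]
    false_idx = [i for i, x in enumerate(batch) if not cond(x)]

    # Both branches run, each on its own part of the mini-batch.
    true_out = true_fn([batch[i] for i in true_idx])
    false_out = false_fn([batch[i] for i in false_idx])

    # Scatter the branch outputs back to the original example positions.
    merged = [None] * len(batch)
    for i, y in zip(true_idx, true_out):
        merged[i] = y
    for i, y in zip(false_idx, false_out):
        merged[i] = y
    return merged


result = batched_if_else(
    [3, -1, 4, -5],
    cond=lambda x: x > 0,
    true_fn=lambda xs: [x * 2 for x in xs],   # applied to the positive examples
    false_fn=lambda xs: [0 for _ in xs],      # applied to the remaining examples
)
```

Unlike a scalar `if`, both `true_fn` and `false_fn` are invoked on every call; only the examples they see differ, which is the difference from a conventional language that the text points out.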
- -## Backward Compatibility of Fluid - -Given all the advantages from the removal of the concept of a *model*, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph). Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known [ONNX](https://github.com/onnx/onnx) is also a file format of graphs of operators. - -For Fluid, we can write a converter that extracts the parts in the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format. diff --git a/develop/doc/_sources/design/fluid_compiler.md.txt b/develop/doc/_sources/design/fluid_compiler.md.txt deleted file mode 100644 index 2a6beafc52e815fa067b273bb5887ddcf6ab15ae..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/fluid_compiler.md.txt +++ /dev/null @@ -1,110 +0,0 @@ -# PaddlePaddle Fluid: Towards a Compiled Programming Language - -As described in [fluid.md](fluid.md), when a Fluid application program -runs, it generates a `ProgramDesc` protobuf message as an intermediate -representation of itself. The C++ class `Executor` can run this -protobuf message as an interpreter. This article describes the Fluid -compiler. - -![](fluid-compiler.png) - -## ProgramDesc - -Before we go deeper into the idea of compiled language, let us take a -look at a simple example Fluid application. - -```python -import "fluid" - -func paddlepaddle() { - X = fluid.read(...) - W = fluid.Tensor(...) - Y = fluid.mult(X, W) -} -``` - -This program consists of a [block](block.md) of three operators -- -`read`, `assign`, and `mult`. 
Its `ProgramDesc` message looks like -the following - -```protobuf -message ProgramDesc { - block[0] = Block { - vars = [X, W, Y], - ops = [ - read(output = X) - assign(input = ..., output = W) - mult(input = {X, W}, output = Y) - ], - } -} -``` - -## Transpilers - -We can write a transpiler program that takes a `ProgramDesc`, e.g., -the above one, and outputs another `ProgramDesc`. Let us take some -examples: - -1. *Memory optimization transpiler*: We can write a transpiler that - inserts some `FreeMemoryOp`s in the above example `ProgramDesc` so - to free memory early, before the end of an iteration, so to keep a - small memory footprint. - -1. *Distributed training transpiler*: We can write a transpiler that - converts a`ProgramDesc` into its distributed version of two - `ProgramDesc`s -- one for running by the trainer processes and the - other for the parameter server. - -In the rest of this article, we talk about a special kind of -transpiler, *Native code generator*, which takes a `ProgramDesc` and -generates a `.cu` (or `.cc`) file, which could be built by C++ -compilers (gcc, nvcc, icc) into binaries. - -## Native Code Generator - -For the above example, the native code generator transpiler, say, the -CUDA code generator, should generate a `main` function: - -```c++ -void main() { - auto X = fluid_cuda_read(...); - auto W = fluid_cuda_create_tensor(...); - auto Y = fluid_cuda_mult(X, W); -} -``` - -and the definitions of functions `fluid_cuda_read`, -`fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware -that each function could just define a C++ instance of an operator and -run it. For example - -```c++ -paddle::Tensor fluid_cuda_read(...) 
{ - paddle::Tensor t; - paddle::operator::Read r(&t, ...); - r.Run(); - return t; -} -``` - -For computational operators that have multiple *kernels*, each for a -specific hardware platform, for example, the `mult` operator, the -generated code should call its CUDA kernel: - -```c++ -paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a, - const paddle::Tensor& b) { - paddle::Tensor t; - paddle::operator::Mult m(a, b, ...); - m.Run(cuda_context); - return t; -} -``` - -where `cuda_context` could be a global variable of type -`paddle::CUDADeviceContext`. - -## Multi-Block Code Generation - -Most Fluid application programs may have more than one block. To -execute them, we need to trace [scopes](scope.md). diff --git a/develop/doc/_sources/design/functions_operators_layers.md.txt b/develop/doc/_sources/design/functions_operators_layers.md.txt deleted file mode 100644 index 984b59f4c6971dfb6f46dfe342f2751f392c0e88..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/functions_operators_layers.md.txt +++ /dev/null @@ -1,100 +0,0 @@ -# Design Doc: Functions, Operators, and Layers - -In a DL system, we can compose one or more fine-grained operators into a coarse-grained one. For example, the FC layer can be composed of a multiplication operator and an add operator. - -Historically, some fine-grained operations are known as operators, and some coarse-level ones are known as layers. But we need a well-defined separation. - -In general, operators are those very fine-grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions: - -```c++ -template <typename T> T add(T x, T y) { return x + y; } -template <typename T> T mul(T x, T y) { return x * y; } -``` - -Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this.
For example, the following macro invocation - -```c++ -MAKE_FUNCTION_OPERATOR(mul); -``` - -generates - -```c++ -template <typename T> class mulOp : public OperatorBase {...}; -REGISTER_OP(mulOp, "mul"); -``` - -so that in Python we can create operator mul by: - -```python -X1 = Var() -X2 = Var() -Y = Var() -paddle.cpp.create_operator("mul", input=[X1, X2], output=Y) -``` - -Also, at the same time, we can compose a coarse level C++ operator class by composing functions `mul` and `add`: - -```c++ -template <typename T> -class FCOp : public OperatorBase { - public: - void Run(...) { - add(mul(Input("X"), Input("W")), Input("b")); - } -}; -REGISTER_OP(FCOp, "fc"); -``` - -We need to support such composition in Python as well. To do so, we need a higher level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher level operator API should be compatible with the layer API. - -Let's explain using an example. Suppose that we are going to compose the FC using mul and add in Python; we'd like to have Python functions `mul` and `add` defined in module `operator`: - -```python -def operator.mul(X1, X2): - O = Var() - paddle.cpp.create_operator("mul", input=[X1, X2], output=O) - return O - -def operator.add(X1, X2): - O = Var() - paddle.cpp.create_operator("add", input=[X1, X2], output=O) - return O -``` - -The above code snippets are automatically generated. Given them, users can define - -```python -def layer.fc(X): - W = Var() - b = Var() - return operator.add(operator.mul(X, W), b) -``` - -If we don't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated: - -```python -def layer.fc(X): - W = Var() - b = Var() - O1 = Var() - paddle.cpp.create_operator("mul", input=[X, W], output=O1) - O2 = Var() - paddle.cpp.create_operator("add", input=[O1, b], output=O2) - return O2 -``` - -We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. 
So we have the following concepts in the above illustrative example: - - -| C++ functions/functors | mul | add | | | -|------------------------|--------------|--------------|-------------|----------| -| C++ operator class | mulOp | addOp | FCOp | | -| Python binding | operator.mul | operator.add | operator.fc | | -| Python function | | | | layer.fc | - - -This is how we differentiate layers and operators in PaddlePaddle: - -- those defined in C++ that have a lightweight Python wrapper in module `operators` are operators; whereas -- those that have no C++ implementation but a Python implementation that composes C++ operators are known as layers. diff --git a/develop/doc/_sources/design/gan_api.md.txt b/develop/doc/_sources/design/gan_api.md.txt deleted file mode 100644 index fb41df8615f73d9fd4c32995eab265833eac1a55..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/gan_api.md.txt +++ /dev/null @@ -1,253 +0,0 @@ -# Design for GAN - -GAN (Generative Adversarial Net [https://arxiv.org/abs/1406.2661]) is an important model for unsupervised learning and is widely used in many areas. - -It applies several important concepts in machine learning system design, including building and running subgraphs, dependency tracing, different optimizers in one executor and so forth. - -In our GAN design, we wrap it as a user-friendly, easily customized Python API to design different models. We take the conditional DC-GAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [https://arxiv.org/abs/1511.06434]) as an example due to its good performance on image generation. - -

-
-Figure 1. The overall running logic of GAN. The black solid arrows indicate the forward pass; the green dashed arrows indicate the backward pass of generator training; the red dashed arrows indicate the backward pass of the discriminator training. The BP pass of the green (red) arrow should only update the parameters in the green (red) boxes. The diamonds indicate the data providers. d\_loss and g\_loss marked in red and green are the two targets we would like to run. -

- -The operators, layers and functions required or optional for building a GAN demo are summarized in https://github.com/PaddlePaddle/Paddle/issues/4563. - -

-
-Figure 2. Photo borrowed from the original DC-GAN paper. -

- -## The Conditional-GAN might be a class. -In this design, we adopt the popular open source design in https://github.com/carpedm20/DCGAN-tensorflow and https://github.com/rajathkmp/DCGAN. It contains the following data structure: - -- DCGAN(object): which contains everything required to build a GAN model. It provides the following member functions as its API: - -- __init__(...): Initialize hyper-parameters (like conv dimension and so forth), and declare model parameters of discriminator and generator as well. - -- generator(z, y=None): Generate a fake image from input noise z. If the label y is provided, the conditional GAN model will be chosen. -Returns a generated image. - -- discriminator(image): -Given an image, decide if it is from a real source or a fake one. -Returns a 0/1 binary label. - -- build_model(self): -Build the whole GAN model, define training loss for both generator and discriminator. - -## Discussion on Engine Functions required to build GAN -- Trace the tensor and variable dependency in the engine executor. (Very critical, otherwise GAN cannot be trained correctly) -- Different optimizers responsible for optimizing different losses. - -To be more detailed, we introduce our design of DCGAN as follows: - -### Class member Function: Initializer -- Set up hyper-parameters, including conditional dimension, noise dimension, batch size and so forth. -- Declare and define all the model variables. All the discriminator parameters are included in the list self.theta_D and all the generator parameters are included in the list self.theta_G. 
-```python -class DCGAN(object): - def __init__(self, y_dim=None): - - # hyper parameters - self.y_dim = y_dim # conditional gan or not - self.batch_size = 100 - self.z_dim = z_dim # input noise dimension - - # define parameters of discriminators - self.D_W0 = pd.Variable(shape=[3,3, 1, 128], data=pd.gaussian_normal_randomizer()) - self.D_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data - self.D_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.D_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data - self.D_W2 = pd.Varialble(np.random.rand(128, 1)) - self.D_b2 = pd.Variable(np.zeros(128)) - self.theta_D = [self.D_W0, self.D_b0, self.D_W1, self.D_b1, self.D_W2, self.D_b2] - - # define parameters of generators - self.G_W0 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.G_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data - self.G_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.G_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data - self.G_W2 = pd.Varialble(np.random.rand(128, 1)) - self.G_b2 = pd.Variable(np.zeros(128)) - self.theta_G = [self.G_W0, self.G_b0, self.G_W1, self.G_b1, self.G_W2, self.G_b2] -``` - -### Class member Function: Generator -- Given a noisy input z, returns a fake image. -- Concatenation, batch-norm, FC operations required; -- Deconv layer required, which is missing now... 
-```python -class DCGAN(object): - def generator(self, z, y = None): - # input z: the random noise - # input y: input data label (optional) - # output G_im: generated fake images - - if self.y_dim: # conditional GAN: append the label to the noise - z = pd.layer.concat(1, [z, y]) - - G_h0 = pd.layer.fc(z, self.G_W0, self.G_b0) - G_h0_bn = pd.layer.batch_norm(G_h0) - G_h0_relu = pd.layer.relu(G_h0_bn) - - G_h1 = pd.layer.deconv(G_h0_relu, self.G_W1, self.G_b1) - G_h1_bn = pd.layer.batch_norm(G_h1) - G_h1_relu = pd.layer.relu(G_h1_bn) - - G_h2 = pd.layer.deconv(G_h1_relu, self.G_W2, self.G_b2) - G_im = pd.layer.tanh(G_h2) - return G_im -``` - -### Class member function: Discriminator -- Given an image, returns a logit that indicates whether the image is real or fake. -- Concatenation, Convolution, batch-norm, FC, Leaky-ReLU operations required; -```python -class DCGAN(object): - def discriminator(self, image): - # input image: either generated images or real ones - # output D_h2: binary logit of the label - - D_h0 = pd.layer.conv2d(image, w=self.D_W0, b=self.D_b0) - D_h0_bn = pd.layer.batch_norm(D_h0) - D_h0_relu = pd.layer.lrelu(D_h0_bn) - - D_h1 = pd.layer.conv2d(D_h0_relu, w=self.D_W1, b=self.D_b1) - D_h1_bn = pd.layer.batch_norm(D_h1) - D_h1_relu = pd.layer.lrelu(D_h1_bn) - - D_h2 = pd.layer.fc(D_h1_relu, w=self.D_W2, b=self.D_b2) - return D_h2 -``` - -### Class member function: Build the model -- Define data readers as placeholders to hold the data; -- Build generator and discriminators; -- Define two training losses for discriminator and generator, respectively. 
-If we have execution dependency engine to back-trace all tensors, the module building our GAN model will be like this: -```python -class DCGAN(object): - def build_model(self): - if self.y_dim: - self.y = pd.data(pd.float32, [self.batch_size, self.y_dim]) - self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.z = pd.data(tf.float32, [None, self.z_size]) - - # step 1: generate images by generator, classify real/fake images with discriminator - if self.y_dim: # if conditional GAN, includes label - self.G = self.generator(self.z, self.y) - self.D_t = self.discriminator(self.images) - # generated fake images - self.sampled = self.sampler(self.z, self.y) - self.D_f = self.discriminator(self.G) - else: # original version of GAN - self.G = self.generator(self.z) - self.D_t = self.discriminator(self.images) - # generate fake images - self.sampled = self.sampler(self.z) - self.D_f = self.discriminator(self.images) - - # step 2: define the two losses - self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)) - self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)) - self.d_loss = self.d_loss_real + self.d_loss_fake - - self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_f, np.ones(self.batch_szie)) -``` - -If we do not have dependency engine but blocks, the module building our GAN model will be like this: -```python -class DCGAN(object): - def build_model(self, default_block): - # input data in the default block - if self.y_dim: - self.y = pd.data(pd.float32, [self.batch_size, self.y_dim]) - self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - # self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.z = pd.data(tf.float32, [None, self.z_size]) - - # step 1: generate images by generator, classify real/fake images with 
discriminator - with pd.default_block().g_block(): - if self.y_dim: # if conditional GAN, includes label - self.G = self.generator(self.z, self.y) - self.D_g = self.discriminator(self.G, self.y) - else: # original version of GAN - self.G = self.generator(self.z) - self.D_g = self.discriminator(self.G, self.y) - self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_g, np.ones(self.batch_szie)) - - with pd.default_block().d_block(): - if self.y_dim: # if conditional GAN, includes label - self.D_t = self.discriminator(self.images, self.y) - self.D_f = self.discriminator(self.G, self.y) - else: # original version of GAN - self.D_t = self.discriminator(self.images) - self.D_f = self.discriminator(self.G) - - # step 2: define the two losses - self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)) - self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)) - self.d_loss = self.d_loss_real + self.d_loss_fake -``` -Some small confusion and problems with this design: -- D\_g and D\_f are actually the same thing, but has to be written twice; i.e., if we want to run two sub-graphs conceptually, the same codes have to be written twice if they are shared by the graph. -- Requires ability to create a block anytime, rather than in if-else or rnn only; - -## Main function for the demo: -Generally, the user of GAN just need to the following things: -- Define an object as DCGAN class; -- Build the DCGAN model; -- Specify two optimizers for two different losses with respect to different parameters. -```python -# pd for short, should be more concise. 
-from paddle.v2 as pd -import numpy as np -import logging - -if __name__ == "__main__": - # dcgan class in the default graph/block - # if we use dependency engine as tensorflow - # the codes, will be slightly different like: - # dcgan = DCGAN() - # dcgan.build_model() - with pd.block() as def_block: - dcgan = DCGAN() - dcgan.build_model(def_block) - - # load mnist data - data_X, data_y = self.load_mnist() - - # Two subgraphs required!!! - with pd.block().d_block(): - d_optim = pd.train.Adam(lr = .001, beta= .1) - d_step = d_optim.minimize(dcgan.d_loss, dcgan.theta_D) - with pd.block.g_block(): - g_optim = pd.train.Adam(lr = .001, beta= .1) - g_step = pd.minimize(dcgan.g_loss, dcgan.theta_G) - - # executor - sess = pd.executor() - - # training - for epoch in xrange(10000): - for batch_id in range(N / batch_size): - idx = ... - # sample a batch - batch_im, batch_label = data_X[idx:idx+batch_size], data_y[idx:idx+batch_size] - # sample z - batch_z = np.random.uniform(-1., 1., [batch_size, z_dim]) - - if batch_id % 2 == 0: - sess.run(d_step, - feed_dict = {dcgan.images: batch_im, - dcgan.y: batch_label, - dcgan.z: batch_z}) - else: - sess.run(g_step, - feed_dict = {dcgan.z: batch_z}) -``` - -# More thinking about dependency engine v.s. block design: -- What if we just want to run an intermediate result? Do we need to run the whole block/graph? -- Should we call eval() to get the fake images in the first stage? And then train the discriminator in the second stage? 
diff --git a/develop/doc/_sources/design/graph.md.txt b/develop/doc/_sources/design/graph.md.txt deleted file mode 100644 index 7519a65df835a39fe14f6ef45530afff170191ff..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/graph.md.txt +++ /dev/null @@ -1,70 +0,0 @@ -# Design Doc: Computations as a Graph - -A primary goal of the refactoring of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before. - -This document explains the construction of a graph in three steps: - -- construct the forward part -- construct the backward part -- construct the optimization part - -## The Construction of a Graph - -Let us take the problem of image classification as a simple example. The application program that trains the model looks like: - -```python -x = layer.data("images") -l = layer.data("label") -y = layer.fc(x) -cost = layer.mse(y, l) -optimize(cost) -train(cost, reader=mnist.train()) -``` - -### Forward Part - -The first four lines of the above program build the forward part of the graph. - -![](images/graph_construction_example_forward_only.png) - -In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators. - -Initialization operators are a kind of "run-once" operator: the `Run` method increments a counter data member so that the operator runs at most once. By doing so, a parameter wouldn't be initialized repeatedly, say, in every minibatch. - -In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`. These protobuf messages are saved in a `BlockDesc` protobuf message. 
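The run-once behavior of initialization operators described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `RunOnceOp`, `init_w`, and the `params` dictionary are hypothetical names made up for this sketch, not PaddlePaddle classes.

```python
class RunOnceOp:
    """Hypothetical wrapper: executes the wrapped callable at most once."""

    def __init__(self, fn):
        self.fn = fn
        self.run_count = 0  # counter data member, as described in the text

    def run(self):
        self.run_count += 1
        if self.run_count == 1:  # only the first call does real work
            self.fn()


params = {}


def init_w():
    # Stand-in for a parameter initializer.
    params["W"] = [0.0, 0.0, 0.0]


init_op = RunOnceOp(init_w)
for _ in range(3):            # three "minibatches"
    init_op.run()             # re-running the block does not reset W
    params["W"][0] += 1.0     # stand-in for a parameter update
```

Because the initializer runs only on the first call, the updates made in later iterations accumulate instead of being wiped out.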
- -### Backward Part - -The fifth line `optimize(cost)` calls two functions, `ConstructBackwardGraph` and `ConstructOptimizationGraph`. - -`ConstructBackwardGraph` traverses the forward graph in the `BlockDesc` protobuf message and builds the backward part. - -![](images/graph_construction_example_forward_backward.png) - -According to the chain rule of gradient computation, `ConstructBackwardGraph` would - -1. create a gradient operator G for each operator F, -1. make all inputs, outputs, and output gradients of F the inputs of G, -1. create gradients for all inputs of F, except for those that have no gradients, like x and l, and -1. make all these gradients the outputs of G. - -### Optimization Part - -For each parameter, like W and b created by `layer.fc`, marked as double circles in the above graphs, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient. This results in the complete graph: - -![](images/graph_construction_example_all.png) - -## Block and Graph - -The words block and graph are interchangeable in the design of PaddlePaddle. A [Block](https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphor for the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block. - -A Block keeps operators in an array `BlockDesc::ops` - -```protobuf -message BlockDesc { - repeated OpDesc ops = 1; - repeated VarDesc vars = 2; -} -``` - -in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that in `ops`, we have some forward operators, followed by some gradient operators, and then some optimization operators. 
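To make the ordering in `ops` concrete, here is a minimal Python sketch of the four backward-construction steps listed earlier. It rests on assumptions: operators are plain dictionaries rather than `OpDesc` messages, and the `_grad` and `@GRAD` suffixes and the `construct_backward` helper are naming conventions chosen for this illustration only.

```python
def construct_backward(forward_ops, no_grad=()):
    """Build gradient op descriptions for a list of forward op descriptions.

    Each op is a dict with "type", "inputs" and "outputs" (lists of names).
    """
    grad_ops = []
    for op in reversed(forward_ops):  # walk the forward part backwards
        out_grads = [name + "@GRAD" for name in op["outputs"]]
        in_grads = [name + "@GRAD" for name in op["inputs"]
                    if name not in no_grad]  # step 3: skip x, l, ...
        grad_ops.append({
            "type": op["type"] + "_grad",                        # step 1
            "inputs": op["inputs"] + op["outputs"] + out_grads,  # step 2
            "outputs": in_grads,                                 # step 4
        })
    return grad_ops


forward = [
    {"type": "fc", "inputs": ["x", "W", "b"], "outputs": ["y"]},
    {"type": "mse", "inputs": ["y", "l"], "outputs": ["cost"]},
]
backward = construct_backward(forward, no_grad={"x", "l"})
block_ops = forward + backward  # forward operators first, then gradient operators
```

In this sketch the optimization operators would simply be appended after `backward`, one per parameter gradient such as `W@GRAD` and `b@GRAD`.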
diff --git a/develop/doc/_sources/design/graph_survey.md.txt b/develop/doc/_sources/design/graph_survey.md.txt deleted file mode 100644 index 6c6db08f463ae0a2b94fc4546f123a1d7c151870..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/graph_survey.md.txt +++ /dev/null @@ -1,232 +0,0 @@ -## Survey on Graph - -Neural network framework often provides symbolic API for users to write network topology conveniently. This doc manily focus on symbolic API in most popular neural network frameworks, and try to find out how to parse symbolic configuration to a portable file, such as protobuf or json. - -### Mxnet - -The core concept of symbolic API is `Symbol`. Mxnet implements `Symbol` class in C++, and export to Python using C-API. Please refer to the comments in Mxnet: - - -`Symbol` is help class used to represent the operator node in Graph. -`Symbol` acts as an interface for building graphs from different components like Variable, Functor and Group. `Symbol` is also exported to python front-end (while Graph is not) to enable quick test and deployment. Conceptually, symbol is the final operation of a graph and thus including all the information required (the graph) to evaluate its output value. - - -A simple network topology wrote by Symbol is as follows: - -```python -def get_symbol(num_classes=10, **kwargs): - data = mx.symbol.Variable('data') - data = mx.symbol.Flatten(data=data) - fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128) - act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu") - fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64) - act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu") - fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes) - mlp = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax') - return mlp -``` - - - -Varible here is actually a Symbol. 
Every basic Symbol corresponds to one Node, and every Node has its own NodeAttr. There is an `op` field in the NodeAttr class; when a Symbol represents a Variable (often input data), the `op` field is null. - -Symbol contains a data member, `std::vector<NodeEntry> outputs`, and NodeEntry contains a pointer to Node. We can follow the Node pointer to reach the whole Graph. - -A Symbol can also be saved to a JSON file. - -Here is a detailed example: - -``` ->>> import mxnet as mx ->>> data = mx.symbol.Variable('data') ->>> print data.debug_str() -Variable:data - ->>> data = mx.symbol.Flatten(data=data) ->>> print data.debug_str() -Symbol Outputs: - output[0]=flatten0(0) -Variable:data --------------------- -Op:Flatten, Name=flatten0 -Inputs: - arg[0]=data(0) version=0 - ->>> fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128) ->>> print fc1.debug_str() -Symbol Outputs: - output[0]=fc1(0) -Variable:data --------------------- -Op:Flatten, Name=flatten0 -Inputs: - arg[0]=data(0) version=0 -Variable:fc1_weight -Variable:fc1_bias --------------------- -Op:FullyConnected, Name=fc1 -Inputs: - arg[0]=flatten0(0) - arg[1]=fc1_weight(0) version=0 - arg[2]=fc1_bias(0) version=0 -Attrs: - num_hidden=128 - -``` - - -### TensorFlow - - -The core concept of symbolic API is `Tensor`. TensorFlow defines `Tensor` in Python. Please refer to the comments in TensorFlow: - -A `Tensor` is a symbolic handle to one of the outputs of an `Operation`. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow [Session](https://www.tensorflow.org/api_docs/python/tf/Session). - -A simple example is as follows: - -```python - # Build a dataflow graph. - c = tf.constant([[1.0, 2.0], [3.0, 4.0]]) - d = tf.constant([[1.0, 1.0], [0.0, 1.0]]) - e = tf.matmul(c, d) - - # Construct a `Session` to execute the graph. - sess = tf.Session() - - # Execute the graph and store the value that `e` represents in `result`. 
- result = sess.run(e) -``` - - -The main method of `Tensor` is as follows: - - -```python -@property -def op(self): - """The `Operation` that produces this tensor as an output.""" - return self._op - -@property -def dtype(self): - """The `DType` of elements in this tensor.""" - return self._dtype - -@property -def graph(self): - """The `Graph` that contains this tensor.""" - return self._op.graph - -@property -def name(self): - """The string name of this tensor.""" - if not self._op.name: - raise ValueError("Operation was not named: %s" % self._op) - return "%s:%d" % (self._op.name, self._value_index) - -@property -def device(self): - """The name of the device on which this tensor will be produced, or None.""" - return self._op.device -``` - - -Tensor can be taken as target to run by session. Tensor contains all the information of Graph, and tracks data dependency. - - -Here is a detailed example: - - -``` ->>> import tensorflow as tf ->>> c = tf.constant([[1.0, 2.0], [3.0, 4.0]]) ->>> print c.graph - ->>> d = tf.constant([[1.0, 1.0], [0.0, 1.0]]) ->>> print d.graph - ->>> e = tf.matmul(c, d) ->>> print e.graph - -``` - -### Dynet - - -The core concept of symbolic API is `Expression`, and Dynet defines `Expression` class in C++. - - -A simple example is as follows: - -```cpp -ComputationGraph cg; -Expression W = parameter(cg, pW); - -Expression in = input(cg, xs[i]); -Expression label = input(cg, ys[i]); -Expression pred = W * in; -Expression loss = square(pred - label); -``` - -The input data and parameter are also represented by Expression. Every basci Expression corresponds to a Node. And input data is also a Node. - -Expression has a data member ComputationGraph, and ComputationGraph will be modified in users' configuring process. Expression can be a running target, beacuse Expression contains all dependency. 
- - -Here is a detailed example: - -Write the topology in C++: - -``` -ComputationGraph cg; -Expression W = parameter(cg, pW); -cg.print_graphviz(); - -Expression pred = W * xs[i]; -cg.print_graphviz(); - -Expression loss = square(pred - ys[i]); -cg.print_graphviz(); -``` - -Compile and run it; the output is: - -``` -# first print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; -} -# second print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; - N1 [label="v1 = v0 * -0.98"]; - N0 -> N1; -} -# third print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; - N1 [label="v1 = v0 * -0.98"]; - N0 -> N1; - N2 [label="v2 = -1.88387 - v1"]; - N1 -> N2; - N3 [label="v3 = -v2"]; - N2 -> N3; - N4 [label="v4 = square(v3)"]; - N3 -> N4; -} -``` - -### Conclusion - - -Symbol/Tensor/Expression in Mxnet/TensorFlow/Dynet are concepts at the same level. We use the unified name Expression here; a concept at this level has the following features: - -- Users write the topology with the symbolic API, and every return value is an Expression, including input data and parameters. -- An Expression corresponds to a global Graph, and Expressions can also be composed. 
-- Expression tracks all dependency and can be taken as a run target diff --git a/develop/doc/_sources/design/if_else_op.md.txt b/develop/doc/_sources/design/if_else_op.md.txt deleted file mode 100644 index 26d140f06db4ecefa86be015eaa731ffddc6910c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/if_else_op.md.txt +++ /dev/null @@ -1,51 +0,0 @@ -# The `IfElse` Operator - -PaddlePaddle's `IfElse` operator differs from TensorFlow's: - -- the TensorFlow version takes a scalar boolean value as the condition so that the whole mini-batch goes to either the true or the false branch, whereas -- the PaddlePaddle version takes a vector of boolean value as the condition, and instances corresponding to true values go to the true branch, those corresponding to false values go to the false branch. - -## Example - -The following PaddlePaddle program shows the usage of the IfElse operator: - -```python -import paddle as pd - -x = minibatch([10, 20, 30]) # shape=[None, 1] -y = var(1) # shape=[1], value=1 -z = minibatch([10, 20, 30]) # shape=[None, 1] -cond = larger_than(x, 15) # [false, true, true] - -ie = pd.ifelse() -with ie.true_block(): - d = pd.layer.add(x, y) - ie.output(d, pd.layer.softmax(d)) -with ie.false_block(): - d = pd.layer.fc(z) - ie.output(d, d+1) -o1, o2 = ie(cond) -``` - -A challenge to implement the `IfElse` operator is to infer those variables to be split, or, say, to identify the variable of the mini-batch or those derived from the mini-batch. 
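The per-instance routing described above can be sketched in plain Python. This is a simplified illustration under assumptions: the `if_else` helper and the two branch lambdas are hypothetical, lists stand in for mini-batch tensors, and the sketch sidesteps the real challenge of inferring which variables must be split.

```python
def if_else(cond, batch, true_fn, false_fn):
    """Route each instance to a branch by its own condition, then merge."""
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]

    # Each branch sees only the instances whose condition selects it.
    true_out = true_fn([batch[i] for i in true_idx])
    false_out = false_fn([batch[i] for i in false_idx])

    # Scatter branch results back to the original instance order.
    merged = [None] * len(batch)
    for i, v in zip(true_idx, true_out):
        merged[i] = v
    for i, v in zip(false_idx, false_out):
        merged[i] = v
    return merged


x = [10, 20, 30]
cond = [v > 15 for v in x]  # [False, True, True]
out = if_else(cond, x,
              lambda xs: [v + 1 for v in xs],   # "true" branch
              lambda xs: [v * 2 for v in xs])   # "false" branch
```

A scalar-condition `IfElse`, as in TensorFlow, would instead send the whole list to exactly one of the two functions.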
- -An equivalent C++ program is as follows: - -```c++ -namespace pd = paddle; - -int x = 10; -int y = 1; -int z = 10; -bool cond = false; -int o1, o2; -if (cond) { - int d = x + y; - o1 = d; - o2 = pd::layer::softmax(d); -} else { - int d = pd::layer::fc(z); - o1 = d; - o2 = d+1; -} -``` diff --git a/develop/doc/_sources/design/infer_var_type.md.txt b/develop/doc/_sources/design/infer_var_type.md.txt deleted file mode 100644 index d9d5397becba2ef1806d9341cd49cd9aabbf4a6a..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/infer_var_type.md.txt +++ /dev/null @@ -1,78 +0,0 @@ -# Design Doc: InferVarType - -## The Problem Posed - -A variable in our design can hold various types, such as `LoDTensor` and `SelectedRows`. An operator should be able to infer the variable types of its outputs. - -For example, a `lookup table` operator takes two `LoDTensor`; one is a float tensor as the embedding table, the other is an int tensor as word ID. The gradient operator of `lookup table` will generate a `SelectedRows` as its output. A `sum` operator can take both `LoDTensor` and `SelectedRows` as its inputs and will generate a `LoDTensor` if any of its inputs is `LoDTensor`, otherwise, the `sum` operator will generate `SelectedRows` as its output. - -The variable type will be constant at runtime. Every variable's type can either be set by the user (input data and parameter) or be inferred by the operator at compile time. - -## Proposed Solution - -The `InferVarType` is a compile-time function which is registered to each operator. The interface of that function is: - - -```c++ -using InferVarTypeFN = std::function< - void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>; -``` - -It takes an operator description as its input, infers the output variable types, and stores them in the block description. - -The `InferVarTypeFN` will be registered in `OpInfo`, to replace `infer_var_type_` field. 
The `OpInfo` should be - -```cpp -struct OpInfo { - InferVarTypeFN infer_var_type_; - ... -}; -``` - -The default `InferVarType` will set output type as `LoDTensor`. It can be done by `GetInferVarType()`. - -```cpp -void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) { - // set the output type of variable as `LoDTensor`. - // ... -} - -struct OpInfo { - InferVarTypeFN infer_var_type_; - InferVarTypeFN GetInferVarType() const { - if (infer_var_type_) { - return infer_var_type_; - } else { - return DefaultInferVarType; - } - } -}; -``` - -## Register InferVarType - -We provide a thin base class for registering an `InferVarTypeFN`. To use a base class will ease the implementation of registry since we can detect the registry entry is an `InferVarTypeFN` or not. - -```cpp -class VarTypeInferer { -public: - virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0; -} -``` - -Operator developers can write the specialize `VarTypeInferer` as follow. - -```cpp -class SpecialVarTypeInferer : public VarTypeInferer { -public: - virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const { - // .. own logic - } -} -``` - -Then user can register the `InferVarType` just like `GradOpDescMaker` and `OpInfoMaker`. - -``` -REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...); -``` diff --git a/develop/doc/_sources/design/kernel_hint_design.md.txt b/develop/doc/_sources/design/kernel_hint_design.md.txt deleted file mode 100644 index a54b7da045e1a362626ef066f9ebb56af2c3181a..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/kernel_hint_design.md.txt +++ /dev/null @@ -1,57 +0,0 @@ -## Problem -In PaddlePaddle's [Design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md), one Operator may have multiple kernels. 
Users may have some personal preference to choose a certain type of kernel for an operator, such as `force_cpu` to choose a CPU kernel, `use_cudnn` to choose a CUDNN kernel, we need to provide a way for users to do this. - -In the current design, we use KernelType to describe one kernel. - -```cpp -struct KernelType { - Place place_; - DataType data_type_; - LayoutType layout_; -}; -``` - `place_` `data_type_` and `layout_` can be got from the input tensors of the operator, `GetActualKernelType(inputs)` use inputs to infer the proper kernel key that fit the incoming data, but users can not directly configure it. - -The [design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md) also provides a virtual method `GetExpectedKernelType` that user can overload and use to choose the KernelType they want to use. - -So we should send the information user defined in proto to `GetExpectedKernelType` for choosing a kernel. - -The problem is, how should we define and send the information for `GetExpectedKernelType` to use? - -## Solution - -### Potential choice -1. Do nothing, let the user add the information they want to operator‘s attribute and get them inside `GetExpectedKernelType`, this can work properly. But there is a little problem that users may define many kinds of hints for the same purpose, such as `force_cpu`, `use_cpu`, `cpu_kernel` to choose CPU kernel, and `use_cudnn`, `force_cudnn`, `cudnn_kernel` to choose CUDNN kernel. - -2. Pre-define all the needed option and use a single attr key such as `kernel_hint` for the user, this is not so flexible if the user wants to define some more kind of hint. - -### Final choice -To provide enough flexibility while avoiding confusion definition, we can define some global constants for these attribute names, such as `force_cpu`, `use_cudnn`, `use_mkldnn` for a user to choose. 
- -In C++ - -```cpp -const std::string kForceCPU = "force_cpu"; -const std::string kUseCUDNN = "use_cudnn"; -const std::string kUseMKLDNN = "use_mkldnn"; - -KernelType GetExpectedKernelType() { - if (Attr<bool>(kForceCPU)) { - return KernelType(CPUPlace, ...); - } else { - ... - } -} -``` - -In Python code - -```python -FORCE_CPU = core.kForceCPU() - -def xx_layer(..., force_cpu=False): - layer_helper = LayerHelper(...) - layer_helper.append_op( - type="xx", - attr={FORCE_CPU: force_cpu}) -``` diff --git a/develop/doc/_sources/design/kernel_selection.md.txt b/develop/doc/_sources/design/kernel_selection.md.txt deleted file mode 100644 index 9719e031c70979cd95400701efd30879662e19bc..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/kernel_selection.md.txt +++ /dev/null @@ -1,99 +0,0 @@ -## Background -Every operator has many kernels because Fluid supports multiple data types, places, data layouts, and library types. We use the `OpKernelType` to describe kernel types that operators can hold. - -The `OpKernelType` is as follows: - -```cpp -struct OpKernelType { - Place place_; - DataType data_type_; - DataLayout data_layout_; - LibraryType library_type_; -}; -``` - -- The `place_` is a descriptor of the device, e.g., CPUPlace, CUDAPlace. - -- The `data_type_` is the data type that this kernel performs on, e.g., `FP32`, `INT64`. Note that one kernel may have inputs with different data types. However, there will be one major `data_type`. For example, the `cross_entropy` takes `int64` as its label, and `double`/`float` as its input logit and output cost. The major `data_type` of `cross_entropy` is `float` or `double`. - -- The `data_layout_` is useful for some computational libraries. One example is that MKLDNN uses many kinds of layout, such as `nChw8c`. Each kind of layout will invoke a different kernel. - -- The `library_type_` describes the computational library, e.g., `MKLDNN`, `CUDNN`. 
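As a rough illustration of how the four fields together identify a kernel, the Python sketch below uses a named tuple as a registry key. Everything here is an assumption made for the example: the `kernels` dictionary, the `register_kernel` and `find_kernel` helpers, and the string values `"NCHW"` and `"PLAIN"` are hypothetical and do not describe Fluid's actual registry.

```python
from collections import namedtuple

# Mirrors the four fields of the OpKernelType struct above.
OpKernelType = namedtuple(
    "OpKernelType", ["place", "data_type", "data_layout", "library_type"])

kernels = {}  # hypothetical registry: (op name, OpKernelType) -> kernel function


def register_kernel(op, kernel_type, fn):
    kernels[(op, kernel_type)] = fn


def find_kernel(op, kernel_type):
    # Returns None when no kernel was registered for this exact kernel type.
    return kernels.get((op, kernel_type))


cpu_fp32 = OpKernelType("CPUPlace", "FP32", "NCHW", "PLAIN")
gpu_fp32 = OpKernelType("CUDAPlace", "FP32", "NCHW", "CUDNN")

# An operator that only has a CPU kernel.
register_kernel("crf", cpu_fp32, lambda xs: sum(xs))

found = find_kernel("crf", cpu_fp32)
missing = find_kernel("crf", gpu_fp32)
```

A lookup with a kernel type that differs in any one field misses, which is the situation that arises when an operator has kernels for only some kernel types.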
- -## Problem - -Ideally, we would register a kernel for every operator and every kernel type. However, this is impractical in the following situations. - -1. Some operators, like CRF, are complicated and inefficient to implement on GPU. The CRF operator will only have a CPU kernel. -2. Some operators take too much memory. It is better to force them onto CPU. However, the rest of the operators in this neural network will run on GPU, i.e., a model parallel problem. -3. Some layouts and places are particular. One example is that MKLDNN uses `nChw8c`, and no other library uses `nChw8c`. - -To explain one situation in detail, suppose we have two operators, OP1 and OP2. OP1 has one output `op1_2_op2`, and `op1_2_op2` is the input of OP2. - -If OP1 and OP2 run on the same place (for example CPUPlace), then `op1_2_op2` can be used directly by OP2. - -``` -OP1(CPUPlace) - | - op1_2_op2 - | -OP2(CPUPlace) -``` - -If OP1 and OP2 run on different places, then OP2 cannot use `op1_2_op2` directly. - -Problems under these situations are similar. We can formalize this problem as follows. - -We register kernels with types $KT = \{kt_1, kt_2, kt_3, ...\}$ for one operator. The inputs of this operator were produced under kernel type $kt_{?}$, where $kt_{?} \notin KT$. How do we cast the input of this operator from $kt_{?}$ to one of the kernel types in $KT$? - -## Solution: data transform - -It is clear that transforming the inputs of an operator to fit another kernel type is not related to the particular operator. So we should register these transformation methods as global methods. - -We can infer a kernel type for each input of an operator. We call this kernel type the `actual kernel type for var`, meaning the kernel type that can process this input variable. - -We can get a kernel type by 1) The configuration of operator description. (Users may want to force the use of `MKL` for the `conv` operator). 2) The place of the current executor.
(Executor is running on GPU). This kernel type is the one we expect the operator to run on. We call this kernel type the `expect kernel type`. - -We transform the input data from `actual` to `expect` if the actual kernel type is not the same as the expect kernel type. - -The algorithm is described as follows: - -```cpp -void OperatorWithKernel::Run( - const Scope& scope, - const platform::Place& place) const { - ExecutionContext ctx(...); - auto expected_kernel_key = this->GetExpectedKernelType(ctx); - - Scope& new_scope = scope.NewScope(); - - for (auto& var_name : this->Inputs()) { - auto* tensor_in = GetTensor(var_name); - auto kernel_type_for_var = this->GetKernelTypeForVar(...); - if (kernel_type_for_var.place_ != expected_kernel_key.place_) { - auto* trans_var = new_scope.Var(var_name); - auto* out = DataTransform(expected_kernel_key, - kernel_type_for_var, - *tensor_in); - CopyVariableWithTensor(...); - } - } - - auto kernel = kernels.find(expected_kernel_key); - kernel->Compute(ExecutionContext(...)); -} -``` - -Then the actual process for the multi-device case above will be: - -``` -OP1(CPUPlace) - | -op1_2_op2(on CPU) - | -[transform](from CPU to GPU) - | -op1_2_op2(on GPU) - | -OP2(CUDAPlace) -``` diff --git a/develop/doc/_sources/design/memory_optimization.md.txt b/develop/doc/_sources/design/memory_optimization.md.txt deleted file mode 100644 index 285464ada728d8f7a086a26beca6cfa4418e98e4..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/memory_optimization.md.txt +++ /dev/null @@ -1,217 +0,0 @@ -# Memory Optimization - - -## Problem - -In a lecture, Andrew Ng attributes the recent success of AI to a combination of these: - -- Availability of Big Data -- Supercomputing power to process this Big Data over very large neural networks -- Modern algorithms - -The following graph shows the details: - -![](images/deep_learning.png) - -Larger models usually bring better performance. However, GPU memory is limited.
For example, the memory size of a GTX TITAN X is only 12GB. To train complex and large models, we have to take care of memory usage. Besides, memory optimization is also necessary in both online/mobile inference. - -## Solution - -### Basic Strategy - -There are some basic strategies to improve memory usage, including in-place operations and memory sharing. - -#### In-place Operation -In a relu activation operator: - -$y = \max(x, 0)$ - -If the variable x is not used in any other operator, we can make an in-place operation. In other words, the memory block of variable y and variable x will be the same. In-place operations will save 50% memory occupancy immediately. - -#### Memory Sharing - -Not all operators support in-place operations. Memory sharing is a more general strategy. - -Following is an example: - -``` -a = op1(b, c); -d = op2(a) -e = op3(d, f) -``` - -In this case, variable a is no longer used, and op2 does not support in-place operation. After op2 finishes, we can put the memory of variable a to a memory pool. Then, variable e can share the memory of variable a from the pool. - - -### Live Variable Analysis - -It's not enough to only have some basic strategies. The pre-requisite of memory optimization is to know if a variable is still "live" after an operation. - -In our design, the neural network topology is defined as a program. Luckily, [live variable analysis](https://en.wikipedia.org/wiki/Live_variable_analysis) is a classic problem in compilers which can be used in many stages, such as register allocation. - -In compilers, the front end of the compiler translates programs into an intermediate language with an unbounded number of temporary variables. This program must run on a machine with a bounded number of registers. Two temporary variables a and b can fit into the same register, if a and b are never "in use" at the same time. 
Thus, many temporary variables can fit in few registers; if they don't all fit, the excess temporary variables can be kept in memory. - -Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporary variables are in use at the same time. We say a variable is "live" if it holds a value that may be needed in the future, so this analysis is called liveness analysis. - -We can learn these techniques from compilers. Live variable analysis has two main stages: - -- construct a control flow graph -- solve the dataflow equations - - -#### Control Flow Graph -To perform analysis on a program, it is often useful to make a control flow graph. A [control flow graph](https://en.wikipedia.org/wiki/Control_flow_graph) (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statement x can be followed by statement y, there is an edge from x to y. - -The following is the flow graph for a simple loop. - -![](images/control_flow_graph.png) - -#### Dataflow Analysis - -Liveness of variables "flows" around the edges of the control flow graph; determining the live range of each variable is an example of a dataflow problem. [Dataflow analysis](https://en.wikipedia.org/wiki/Data-flow_analysis) is a technique for gathering information about the possible set of values calculated at various points in a computer program. - -A simple way to perform data-flow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes. - -- Flow Graph Terminology - -A flow graph node has out-edges that lead to successor nodes, and in-edges that come from predecessor nodes.
The set *pred[n]* is all the predecessors of node n, and *succ[n]* is the set of successors. -In the control flow graph above, the out-edges of node 5 are 5 --> 6 and 5 --> 2, and *succ[5]* = {2, 6}. The in-edges of 2 are 5 --> 2 and 1 --> 2, and *pred[2]* = {1, 5}. - -- Uses and Defs - -An assignment to a variable or temporary defines that variable. An occurrence of a variable on the right-hand side of an assignment (or in other expressions) uses the variable. We can define the *def* of a variable as the set of graph nodes that define it, or the *def* of a graph node as the set of variables that it defines; and similarly for the *use* of a variable or graph node. In the control flow graph above, *def(3)* = {c}, *use(3)* = {b, c}. - -- Liveness - -A variable is *live* on an edge if there is a directed path from that edge to a *use* of the variable that does not go through any *def*. A variable is *live-in* at a node if it is live on any of the in-edges of that node; it is *live-out* at a node if it is live on any of the out-edges of the node. - - -Liveness can be calculated by iterating until a fixed point is reached. The following is the recursive formula: - -![](images/dataflow_equations.png) - -### Memory optimization transpiler - -Finally, we combine the basic strategies with the liveness analysis techniques learned from compilers to implement our memory optimization transpiler. - -#### add in-place attribute - -In-place is a built-in attribute of an operator. Since we treat in-place and other operators differently, we have to add an in-place attribute for every operator. - - -#### construct control flow graph - -The following is the ProgramDesc protobuf of the [machine translation](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_machine_translation.py) example. - -- Block0: - -``` -lookup_table -mul -... -while(sub-block idx 1) -... -array_to_lod_tensor -cross_entropy -...
-while_grad(sub-block idx 2) -read_from_array -array_to_lod_tensor -... -``` - -- Block1 - -``` -read_from_array -read_from_array -... -write_to_array -increment -write_to_array -less_than -``` - -- Block2 - -``` -read_from_array -increment -... -write_to_array -write_to_array -``` - -We can traverse all the operators and variables in ProgramDesc to build a control flow graph. - -```python -from collections import defaultdict - -class ControlFlowGraph(object): - def __init__(self, Program): - self._successors = defaultdict(set) - self._predecessors = defaultdict(set) - self._uses = defaultdict(set) - self._defs = defaultdict(set) - self._live_in = defaultdict(set) - self._live_out = defaultdict(set) - self._program = Program - - def build(self): - pass - - def dataflow_analysis(self): - pass - - def memory_optimization(self): - pass - - def get_program(self): - return self._program -``` - -#### Make dataflow analysis - -We follow the approach used in compilers and solve the dataflow equations to get the liveness of every variable. If the live-in of an operator node is different from its live-out, then memory sharing is possible. - -For example: - -``` -a = op1(b, c); -d = op2(a) -e = op3(d, f) -``` - -The dataflow analysis result is: - -``` -live_in(op1) = {b, c, f} -live_out(op1) = {a, f} - -live_in(op2) = {a, f} -live_out(op2) = {d, f} - -live_in(op3) = {d, f} -live_out(op3) = {} -``` - -After op1, we can process variable b and variable c; after op2, we can process variable a; after op3, we can process variable d and variable f. - -#### memory sharing policy - -A memory pool is maintained during the memory optimization stage. Each operator node is scanned to determine whether memory optimization can be applied. If an operator satisfies the requirement, the following policy is used to handle its input/output variables.
- -``` -if op.support_inplace(): - i --> pool - pool --> o -else: - pool --> o - i --> pool -``` - - - -## Reference - -- [Lecture Notes From Artificial Intelligence Is The New Electricity By Andrew Ng](https://manavsehgal.com/lecture-notes-from-artificial-intelligence-is-the-new-electricity-by-andrew-ng-4712dcbf26e5) -- Modern compiler implementation in ML, by Andrew W. Appel -- [Optimizing Memory Consumption in Deep learning](https://mxnet.incubator.apache.org/architecture/note_memory.html) diff --git a/develop/doc/_sources/design/mkl/mkl_packed.md.txt b/develop/doc/_sources/design/mkl/mkl_packed.md.txt deleted file mode 100644 index 0123315ad4368e68b377f66119949bfd6c1c7860..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/mkl/mkl_packed.md.txt +++ /dev/null @@ -1,108 +0,0 @@ -# Intel® MKL Packed on PaddlePaddle: Design Doc - - -## Contents - -- [Overview](#overview) -- [Key Points](#key-points) - - [Background](#background) - - [Solution](#solution) -- [Actions](#actions) - - [CMake](#cmake) - - [Layers](#layers) - - [Unit Tests](#unit-tests) - - [Python API](#python-api) - - [Benchmarking](#benchmarking) - - -## Overview -We plan to integrate the GEMM Packed APIs\[[1](#references)\] introduced in Intel® MKL into PaddlePaddle, to take full advantage of the Intel platform and improve PaddlePaddle's performance on Intel architecture. -The current optimization targets the layers related to recurrent neural networks (RNN below), including `RecurrentLayer`, `GatedRecurrentLayer` and `LstmLayer`, and the PaddlePaddle V1 API. - -## Key Points - -### Background -PaddlePaddle currently uses the [cblas_?gemm](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm) function of the Intel® MKL library. Before computing, this function converts the original data into an internal format that is better suited to the Intel platform. - -1. Conversion cost \ -This data-format conversion (packing) is relatively expensive when the computation itself is small. For example, in the vanilla RNN part of DeepSpeech2 \[[2](#references)\], the matrix size is `batch_size * 2048`. -2. 
Redundant conversion \ -In some existing cases (for example RNN), multiple calls to cblas_?gemm use the same original data, so packing that data again on every call is redundant. - -To minimize the time spent on packing across multiple cblas_?gemm calls, Intel® MKL introduced the following four APIs: - * [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc) - * [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack) - * [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute) - * [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free) - -With these APIs, we can pack the original data once and then pass the packed data to every gemm_compute call that reuses it, which avoids the redundant packing. - -### Solution -In an RNN, all time steps within one forward/backward pass share the same weight. For inference only, the same weight is also used across forward passes, so there is no need to pack the weight again at every time step of every forward pass. - -Using the newly introduced GEMM Packed APIs, we pack the weight once when the layer is initialized, reuse the packed weight during forward and backward, and repack the new weight after each weight update for use in the next iteration. - -* Before the optimization, for a model with sequence length `T`, the number of conversions performed over `N` iterations is: - - `inference`: `N * T` - - `training`: `2 * N * T` -* After the optimization, for a model with the same settings, the number of conversions is reduced to: - - `inference`: `1` - - `training`: `2 * N` - -## Actions - -The added files and directory structure are as follows: - -```txt -PaddlePaddle/Paddle -├── ... -└── paddle/ - ├── ... - └── gserver/ - ├── ... - ├── layers/ - │ ├── ... - │ ├── MKLPackedRecurrentLayer.* - │ ├── MKLPackedGatedRecurrentLayer.* - │ ├── MKLPackedLstmLayer.* - │ └── MKLPackedGemm.h - └── tests/ - ├── ... - └── test_MKLPacked.cpp -``` - -### CMake -In the corresponding `CMakeLists.txt`, the MKL Packed features are enabled or disabled depending on whether `WITH_MKL` is turned on. - -### Layers -All `MKLPacked*Layer` classes inherit from PaddlePaddle's base class `Layer` and include the header `MKLPackedGemm.h`, which wraps the related GEMM Packed APIs. - -### Unit Tests -We will add `test_MKLPacked.cpp` to test the layers optimized with MKL Packed. -For every newly added RNN layer, we compare the following two aspects: -1. For the optimized layer itself, the results of sequence mode (`rnn_use_batch=false`) against batch mode (`rnn_use_batch=true`). -2. 
The results of the optimized layer against the corresponding original PaddlePaddle layer in batch mode. - -### Python API -We plan to add a `use_mkl_packed` flag in `paddle/utils.Flags` to select whether the feature is used. When compiled with `WITH_MKL=ON`, it defaults to `true`. - -In addition, a `use_mkl_packed` option is added at the corresponding layers in `python/paddle/trainer/config_parser.py`, so that users can enable or disable the feature from Python. - -The implementation looks like this: - -```python -use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0))) -if use_mkl_packed: - self.layer_type = mkl_packed_* -``` - -Every related `layer_type` starts with *mkl_packed_*. This is guaranteed when `MKLPacked*Layer` registers the layer, to tell them apart. - - -### Benchmarking -Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers. - -## References -1. [Introducing the new Packed APIs for GEMM](https://software.intel.com/en-us/articles/introducing-the-new-packed-apis-for-gemm) -2. [DeepSpeech2 on PaddlePaddle](https://github.com/PaddlePaddle/DeepSpeech#deepspeech2-on-paddlepaddle) - diff --git a/develop/doc/_sources/design/mkl/mkldnn.md.txt b/develop/doc/_sources/design/mkl/mkldnn.md.txt deleted file mode 100644 index e2fe1e6b26ffa73fda81863abfadf697c0acbfcf..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/mkl/mkldnn.md.txt +++ /dev/null @@ -1,210 +0,0 @@ -# Intel® MKL-DNN on PaddlePaddle: Design Doc - -We plan to integrate [Intel MKL-DNN](https://github.com/01org/mkl-dnn) -(Intel Math Kernel Library for Deep Neural Networks) into PaddlePaddle, -to take full advantage of the Intel platform and improve PaddlePaddle's performance on Intel architecture. - -
-
-Figure 1. PaddlePaddle on IA -
- -Near-term goals - -- Implement the commonly used layers with MKL-DNN. -- Implement the common deep neural networks VGG, GoogLeNet and ResNet with MKL-DNN. - -The current optimization mainly targets the PaddlePaddle code framework before the refactoring and the V1 API. -The completion status can be found [here](https://github.com/PaddlePaddle/Paddle/projects/21). - -## Contents - -- [Overview](#overview) -- [Actions](#actions) - - [CMake](#cmake) - - [Matrix](#matrix) - - [Layers](#layers) - - [Activations](#activations) - - [Parameters](#parameters) - - [Gradients](#gradients) - - [Unit Tests](#unit-tests) - - [Python API](#python-api) - - [Benchmarking](#benchmarking) - - [Others](#others) -- [Design Concerns](#design-concerns) - -## Overview - -We integrate MKL-DNN into PaddlePaddle as a third-party library. Like other third-party libraries, MKL-DNN is downloaded and compiled when PaddlePaddle is built. - -In addition, to further speed up basic math operations in PaddlePaddle, we also integrate MKLML (the MKL small library\[[1](#references)\]) -as another third-party library. It contains only the prebuilt dynamic libraries and header files. - -The relationship among MKL, MKLML and MKL-DNN is shown in the table below: - -| Name | Open Source | License | Descriptions | -| :---------- | :--------------- | :---------- | :------------ | -| MKL | No | Proprietary | Accelerate math processing routines | -| MKLML | No | Proprietary | Small package of MKL, especially for Machine Learning | -| MKL-DNN | Yes | Apache 2.0 | Accelerate primitives processing routines especially for Deep Neural Networks | - -MKLML can be used together with MKL-DNN to achieve the best performance. - -
-
-Figure 2. PaddlePaddle with MKL Engines -
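- -## Actions - -The added files and directory structure are as follows: - -```txt -PaddlePaddle/Paddle -├── ... -├── cmake/ -│ ├── external/ -│ │ ├── ... -│ │ ├── mkldnn.cmake -│ │ └── mklml.cmake -└── paddle/ - ├── ... - ├── math/ - │ ├── ... - │ └── MKLDNNMatrix.* - └── gserver/ - ├── ... - ├── layers/ - │ ├── ... - │ └── MKLDNN*Layer.* - ├── activations/ - │ ├── ... - │ └── MKLDNNActivations.* - └── tests/ - ├── ... - ├── MKLDNNTester.* - └── test_MKLDNN.cpp -``` - -### CMake -`CMakeLists.txt` provides one master switch related to MKL, `WITH_MKL`, which decides whether MKLML and MKL-DNN are used at compile time. - -- `WITH_MKLML` controls whether the MKLML library is used. -When `WITH_MKL` is turned on, MKLML is automatically used as PaddlePaddle's CBLAS and LAPACK library, and Intel OpenMP is enabled to improve MKLML's performance. -At compile time, the corresponding header files and libraries are placed under `build/third_party/install/mklml/*`. -The MKLML libraries are currently all dynamic libraries, mainly `libiomp5.so` and `libmklml_intel.so`. -- `WITH_MKLDNN` controls whether MKL-DNN is used. -When `WITH_MKL` is turned on, whether MKL-DNN is compiled is decided automatically from the hardware configuration[[2](#references)]. -At compile time, the corresponding header files and libraries are placed under `build/third_party/install/mkldnn/*`. -MKL-DNN currently provides only the dynamic library `libmkldnn.so`. - -### Matrix -Data in PaddlePaddle is currently stored in the `NCHW` format, but MKL-DNN supports more than this one layout. -So we define `MKLDNNMatrix` to manage the different MKL-DNN data formats and the conversions between them. - -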
-
-Figure 3. MKLDNNMatrix -
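- -### Layers -All MKL-DNN layers inherit from `MKLDNNLayer`, which in turn inherits from PaddlePaddle's base class `Layer`. -`MKLDNNLayer` provides the necessary interfaces and functions and implements the basic logic of `forward` and `backward`, -so subclasses only need to use the predefined interfaces and implement their specific functions. - -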
-
-Figure 4. MKLDNNLayer -
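- -Each MKLDNNLayer contains a set of MKLDNNMatrix objects for internal and external memory: - -- Internal memory: `inVal_`, `inGrad_`, `outVal_` and `outGrad_`, which hold the input data, input gradient, output data and output gradient respectively. -- External memory: names start with ext, such as `extInVal_` and `extInGrad_`. They are mainly used -to convert memory when the data format does not match PaddlePaddle's default `NCHW` format. -Note that PaddlePaddle's activations use `output_.value` and `output_.grad` directly, -so `extOutVal_` and `extOutGrad_` must share memory with `output_.value` and `output_.grad` respectively. -If no external memory is needed for conversion, the corresponding internal memory shares memory with them instead. -- Conversion functions (resetXXX): `resetInValue`, `resetInGrad`, `resetOutValue` and `resetOutGrad`, -which convert the input data, input gradient, output data and output gradient. -These functions reset the internal and external memory according to the input arguments. The two may also be equal, which means no conversion is needed. - -Note: a subclass of `MKLDNNLayer` only needs to use the internal memory; all external conversions are prepared in the reset functions. - -### Activations -In PaddlePaddle before the refactoring, an activation function is a concept independent of `Layer`, and its input and output share the same memory, -so a corresponding `MKLDNNActivation` is added to implement it, in a way similar to `MKLDNNLayer`. - -### Parameters -For layers with parameters, we make sure the parameters used by `MKLDNNLayer` share memory with the buffer allocated by PaddlePaddle. -If the data layouts differ, we convert the parameters to the format MKL-DNN expects before training starts -and convert them back to PaddlePaddle's format when they are saved at the end of training; no conversion is needed during training itself. -This keeps the saved parameter format consistent with PaddlePaddle and avoids unnecessary conversions. - -### Gradients -MKL-DNN operations overwrite their output directly, that is, results are not accumulated onto the existing data. -The benefit is that the memory does not have to be cleared all the time, which saves unnecessary operations. -Note, however, that when the network has branches, the gradients passed back from different layers must be accumulated during `backward`. -So `MKLDNNLayer` implements a merge method: the `Input Gradient` of each branch -is first stored temporarily in an `MKLDNNMatrix`, and the layer at the branch point sums them and puts the result into its own `output_.grad`. -As a result, subclasses do not need to care about branches when they are implemented. - -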
-
-Figure 5. Merge Gradients -
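- -### Unit Tests -We will add `test_MKLDNN.cpp` and `MKLDNNTester.*` to test MKL-DNN. -The tests consist of unit tests for each layer (or activation) and whole-network tests for simple networks. -Each test compares the result computed on CPU in PaddlePaddle with the MKL-DNN result, and passes if the difference is below a small threshold. - -### Python API -Only the **v1 API** is considered for now. - -We plan to add a `use_mkldnn` option in `python/paddle/trainer/config_parser.py`, so that users can choose to use the MKL-DNN layers. - -The implementation looks like this: - -```python -use_mkldnn = bool(int(g_command_config_args.get("use_mkldnn", 0))) -if use_mkldnn: - self.layer_type = mkldnn_* -``` - -Every MKL-DNN `layer_type` starts with *mkldnn_*. This is guaranteed when `MKLDNN*Layer` registers the layer, to tell them apart. - -In addition, a `use_mkldnn` flag is added in `paddle/utils.Flags` to select whether the MKL-DNN features are used. - -### Benchmarking -Scripts will be added [here](https://github.com/PaddlePaddle/Paddle/tree/develop/benchmark/paddle/image) to test and compare CNN performance before and after using MKL-DNN. -The performance comparison results will be in [IntelOptimizedPaddle.md](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/IntelOptimizedPaddle.md). - -### Others -1. When MKL-DNN is used, CPU buffers are aligned to 4096; see [memory](https://github.com/01org/mkl-dnn/blob/master/include/mkldnn.hpp#L673) in MKL-DNN for details. -2. Look deeper into PaddlePaddle for other optimization opportunities. For example, OpenMP might be used to improve the update performance of SGD. - -## Design Concerns - -We want to follow PaddlePaddle's code style\[[3](#references)\] as closely as possible while sacrificing as little MKL-DNN performance\[[4](#references)\] as possible. - -The points that need special attention are: - -1. Use **deviceId_**. To add as few variables or functions as possible to the parent class Layer, -we decided to use the existing `deviceId_` variable to distinguish layer attributes, and define `-2` as the device ID specific to `MKLDNNLayer`. -2. Override the **init** function of the parent class Layer and set `deviceId_` to `-2`, which marks the layer as running in the MKL-DNN environment. -3. Create `MKLDNNBase` to define classes and functions other than those related to layer and memory, -including `MKLDNNStream` and `CPUEngine` used by MKL-DNN, and possibly `FPGAEngine` in the future. -4. If an MKL-DNN layer is followed by a CPU device, `output_.value` shares memory with `extOutVal_` -and the data format is `NCHW`, so that the next CPU device gets the correct data. -When ordinary CPU layers are present, the format of `extOutVal_` and `extOutGrad_` is always `NCHW` or `NC`. - -## References -1. The [MKL small library](https://github.com/01org/mkl-dnn#linking-your-application) is a subset of [Intel MKL](https://software.intel.com/en-us/mkl). -It mainly contains math primitives and operations related to deep learning, and is usually updated together with each [new release](https://github.com/01org/mkl-dnn/releases) of MKL-DNN. -2. 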
[MKL-DNN System Requirements](https://github.com/01org/mkl-dnn#system-requirements). -Currently PaddlePaddle uses MKL-DNN only on machines that support the AVX2 instruction set or above. -3. The [original proposal](https://github.com/PaddlePaddle/Paddle/pull/3096) would introduce information about **nextLayer**. -But in PaddlePaddle, neither the layers before the refactoring nor the ops after it want to know about the next layer/op. -4. MKL-DNN's high-performance formats differ from PaddlePaddle's original `NCHW` (the cuDNN part of PaddlePaddle also uses `NCHW`, so it does not have this problem). -So a conversion method is needed, and the format should be converted only when necessary in order to get the best performance out of MKL-DNN. diff --git a/develop/doc/_sources/design/mkl/mkldnn_fluid.md.txt b/develop/doc/_sources/design/mkl/mkldnn_fluid.md.txt deleted file mode 100644 index bef126f3f0577b69f646dfe5d10539b372c6a8a5..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/mkl/mkldnn_fluid.md.txt +++ /dev/null @@ -1,149 +0,0 @@ -# Design Doc: Add MKLDNN Kernel in Fluid Operator - -## Principles - -First of all, we should follow some basic principles: -1. [How to write a new operator](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md). We are trying to add a new kind of kernel into operators, so basically we should follow this doc. -2. [Supporting new Device/Library](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/support_new_device.md). Since MKLDNN is a new library to fluid, we should add `MKLDNNDeviceContext` and maybe `mkldnn_helper.h`, just like [cudnn_helper.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/cudnn_helper.h). -3. [Switch Kernel](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md). Another important point is that we should ensure data synchronization between different kernel types, which is discussed in this [topic](https://github.com/PaddlePaddle/Paddle/issues/6549). So basically we should override the `GetExpectedKernelType` and `trans` functions to support switching kernels. -4. [The Keys of Operator Kernel Type](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md).
Kernel Type is a pivotal concept that records the `Place`, `Library`, `DataType` and `Layout`. - -## Solution - -In general, there are four steps to run an MKL-DNN primitive. -- Create a primitive descriptor that describes this operator -- Create the primitive itself from the primitive descriptor and the engine -- Create all memory buffers that the primitive needs -- Launch a stream to execute the created primitive -More details can be found [here](http://01org.github.io/mkl-dnn). - -It's better to avoid reinitializing the primitives and memory handles of the first three steps in every iteration. \ -So we plan to create a map to record all the `primitive` and `memory` objects, which should not take too much memory, as discussed [here](https://github.com/PaddlePaddle/Paddle/issues/6822). - -We assume that the following three conditions are satisfied. -1. There is a unique key for each operator instance. It may be the actual name of the `Output Tensor`. -2. The `Input Tensor` inside the `Compute` function is the one after conversion. -3. We can get the phase (e.g. `is_test`) inside the `Compute` function; otherwise we need to expose this attribute to the user. - -### Compute -The algorithm of `Compute` is described as follows, taking conv as an example.
- -```c++ - - PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), "It must use CPUPlace."); - PADDLE_ENFORCE(platform::is_mkldnn_library(ctx.GetLibrary()), "It must use MKLDNN Library."); - - auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>(); - - // find primitive by unique key from mkldnn context - // the op_key should be a unique name of this op instance - auto& p = dev_ctx.findPrimitive(op_key + "_fwd"); - - // assuming the input tensor inside this compute function is the one after conversion - // this point should be guaranteed by another mechanism - auto& i = dev_ctx.findMemory(op_key + "_input"); - - if (p == nullptr || i == nullptr || inputSizeChanged(p, i)) { - auto fwd_primitive_desc = createPrimitiveDesc(ctx); - auto* input = ctx.Input<Tensor>("Input"); - auto* filter = ctx.Input<Tensor>("Filter"); - auto* output = ctx.Output<Tensor>("Output"); - shared_ptr<mkldnn::memory> in(new mkldnn::memory(fwd_primitive_desc->src_primitive_desc(), input->data<T>())); - shared_ptr<mkldnn::memory> wgt(new mkldnn::memory(fwd_primitive_desc->weights_primitive_desc(), filter->data<T>())); - shared_ptr<mkldnn::memory> out(new mkldnn::memory(fwd_primitive_desc->dst_primitive_desc(), output->mutable_data<T>(ctx.GetPlace()))); - shared_ptr<mkldnn::conv_fwd> fwd_primitive(new mkldnn::conv_fwd(*fwd_primitive_desc, *in, *wgt, *out)); - - dev_ctx.addMemory(op_key+"_input", in); - dev_ctx.addMemory(op_key+"_output", out); - dev_ctx.addMemory(op_key+"_filter", wgt); - dev_ctx.addPrimitive(op_key+"_fwd", fwd_primitive); - dev_ctx.addPrimitiveDesc(op_key+"_fwd_PD", fwd_primitive_desc); - } - - p = dev_ctx.findPrimitive(op_key + "_fwd"); - - PADDLE_ENFORCE(p, "Should have forward Primitive"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_input"), "Should have input memory"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_output"), "Should have output memory"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_filter"), "Should have filter memory"); - PADDLE_ENFORCE(dev_ctx.findPrimitiveDesc(op_key+"_fwd_PD"), "Should have forward PrimitiveDesc"); -
 dev_ctx.submit(p); - dev_ctx.execute(); // the convert primitive should already be contained. - -``` - -`createPrimitiveDesc` returns the primitive descriptor of this operator and would look like this: -```c++ - auto* input = ctx.Input<Tensor>("Input"); - auto* filter = ctx.Input<Tensor>("Filter"); - auto* output = ctx.Output<Tensor>("Output"); - std::vector<int> strides = ctx.Attr<std::vector<int>>("strides"); - std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings"); - std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations"); - int groups = ctx.Attr<int>("groups"); - algorithm algo = static_cast<algorithm>(ctx.Attr<int>("convolution_algorithm_option")); - prop_kind pk = ctx.Attr<bool>("is_test") ? prop_kind::forward_inference : prop_kind::forward_training; - - auto fwd_desc = mkldnn::conv_fwd::desc(/* all the setting above*/); - shared_ptr<mkldnn::conv_fwd::primitive_desc> fwd_primitive_desc(new mkldnn::conv_fwd::primitive_desc(fwd_desc, ctx.getEngine())); - - return fwd_primitive_desc; - } -``` - -### MKLDNNDeviceContext -`MKLDNNDeviceContext` is straightforward. It should contain some basic information such as the `stream`, the `engine` and the map mentioned above. - - -### mkldnn_helper -Some functions would be put in `paddle/platform/mkldnn_helper.h`. -- create MKLDNN memories -- create MKLDNN primitives -- error check function -- etc - - -### Kernel Switch -We should `reorder` data whose layout differs when it comes from or goes to another device. The `GetExpectedKernelType` and `trans` functions can help us implement this. - -`GetExpectedKernelType` should get the context, so that the operator can return the best `KernelType`.
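To make the selection step more concrete, here is a minimal sketch of how an operator might pick its expected kernel type from the place and a few hints. It is an illustration under assumptions rather than Fluid code: `KernelType`, the enums, `KernelHints` and the free function `ChooseExpectedKernelType` are hypothetical names, and the only rule taken from this document is that an MKLDNN kernel requires a CPU place.

```cpp
// Hypothetical stand-ins for Fluid's Place, LibraryType and DataLayout.
enum class Place { kCPU, kCUDA };
enum class Library { kPlain, kMKLDNN, kCUDNN };
enum class Layout { kNCHW, kMKLDNN };

// Simplified kernel type: where to run, which library, which layout.
struct KernelType {
  Place place;
  Library library;
  Layout layout;
};

// Hypothetical user hints, in the spirit of `use_mkldnn` / `use_cudnn` attributes.
struct KernelHints {
  bool use_mkldnn;
  bool use_cudnn;
};

// Picks the most specialized kernel type that is valid for the given place,
// and falls back to the plain kernel with NCHW layout otherwise.
inline KernelType ChooseExpectedKernelType(Place place, const KernelHints& hints) {
  if (place == Place::kCPU && hints.use_mkldnn) {
    // MKLDNN kernels only make sense on CPU and use an MKLDNN-specific layout.
    return {Place::kCPU, Library::kMKLDNN, Layout::kMKLDNN};
  }
  if (place == Place::kCUDA && hints.use_cudnn) {
    return {Place::kCUDA, Library::kCUDNN, Layout::kNCHW};
  }
  return {place, Library::kPlain, Layout::kNCHW};
}
```

When the layout chosen here differs from the layout of the incoming tensor, a reorder is needed before the kernel runs.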
-`trans` would be like this: - -```c++ -void trans(inputs, ctx) override { - if (NoNeedTrans()) { - return; - } - // find reorder primitive by op_key from context - auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>(); - auto& p = dev_ctx.findPrimitive(op_key + "_reorder_input"); - auto& i = dev_ctx.findMemory(op_key + "_src_input"); - - if (p == nullptr || i == nullptr || changeSized(i, input)) { - auto prim = createPrimitiveDesc(ctx); - auto src = createMemory(memoryDesc(input->dims(), actual_layout), input->data); - auto newbuffer = paddle::memory::Alloc(ctx.GetPlace(), input->size_in_bytes()); - // use the freshly created descriptor here: p may still be nullptr in this branch - auto dst = createMemory(prim->expected_desc(), newbuffer->data); - auto reorder_primitive(new mkldnn::reorder(src, dst)); - - dev_ctx.addMemory(op_key+"_src_input", src); - dev_ctx.addMemory(op_key+"_input", dst); - dev_ctx.addPrimitive(op_key+"_reorder_input", reorder_primitive); - } - - p = dev_ctx.findPrimitive(op_key + "_reorder_input"); - PADDLE_ENFORCE(p, "Should have Reorder Primitive"); - dev_ctx.submit(p); - if (! this->isMKLDNNKernel()) { - // execute immediately only if this is not mkldnn kernel function. - // otherwise, it can be executed with the operator primitive in Compute - dev_ctx.stream(); - } - // after submit, the input tensor in ExecutionContext should be changed as the converted one - // there should be another mechanism to ensure this -} -``` - -### Unit Test -All the functions should be tested accordingly. -TBD diff --git a/develop/doc/_sources/design/model_format.md.txt b/develop/doc/_sources/design/model_format.md.txt deleted file mode 100644 index e29129fddf775939c9f7a8b49d850d523e6e5a45..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/model_format.md.txt +++ /dev/null @@ -1,36 +0,0 @@ -# Design Doc: Model Format - -## Motivation - -A model is an output of the training process. One complete model consists of two parts, the **topology** and the **parameters**.
In order to support industrial deployment, the model format must be self-contained and must not expose any training source code. - -As a result, in PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model. We must support large parameters and efficient serialization/deserialization of parameters. - -## Implementation - -The topology is saved as plain text in a detailed, self-contained protobuf file. - -The parameters are saved as a binary file. A protobuf message has a limit of [64M size](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We have done a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), which shows that protobuf is not a good fit for the task. - -As a result, we design a particular format for tensor serialization. By default, an arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), and has a description proto, [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99). We save the DescProto as the byte string header. It contains all the necessary information, such as the `dims` and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte string content. - -The table below shows the resulting binary format of one tensor, byte by byte. Note that all the signed values are written in the little-endian format.
-
-| field name | type | description |
-| --- | --- | --- |
-| version | uint32_t | Version of saved file. Always 0 now. |
-| tensor desc length | uint32_t | TensorDesc(Protobuf message) length in bytes. |
-| tensor desc | void* | TensorDesc protobuf binary message |
-| tensor data | void* | Tensor's data in binary format. The length of `tensor_data` is decided by `TensorDesc.dims()` and `TensorDesc.data_type()` |
-| lod_level | uint64_t | Level of LoD |
-| length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
-| data of lod[0] | uint64_t* | [Optional] lod[0].data() |
-| ... | ... | ... |
-
-
-
-## Summary
-
-- We introduce a model format.
-- The model, represented by its forward-pass computation procedure, is saved in a **ProgramDesc** protobuf message.
-- A set of binary tensors in the format specified above describes the **parameters**.
diff --git a/develop/doc/_sources/design/multi_language_interface/00.why_plain_c.md.txt b/develop/doc/_sources/design/multi_language_interface/00.why_plain_c.md.txt
deleted file mode 100644
index a1443093342c5a3ed698fb6b52a751dfc7cb5319..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/multi_language_interface/00.why_plain_c.md.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-# Paddle Multi-Language Interface Implementation
-## Background
-
-Paddle needs a multi-language interface, and this interface must:
-
-* Have standard, well-written documentation
-  * For example, Python can use [Sphinx](http://www.sphinx-doc.org/en/stable/) to generate API documentation, and golang can use [GoDoc](https://godoc.org/golang.org/x/tools/cmd/godoc). Both require the interface to be fully commented according to each language's established conventions.
-* Adapt each language's interface to the characteristics of that language
-  * For example, Java and Python handle errors by throwing an Exception, while golang should report errors through return values.
-
-## Basic Requirements
-
-The implementation of Paddle's multi-language interface involves the following aspects:
-
-* We distribute Paddle as a dynamic library. This dynamic library does not embed an interpreter for any other language and does not use other dynamic libraries.
-* The dynamic library exports functions through C99-standard header files and does not use or export C++ symbols.
-* Paddle's internal structs and classes are not exported; only `void*` pointers are used as type handles (handler).
-* Code generators such as SWIG are not used; language bindings are written by hand.
-
-
-## Reasons
-
-### Distribute Paddle as a dynamic library
-
-* Linking Paddle is relatively complicated
-  * If users want to link Paddle's static library (libpaddle.a) into their own programs, they have to pass `--whole-archive` (for GCC) or `--force_load` (for Clang) to make sure that all symbols in libpaddle.a are written into the program's binary. This is because Paddle's source code uses the [object
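factory design pattern](http://stackoverflow.com/a/1310326/724872).
-* For compiled languages such as C/C++, using a static library is about as hard as using a dynamic library. Interpreted languages such as [Python](http://stackoverflow.com/questions/19560594/how-to-import-static-library-in-python) or [Java](http://stackoverflow.com/questions/24493337/linking-static-library-with-jni), however, can only call Paddle's dynamic library; otherwise the Paddle static library would have to be linked into the interpreter.
-  * The binary that actually runs for an interpreted language is the interpreter itself, so calling a static library means linking it with the interpreter. For Java, that means adding the static library to the JVM, which is uncommon practice for ordinary Java developers.
-
-### Do not embed an interpreter for any other language in the dynamic library
-
-* In Paddle's current process model, C++ drives an embedded Python interpreter to parse the model configuration and read data.
-* Our final dynamic library does not embed Python or an interpreter for any other language. Model configuration parsing and data reading are left to the other language.
-
-One problem with Paddle at this stage is that if the Python interpreter embedded in Paddle and the Python used externally are different versions, Paddle exits with an error.
-
-### The Paddle dynamic library does not reference other dynamic libraries
-
-* That is, the dynamic library does not depend on any other file and can run on any machine.
-
-### The dynamic library exports functions through C99-standard header files and does not use or export C++ symbols
-
-* Because there is no standard for C++ [name mangling](https://en.wikipedia.org/wiki/Name_mangling#C.2B.2B), different compiler versions may generate different symbols for the same C++ code. A multi-language interface reads the generated binary (the dynamic library) directly, so it needs stable exported symbols.
-* C has a standard for exported symbols, and on common platforms it is the standard ABI calling convention.
-* Most languages support calling C APIs.
-* C99 is used instead of C89 because C99 supports [Fixed-width integer types](https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types) and the [Boolean type](https://en.wikipedia.org/wiki/C_data_types#Boolean_type).
-* C99 is used instead of C11 because [C11](https://en.wikipedia.org/wiki/C11_(C_standard_revision)) has no feature that Paddle particularly needs, and C99 is more widely used than C11.
-
-### Do not export Paddle's internal structs and classes; use only `void*` pointers as type handles (handler)
-
-* Paddle's internal classes are written in C++, which makes exporting them directly to a C interface difficult.
-* The C-API uses `void*` to represent Paddle's internal classes, and each API checks the type itself.
-
-In the C header file `paddle_matrix.h`:
-
-```C
-typedef void* paddle_matrix;
-typedef int paddle_error;
-
-extern "C"
-paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
-                                     uint64_t* width,
-                                     uint64_t* height);
-```
-The C interface is implemented in C++, in the file `paddle_matrix.cpp`:
-
-```cpp
-#include "paddle/math/matrix.h"
-extern "C"
-paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
-                                     uint64_t *width,
-                                     uint64_t *height) {
-  auto m = (paddle::capi::CMatrix*)(matrix);
-  *width = m->width();
-  *height = m->height();
-  return 0;  // assumption: 0 denotes success
-}
-```
-
-The content of `paddle/capi/CMatrix.hpp` is:
-
-```cpp
-namespace paddle {
-namespace math {
-
-class CMatrix {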
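-  std::shared_ptr<paddle::Matrix> mat;
-};
-
-}  // namespace math
-}  // namespace paddle
-```
-
-### Write language bindings by hand instead of using a code generator such as SWIG
-
-* [SWIG](http://www.swig.org/) is a code generator for multi-language interfaces. Its goal is to let code be written in C/C++; SWIG reads the C/C++ header files directly and generates binding code for various languages.
-  * For a multi-language interface, SWIG requires an interface file. This file has its own syntax, which is costly to learn, and every additional target language needs additional definitions in it. Sometimes the interface file has to be written in a very [tricky](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/api/Paddle.swig#L36) way. This raises the learning cost for community contributors.
-  * The interfaces SWIG exposes keep the C++ interface style, which makes it hard to keep the code style consistent with each target language (function naming, error handling).
-  * The function and class names that SWIG exposes in the target language are exactly the same as in C++, and the C++ naming style does not suit other languages. With SWIG we would have to rename a large number of `SomeCppClass` names to `some_python_class` or `SomeGoTypes` in the interface file.
-  * Error handling also differs between languages. In Java or Python the most common mechanism is the Exception, while in Golang it is return values. SWIG can only expose the C++ interface as it is and cannot adapt to each language's error-handling style.
-  * For most languages, using a C .h file directly is not difficult, for example through Python's [cffi](https://cffi.readthedocs.io/en/latest/overview.html#simple-example-abi-level-in-line) or [Cython](http://cython.org/), or golang's [cgo](https://golang.org/cmd/cgo/).
-  * The languages and interpreters that SWIG supports are limited. For Python, for example, SWIG supports only the CPython interpreter and not the PyPy interpreter.
-
-
-## Summary of Reasons
-
-| Decision | Alternative | Reason |
-|---| --- | --- |
-| Use a dynamic library | Do not use a static library | Interpreted languages can only call dynamic libraries, and linking Paddle's static library is complicated |
-| Do not embed interpreters of other languages | Do not embed the Python interpreter | Paddle's C++ code currently embeds a Python interpreter, which causes bugs when different Python versions end up in one process |
-| Do not reference other dynamic libraries | | A single Paddle dynamic library can run on any Linux system |
-| Use C99 for the interface | Do not use C++ for the interface | C has a standard ABI, C99 is currently the most widely used C standard, and C99 supports the bool type and fixed-width integer types (uint64_t, etc.) |
-| Use void* as class handles | Do not spell out what each class contains | Simple to implement, and decouples the interface from implementation details |
-| Write language bindings by hand | Do not use SWIG | SWIG requires binding developers to be proficient in SWIG configuration, which makes community participation difficult. SWIG-generated code cannot guarantee a consistent code style across languages |
-
-
-## Implementation
-
-See [Inference implementation](01.inference_implementation.md).
diff --git a/develop/doc/_sources/design/multi_language_interface/01.inference_implementation.md.txt b/develop/doc/_sources/design/multi_language_interface/01.inference_implementation.md.txt
deleted file mode 100644
index 9820284523246a062581f322616d196f575c9d29..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/multi_language_interface/01.inference_implementation.md.txt
+++ /dev/null
@@ -1,131 +0,0 @@
-# C-API Model Inference Implementation
-
-This document describes the implementation details of the Paddle C-API. The Paddle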
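C-API is the foundation of the multi-language API. Paddle has many APIs to expose. We implement the model inference API first and use its implementation as an example for discussion. For why a C-API is needed, see [Why Plain C](./00.why_plain_c.md).
-
-## Table of Contents
-   * [C-API Model Inference Implementation](#c-api-model-inference-implementation)
-      * [Principles for Exposing Interfaces](#principles-for-exposing-interfaces)
-      * [Directory Structure](#directory-structure)
-      * [Implementation](#implementation)
-         * [capi.h](#capih)
-         * [Header files for specific types](#header-files-for-specific-types)
-         * [capi_private.h](#capi_privateh)
-         * [Implementation files for specific types](#implementation-files-for-specific-types)
-         * [libpaddle_capi_shared.{so, dylib}](#libpaddle_capi_sharedso-dylib)
-         * [libpaddle_capi_whole.a](#libpaddle_capi_wholea)
-         * [examples](#examples)
-      * [Build Options](#build-options)
-
-
-## Principles for Exposing Interfaces
-
-1. All interfaces are C interfaces, that is, they use `extern "C"`.
-2. Except for functions that construct a type (`paddle_matrix_create`, etc.), all functions return `paddle_error`, and a call must not throw an exception or cause a runtime error.
-3. All type names have the form `paddle_<type name>`, and all functions related to a type are named `paddle_<type name>_<function name>`.
-4. If a Paddle Core concept (GradientMachine/Matrix) needs to be exposed to other languages, then:
-    * To keep the exposed interface as simple as possible, expose only the concept's interface, not its implementations. That is, expose `GradientMachine` or `Matrix`, but not `RecurrentGradientMachine` or `CpuSparseMatrix`.
-    * Expose only the necessary functions of the concept. "Necessary" means the minimum set of functions needed to complete a task.
-5. Do not add much wrapping in the `capi` interface layer.
-    * If a Paddle concept must be exposed but is too fiddly, do not wrap it in the `capi` layer; modify Paddle Core directly so that the concept is no longer fiddly in the core.
-
-
-## Directory Structure
-
-```text
-Paddle
-  `-- paddle
-     `-- capi
-          `-- examples  # The example project for C-API.
-          `-- tests  # unittests for C-API
-          `-- capi.h  # C-API header file.
-          `-- capi_private.h  # The shared header file between implementation sources.
-          `-- matrix.{h, cpp}
-          `-- gradient_machine.{h, cpp}
-          `-- ...
-```
-
-
-The directory structure of Paddle's C-API is shown above. All header files in this directory except `capi_private.h` are installed under include/paddle. The binaries generated by the C-API are installed under the `lib` directory. The directory structure after installation is:
-
-```text
-`-- include
-      `-- paddle
-             `-- capi.h
-             `-- matrix.h
-             `-- gradient_machine.h
-             `-- ...
-`-- lib
-     `-- libpaddle_capi_shared.{so, dylib}  # In mac, dynamic library's file name extension is `dylib`
-     `-- libpaddle_capi_whole.a  # static library for all symbols of Paddle.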
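-```
-
-## Implementation
-
-The following sections describe how each kind of file is implemented.
-
-### capi.h
-
-`capi.h` is the only header file users need to include when using the C-API. `capi.h` includes the type header files `matrix.h` and `gradient_machine.h`. Other type header files are included by relative path, that is, `#include "matrix.h"`.
-
-### Header files for specific types
-
-Header files for specific types are files such as `matrix.h` and `gradient_machine.h`. Each contains the type definition and all exposed functions of one type.
-
-Such a header file makes no assumption about the order in which other files are included; even if users include a type's header file directly, no error should occur (although this is discouraged). If one type needs to refer to another, for example `gradient_machine` needs `matrix`, it includes the other type's header file directly, that is, `#include "matrix.h"`.
-
-### capi_private.h
-
-`capi_private.h` is the header file shared among the implementation files. It mainly contains the structures of the types that are actually exposed. When users use the C-API, all Paddle types degrade to `void *`, that is, `typedef void* paddle_matrix`. However, each type exposed by the C-API is a struct implemented in `capi_private.h`.
-
-```cpp
-struct CMatrix {
-  int type = MatrixType;
-  std::shared_ptr<paddle::Matrix> mat;
-};
-```
-
-Typically, this struct contains two members.
-
-* `type` is a type tag. Each type has a different value in its type field, so even though the C-API accepts only `void *`, we can determine the type of each argument.
-
-  ```cpp
-  void some_c_api_function(void* some_instance) {
-     int* type = (int *) some_instance;
-     switch (*type) {
-       case MatrixType:
-         CMatrix* mat = (CMatrix *) some_instance;
-         ...
-       ...
-     }
-  }
-  ```
-* The other member of the struct is a smart pointer (shared_ptr) to this type's interface in Paddle Core.
-  * A smart pointer is used so that users can safely release a C-API instance without caring whether Paddle Core is still using it.
-  * For example, suppose the user obtains a neural network parameter instance through the C-API. When done with it, the user can simply delete it. Even if the model in Paddle Core is still using this parameter, the underlying parameter is not deleted along with it.
-
-### Implementation files for specific types
-
-Implementation files for specific types are files such as `matrix.cpp` and `gradient_machine.cpp`. These files implement the C-API interfaces in C++ 11 and export them with `extern "C"`. The implementations perform the necessary safety checks on the input arguments and forward the C-API arguments to `Paddle Core`.
-
-### libpaddle\_capi_shared.{so, dylib}
-
-`libpaddle_capi_shared` is the dynamic library exported by the C-API. Its link flags are similar to those of Paddle's other binaries (for example, `paddle_trainer`). Users can use this dynamic library directly to bring in the Paddle C-API, by linking with `-lpaddle_capi_shared`.
-
-### libpaddle\_capi_whole.a
-
-`libpaddle_capi_whole` is the static library exported by the C-API. It contains all of Paddle's symbols and is produced by packing together the object files of all the static libraries, such as `libpaddle_gserver.a`, `libpaddle_math.a`, and `libpaddle_capi.a`. To use it, link with `--whole-archive -lpaddle_capi_whole --no-whole-archive`.
-
-
-### examples
-
-The examples contain sample model inference code written in `C99`. For details, see [example/README.md](../../../paddle/capi/examples/README.md).
-
-## Build Options
-
-The C-API build option is off by default. To turn it on, set the following when running cmake:
-
-```bash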
-cmake ${YOUR_SOURCE_ROOT} -DWITH_C_API=ON -DWITH_PYTHON=OFF -DWITH_SWIG_PY=OFF
-```
-
-When building the C-API, we recommend that Paddle neither embed the Python interpreter nor generate the `SWIG` interface. For the reasons, see [Why Plain C](./00.why_plain_c.md).
diff --git a/develop/doc/_sources/design/operator_kernel_type.md.txt b/develop/doc/_sources/design/operator_kernel_type.md.txt
deleted file mode 100644
index f86e6b7a564ed23f2bddbec25da1c110014f941d..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/design/operator_kernel_type.md.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-# Design Doc: The Keys of Operator Kernel Type
-## Problem
-An operator can have different kernel implementations, and each operator will have a map to store the related kernels. Fluid uses `OpKernelType` as a key to identify a unique kernel. Before an operator runs, a certain type of kernel must be chosen via a key of `OpKernelType`. Currently, `OpKernelType` is defined as follows:
-
-```cpp
-struct OpKernelType {
-  platform::Place place_;
-  proto::DataType data_type_;
-};
-```
-For more details, please refer to the [code](https://github.com/PaddlePaddle/Paddle/blob/2d5ec16bc8a09fb8e0f62c89b116b0cd1d333907/paddle/framework/operator.h#L348-L374) on GitHub.
-
-It contains two keys, `Place` and `DataType`. These two keys will be hashed to a unique key to represent a certain type of kernel. However, these two keys do not provide enough information. We need a more complete representation of `OpKernelType`.
-
-We often implement a kernel of an operator with some computing library on a certain device (place). Please note that computing libraries and devices do not have a one-to-one correspondence. A device can have many computing libraries and a computing library can also support different devices.
-
-For example, the Eigen library supports Nvidia GPU/AMD GPU/CPU and the MKLDNN library supports Intel CPU/Intel FPGA. Both `Place` and `Library` should be a key of `OpKernelType`.
-
-Different DataTypes, such as fp64/fp32/int8, will obviously have different kernels.
But different data layouts of a Tensor will also lead to different implementations. Please refer to the batch norm operator [kernels](https://github.com/PaddlePaddle/Paddle/blob/a948fac4d0ad7e0412d373b8aabeb711c2899563/paddle/operators/batch_norm_op.cc#L180-L209) as an example. Data layout should also be taken into consideration.
-
-## Solution
-
-There are four keys to determine a kernel type of an operator: `Place`/`Library`/`DataType`/`Layout`.
-
-```cpp
-struct OpKernelType {
-  platform::Place place_;
-  platform::Library library_;
-  proto::DataType data_type_;
-  framework::Layout layout_;
-};
-```
-
-The details are as follows:
-
-### Place
-
-`Place` is defined as:
-
-```cpp
-typedef boost::variant<CUDAPlace, CPUPlace> Place;
-```
-
-`Place` represents the device memory where data is located.
-
-
-### Library
-
-One operator kernel is usually implemented based on one library. `Library` is defined as an enum variable:
-
-```cpp
-enum Library { Plain, MKLDNN, CUDNN };
-```
-
-We use the `Plain` enumerator to represent the default library. Since most operators in Fluid are implemented based on the `Eigen` library, we take the `Eigen` library as the `Plain` enumerator.
-A library usually has a corresponding `DeviceContext` which contains some handles needed for computation. Fluid now has two default DeviceContexts for CPU and CUDA, namely, `CPUDeviceContext` and `CUDADeviceContext`. `CPUDeviceContext` contains an Eigen library handle and `CUDADeviceContext` contains an Eigen library handle and a cuBLAS handle.
-
-If we want to support a new library, a new enumerator needs to be added to `Library` and a corresponding new `LibraryDeviceContext` needs to be created.
-
-
-### DataType
-
-
-`DataType` is defined in [framework.proto](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). Currently, int32/int64/fp32/fp64 are supported.
-
-### Layout
-
-Actually, a Tensor is a view of a block of memory.
Besides a pointer to the memory, we also have to get some other descriptions of this block of memory, such as shape(ddim), stride, and layout. - -Different layout leads to different implementation of the operator kernel. There are mainly 4 principles we have to follow to support layout in our Fluid framework. - -- We take layout as a data member of Tensor. Layout is actually a enum variable. If Fluid is built with MKLDNN, then the memory format in MKLDNN will also be added into this enum variable. - -- Users have to set layout for input data. And some operators like fill_constant/random, also have to set layout for generating data. Of course, we can have some default layout, like NCHW. - -- The inference of Layout is at run-time, not at compile-time. - -- Every operator has to implement different kernels for different layouts. Let's take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed [here](http://01org.github.io/mkl-dnn/structmkldnn_1_1memory.html). And we will have a special macro to register kernels for MKLDNN operators. - -`Layout` is also defined as a enum variable: - -```cpp -enum Layout { - kNCHW, - kNHWC, -#ifdef PADDLE_WITH_MKLDNN - knChw8c - ... -#endif -}; -``` diff --git a/develop/doc/_sources/design/ops/rnn.md.txt b/develop/doc/_sources/design/ops/rnn.md.txt deleted file mode 100644 index 2f4854793fa1f0b02e4dc17b51a48a972be61c06..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/ops/rnn.md.txt +++ /dev/null @@ -1,153 +0,0 @@ -# RNNOp design - -This document describes the RNN (Recurrent Neural Network) operator and how it is implemented in PaddlePaddle. The RNN op requires that all instances in a mini-batch have the same length. We will have a more flexible dynamic RNN operator in the future. - -## RNN Algorithm Implementation - -

- -

- -The above diagram shows an RNN unrolled into a full network. - -There are several important concepts here: - -- *step-net*: the sub-graph that runs at each step. -- *memory*, $h_t$, the state of the current step. -- *ex-memory*, $h_{t-1}$, the state of the previous step. -- *initial memory value*, the memory of the first (initial) step. - -### Step-scope - -There could be local variables defined in each step-net. PaddlePaddle runtime realizes these variables in *step-scopes* which are created for each step. - -

-
-Figure 2 illustrates the RNN's data flow -

-
-Please be aware that every step runs the same step-net. Each step does the following:
-
-1. Creates the step-scope.
-2. Initializes the local variables, including step-outputs, in the step-scope.
-3. Runs the step-net, which uses the above-mentioned variables.
-
-The RNN operator will compose its output from step outputs in each of the step scopes.
-
-### Memory and Ex-memory
-
-Let's give more details about memory and ex-memory using a simple example:
-
-$$
-h_t = U h_{t-1} + W x_t
-$$,
-
-where $h_t$ and $h_{t-1}$ are the memory and ex-memory (previous memory) of step $t$ respectively.
-
-In the implementation, we can make an ex-memory variable either "refer to" the memory variable of the previous step,
-or copy the memory value of the previous step to the current ex-memory variable.
-
-### Usage in Python
-
-For more information on Block, please refer to the [design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
-
-We can define an RNN's step-net using a Block:
-
-```python
-import paddle as pd
-
-X = some_op()  # X is some operator's output and is a LoDTensor
-a = some_op()
-
-# declare parameters
-W = pd.Variable(shape=[20, 30])
-U = pd.Variable(shape=[20, 30])
-
-rnn = pd.create_rnn_op(output_num=1)
-with rnn.stepnet():
-    x = rnn.add_input(X)
-    # declare a memory (rnn's step)
-    h = rnn.add_memory(init=a)
-    # h.pre_state(), the previous memory of rnn
-    new_state = pd.add_two(pd.matmul(W, x), pd.matmul(U, h.pre_state()))
-    # update current memory
-    h.update(new_state)
-    # indicate that h variables in all step scopes should be merged
-    rnn.add_outputs(h)
-
-out = rnn()
-```
-
-Python API functions in the above example:
-
-- `rnn.add_input`: indicates that the parameter is a variable that will be segmented into step-inputs.
-- `rnn.add_memory`: creates a variable used as the memory.
-- `rnn.add_outputs`: marks the variables that will be concatenated across steps into the RNN output.
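
The step-scope loop and the memory/ex-memory link described in this section can be sketched in plain Python. This is only an illustration, not PaddlePaddle's implementation: `run_rnn`, `step_net`, the dictionary used as a step-scope, and the scalar `U` and `W` are all assumed names chosen for the sketch.

```python
def step_net(scope, U, W):
    # h_t = U * h_{t-1} + W * x_t, with scalars standing in for matrices
    scope["h"] = U * scope["h_pre"] + W * scope["x"]


def run_rnn(xs, h0, U, W):
    """Run step_net once per input, each time in a fresh step-scope."""
    step_scopes = []
    h_pre = h0  # initial memory value
    for x in xs:
        # create the step-scope and initialize its local variables
        scope = {"x": x, "h_pre": h_pre}
        # run the same step-net in every step
        step_net(scope, U, W)
        step_scopes.append(scope)
        # the ex-memory of the next step takes the memory of this step
        h_pre = scope["h"]
    # compose the RNN output from the step outputs of all step-scopes
    return [scope["h"] for scope in step_scopes]


outputs = run_rnn(xs=[1.0, 2.0, 3.0], h0=0.0, U=0.5, W=1.0)
print(outputs)  # [1.0, 2.5, 4.25]
```

Here the ex-memory is handed over by value; the "refer to" variant mentioned in the text would instead let the next scope look up `h` in the previous step-scope.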
- -### Nested RNN and LoDTensor - -An RNN whose step-net includes other RNN operators is known as an *nested RNN*. - -For example, we could have a 2-level RNN, where the top level corresponds to paragraphs, and the lower level corresponds to sentences. Each step of the higher level RNN also receives an input from the corresponding step of the lower level, and additionally the output from the previous time step at the same level. - -The following figure illustrates feeding in text into the lower level, one sentence at a step, and the feeding in step outputs to the top level. The final top level output is about the whole text. - -

- -

-
-```python
-import paddle as pd
-
-W = pd.Variable(shape=[20, 30])
-U = pd.Variable(shape=[20, 30])
-
-W0 = pd.Variable(shape=[20, 30])
-U0 = pd.Variable(shape=[20, 30])
-
-# a is output of some op
-a = some_op()
-
-# chapter_data is a set of 128-dim word vectors
-# the first level of LoD is sentence
-# the second level of LoD is a chapter
-chapter_data = pd.Variable(shape=[None, 128], type=pd.lod_tensor, level=2)
-
-def lower_level_rnn(paragraph):
-    '''
-    paragraph: the input
-    '''
-    rnn = pd.create_rnn_op(output_num=1)
-    with rnn.stepnet():
-        sentence = rnn.add_input(paragraph, level=0)
-        h = rnn.add_memory(shape=[20, 30])
-        h.update(
-            pd.matmul(W, sentence) + pd.matmul(U, h.pre_state()))
-        # get the last state as sentence's info
-        rnn.add_outputs(h)
-    return rnn
-
-top_level_rnn = pd.create_rnn_op(output_num=1)
-with top_level_rnn.stepnet():
-    paragraph_data = top_level_rnn.add_input(chapter_data, level=1)
-    low_rnn = lower_level_rnn(paragraph_data)
-    paragraph_out = low_rnn()
-
-    h = top_level_rnn.add_memory(init=a)
-    h.update(
-        pd.matmul(W0, paragraph_data) + pd.matmul(U0, h.pre_state()))
-    top_level_rnn.add_outputs(h)
-
-# output the last step
-chapter_out = top_level_rnn(output_all_steps=False)
-```
-
-In the above example, the construction of the `top_level_rnn` calls `lower_level_rnn`. The input is an LoD Tensor. The top level RNN segments input text data into paragraphs, and the lower level RNN segments each paragraph into sentences.
-
-By default, the `RNNOp` will concatenate the outputs from all the time steps.
-If `output_all_steps` is set to False, it will only output the final time step.
-
-

- -

diff --git a/develop/doc/_sources/design/ops/sequence_decoder.md.txt b/develop/doc/_sources/design/ops/sequence_decoder.md.txt deleted file mode 100644 index c4a9bbeeefca0e05c335dd60233691e8bac33015..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/ops/sequence_decoder.md.txt +++ /dev/null @@ -1,229 +0,0 @@ -# Design: Sequence Decoder Generating LoDTensors -In tasks such as machine translation and visual captioning, -a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary to generate sequences, one word at a time. - -This documentation describes how to implement the sequence decoder as an operator. - -## Beam Search based Decoder -The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is necessary when generating sequences. It is a heuristic search algorithm that explores the paths by expanding the most promising node in a limited set. - -In the old version of PaddlePaddle, the C++ class `RecurrentGradientMachine` implements the general sequence decoder based on beam search, due to the complexity involved, the implementation relies on a lot of special data structures that are quite trivial and hard to be customized by users. - -There are a lot of heuristic tricks in the sequence generation tasks, so the flexibility of sequence decoder is very important to users. - -During the refactoring of PaddlePaddle, some new concepts are proposed such as: [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md) that can better support the sequence usage, and they can also help make the implementation of beam search based sequence decoder **more transparent and modular** . 
-
-For example, the RNN states, candidate IDs and probabilities of beam search can all be represented as `LoDTensors`;
-the selected candidate's IDs in each time step can be stored in a `TensorArray`, and `Packed` to the sentences translated.
-
-## Changing LoD's absolute offset to relative offsets
-The current `LoDTensor` is designed to store levels of variable-length sequences. It stores several arrays of integers where each represents a level.
-
-The integers in each level represent the begin and end (not inclusive) offset of a sequence **in the underlying tensor**,
-let's call this format the **absolute-offset LoD** for clarity.
-
-The absolute-offset LoD can retrieve any sequence very quickly but fails to represent empty sequences. For example, a two-level LoD is as follows
-```python
-[[0, 3, 9],
- [0, 2, 3, 3, 3, 9]]
-```
-The first level tells that there are two sequences:
-- the first's offset is `[0, 3)`
-- the second's offset is `[3, 9)`
-
-while on the second level, there are several empty sequences that both begin and end at `3`.
-It is impossible to tell how many of the empty second-level sequences belong to each first-level sequence.
-
-There are many scenarios that rely on representing empty sequences. For example, in machine translation or visual captioning, an instance may have no translation, or a prefix may have an empty candidate set.
-
-So let's introduce another format of LoD,
-it stores **the offsets of the lower level sequences** and is called **relative-offset** LoD.
-
-For example, to represent the same sequences of the above data
-
-```python
-[[0, 3, 5],
- [0, 2, 3, 3, 3, 9]]
-```
-
-the first level represents that there are two sequences,
-their offsets in the second-level LoD are `[0, 3)` and `[3, 5)`.
-
-The second level is the same as in the absolute-offset example because the lower level is a tensor.
-It is now easy to tell which empty sequences belong to which first-level sequence: the first contains the empty sequence at index 2 and the second contains the empty sequence at index 3.
-
-The following examples are based on relative-offset LoD.
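
The relationship between the two LoD formats can be sketched in a few lines of plain Python. The sketch assumes the first level of the relative-offset example is read as `[0, 3, 5]`, so that it matches the `[0, 3)` and `[3, 5)` ranges given in the prose; `relative_to_absolute` and `count_empty` are hypothetical helper names, not PaddlePaddle functions.

```python
def relative_to_absolute(lod):
    """Convert a relative-offset LoD into an absolute-offset LoD.

    The last level already holds offsets into the tensor; every higher
    level holds offsets into the level directly below it.
    """
    absolute = [list(lod[-1])]
    for level in reversed(lod[:-1]):
        below = absolute[0]  # already converted to tensor offsets
        absolute.insert(0, [below[i] for i in level])
    return absolute


def count_empty(lod, seq_index):
    """Count empty second-level sequences inside one first-level sequence."""
    top, bottom = lod
    begin, end = top[seq_index], top[seq_index + 1]
    return sum(1 for i in range(begin, end) if bottom[i] == bottom[i + 1])


relative_lod = [[0, 3, 5], [0, 2, 3, 3, 3, 9]]
print(relative_to_absolute(relative_lod))  # [[0, 3, 9], [0, 2, 3, 3, 3, 9]]
print(count_empty(relative_lod, 0))  # 1
print(count_empty(relative_lod, 1))  # 1
```

The conversion only goes one way: the absolute form `[[0, 3, 9], ...]` cannot say whether the empty sequences at offset `3` belong to the first or the second top-level sequence, which is exactly the information the relative form keeps.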
-
-## Usage in a simple machine translation model
-Let's start from a simple machine translation model that is simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation) to draw a blueprint of what a sequence decoder can do and how to use it.
-
-The model has an encoder that learns the semantic vector from a sequence, and a decoder which uses the sequence encoder to generate new sentences.
-
-**Encoder**
-```python
-import paddle as pd
-
-dict_size = 8000
-source_dict_size = dict_size
-target_dict_size = dict_size
-word_vector_dim = 128
-encoder_dim = 128
-decoder_dim = 128
-beam_size = 5
-max_length = 120
-
-# encoder
-src_word_id = pd.data(
-    name='source_language_word',
-    type=pd.data.integer_value_sequence(source_dict_size))
-src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
-
-src_word_vec = pd.lookup(src_embedding, src_word_id)
-
-encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
-
-encoder_ctx = pd.last_seq(encoder_out_seq)
-# encoder_ctx_proj is the learned semantic vector
-encoder_ctx_proj = pd.fc(
-    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
-```
-
-**Decoder**
-
-```python
-def generate():
-    decoder = pd.while_loop()
-    with decoder.step():
-        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
-        generated_ids = decoder.memory()  # TODO init to batch_size <s>s
-        generated_scores = decoder.memory()  # TODO init to batch_size 1s or 0s
-
-        target_word = pd.lookup(trg_embedding, generated_ids)
-        # expand encoder_ctx's batch to fit target_word's lod
-        # for example
-        # decoder_mem.lod is
-        # [[0 1 3],
-        #  [0 1 3 6]]
-        # its tensor content is [a1 a2 a3 a4 a5]
-        # which means there are 2 sentences to translate
-        #   - the first sentence has 1 translation prefix, the offsets are [0, 1)
-        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
-        # the target_word.lod is
-        # [[0, 1, 6]
-        #  [0, 2, 4, 7, 9
12]]
-        # which means 2 sentences to translate, each has 1 and 5 prefixes
-        # the first prefix has 2 candidates
-        # the following has 2, 3, 2, 3 candidates
-        # the encoder_ctx_expanded's content will be
-        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
-        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
-        decoder_input = pd.fc(
-            act=pd.activation.Linear(),
-            input=[target_word, encoder_ctx_expanded],
-            size=3 * decoder_dim)
-        gru_out, cur_mem = pd.gru_step(
-            decoder_input, mem=decoder_mem, size=decoder_dim)
-        scores = pd.fc(
-            gru_out,
-            size=target_dict_size,
-            bias=None,
-            act=pd.activation.Softmax())
-        # K is a config
-        topk_scores, topk_ids = pd.top_k(scores, K)
-        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
-
-        selected_ids, selected_generation_scores = decoder.beam_search(
-            topk_ids, topk_generated_scores)
-
-        # update the states
-        decoder_mem.update(cur_mem)  # tells how to update state
-        generated_ids.update(selected_ids)
-        generated_scores.update(selected_generation_scores)
-
-        decoder.output(selected_ids)
-        decoder.output(selected_generation_scores)
-
-translation_ids, translation_scores = decoder()
-```
-The `decoder.beam_search` is an operator that, given the candidates and the scores of translations including the candidates,
-returns the result of the beam search algorithm.
-
-In this way, users can customize anything on the input or output of beam search, for example:
-
-1. Set the corresponding elements in `topk_generated_scores` to zero or some small value, so that beam_search discards those candidates.
-2. Remove some specific candidate in `selected_ids`.
-3. Get the final `translation_ids` and remove a translation sequence from it.
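
The first customization in the list above can be illustrated with a toy selection step in plain Python. This is a sketch under stated assumptions, not the design's `decoder.beam_search` operator: `beam_search_step` is a hypothetical helper that handles the prefixes of a single source sentence, uses nested lists in place of LoDTensors, and keeps the `beam_size` highest-scoring candidates across all prefixes.

```python
def beam_search_step(topk_ids, topk_scores, beam_size):
    """Keep the beam_size best candidates among the prefixes of one sentence.

    topk_ids / topk_scores hold one list per prefix; an empty list is allowed.
    Returns (selected_ids, selected_scores), again one list per prefix.
    """
    flat = [(score, prefix, cand_id)
            for prefix, (ids, scores) in enumerate(zip(topk_ids, topk_scores))
            for cand_id, score in zip(ids, scores)]
    flat.sort(key=lambda item: item[0], reverse=True)

    selected_ids = [[] for _ in topk_ids]
    selected_scores = [[] for _ in topk_ids]
    for score, prefix, cand_id in flat[:beam_size]:
        selected_ids[prefix].append(cand_id)
        selected_scores[prefix].append(score)
    return selected_ids, selected_scores


ids = [[11, 12], [21, 22, 23]]
scores = [[0.9, 0.1], [0.8, 0.7, 0.2]]
print(beam_search_step(ids, scores, beam_size=3))
# ([[11], [21, 22]], [[0.9], [0.8, 0.7]])

# customization: push a candidate's score down so that it is discarded
scores[1][0] = float("-inf")
print(beam_search_step(ids, scores, beam_size=3))
# ([[11], [22, 23]], [[0.9], [0.7, 0.2]])
```

A prefix that loses all its candidates ends up with an empty list, which is the situation the relative-offset LoD is meant to represent.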
- -The implementation of sequence decoder can reuse the C++ class: [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30), -so the python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop). - -Both of them are two-level `LoDTensors`: - -- The first level represents `batch_size` of (source) sentences. -- The second level represents the candidate ID sets for translation prefix. - -For example, 3 source sentences to translate, and has 2, 3, 1 candidates. - -Unlike an RNN, in sequence decoder, the previous state and the current state have different LoD and shape, and an `lod_expand` operator is used to expand the LoD of the previous state to fit the current state. - -For example, the previous state: - -* LoD is `[0, 1, 3][0, 2, 5, 6]` -* content of tensor is `a1 a2 b1 b2 b3 c1` - -the current state is stored in `encoder_ctx_expanded`: - -* LoD is `[0, 2, 7][0 3 5 8 9 11 11]` -* the content is - - a1 a1 a1 (a1 has 3 candidates, so the state should be copied 3 times for each candidates) - - a2 a2 - - b1 b1 b1 - - b2 - - b3 b3 - - None (c1 has 0 candidates, so c1 is dropped) - -The benefit from the relative offset LoD is that the empty candidate set can be represented naturally. - -The status in each time step can be stored in `TensorArray`, and `Pack`ed to a final LoDTensor. The corresponding syntax is: - -```python -decoder.output(selected_ids) -decoder.output(selected_generation_scores) -``` - -The `selected_ids` are the candidate ids for the prefixes, and will be `Packed` by `TensorArray` to a two-level `LoDTensor`, where the first level represents the source sequences and the second level represents generated sequences. - -Packing the `selected_scores` will get a `LoDTensor` that stores scores of each translation candidate. 
- -Packing the `selected_generation_scores` will get a `LoDTensor`, and each tail is the probability of the translation. - -## LoD and shape changes during decoding -

- -

- -According to the image above, the only phase that changes the LoD is beam search. - -## Beam search design -The beam search algorithm will be implemented as one method of the sequence decoder and has 3 inputs: - -1. `topk_ids`, the top K candidate ids for each prefix. -2. `topk_scores`, the corresponding scores for `topk_ids` -3. `generated_scores`, the score of the prefixes. - -All of these are LoDTensors, so that the sequence affiliation is clear. Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix. - -It will return three variables: - -1. `selected_ids`, the final candidate beam search function selected for the next step. -2. `selected_scores`, the scores for the candidates. -3. `generated_scores`, the updated scores for each prefix (with the new candidates appended). - -## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray` -The `selected_ids`, `selected_scores` and `generated_scores` are LoDTensors that exist at each time step, -so it is natural to store them in arrays. - -Currently, PaddlePaddle has a module called `TensorArray` which can store an array of tensors. It is better to store the results of beam search in a `TensorArray`. - -The `Pack` and `UnPack` in `TensorArray` are used to pack tensors in the array to an `LoDTensor` or split the `LoDTensor` to an array of tensors. -It needs some extensions to support the packing or unpacking an array of `LoDTensors`. diff --git a/develop/doc/_sources/design/optimizer.md.txt b/develop/doc/_sources/design/optimizer.md.txt deleted file mode 100644 index 691081c268b848811bf5ee6d6a41edfe0f47eec0..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/optimizer.md.txt +++ /dev/null @@ -1,91 +0,0 @@ -## Optimizer Design - -### The Problem - -A PaddlePaddle program, or a block, is a sequence of operators operating variables. A training program needs to do three kinds of works: - -1. 
the forward pass, which computes intermediate results and the cost(s),
-1. the backward pass, which derives gradients from intermediate results and costs, and
-1. the optimization pass, which updates model parameters to optimize the cost(s).
-
-This work relies on three kinds of operators:
-
-1. forward operators,
-1. gradient operators, and
-1. optimization operators.
-
-It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.
-
-In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.
-
-
-### High-level Python API to describe the training process
-
-1. Users write code to describe the network:
-
-	```python
-	images = layer.data("images")
-	labels = layer.data("labels")
-	w1 = pd.var("w1")
-	b1 = pd.var("b1")
-	hidden = layer.fc(images, w=w1, b=b1)
-	cost = layer.mse(hidden, labels)
-	```
-
-	The above code snippet will create forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
-
-
-2. Users create a certain kind of Optimizer with some arguments.
-
-	```python
-	optimizer = AdagradOptimizer(learning_rate=0.001)
-	```
-
-3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in parameter_list.
-
-	```python
-	opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
-	```
-	The above code snippet will create gradient and optimization operators in Block. The return value of `minimize()` is a list of optimization operators that will be run by the session.
-
-4. Users use Session/Executor to run this opt_op_list as the target to do training.
-
-	```python
-	sess.run(target=opt_op_list, ...)
-	```
-
-#### Optimizer Python interface:
-
-```python
-class Optimizer(object):
-    """Optimizer Base class.
- - """ - - def __init__(self): - pass - - def create_optimization_pass(self, parameters_and_grads): - """Add optimization operators to update gradients to variables. - - Args: - parameters_and_grads: a list of (variable, gradient) pair to update. - - Returns: - optmization_op_list: a list of optimization operator that will update parameter using gradient. - """ - return None - - def minimize(self, loss, parameter_list): - """Add operations to minimize `loss` by updating `parameter_list`. - - This method combines interface `append_backward()` and - `create_optimization_pass()` into one. - """ - params_grads = self.create_backward_pass(loss, parameter_list) - update_ops = self.create_optimization_pass(params_grads) - return update_ops - -``` - -Users can inherit the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer. diff --git a/develop/doc/_sources/design/paddle_nccl.md.txt b/develop/doc/_sources/design/paddle_nccl.md.txt deleted file mode 100644 index c7dac70998a6cfec3a6d2fc72b698ff9722e6805..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/paddle_nccl.md.txt +++ /dev/null @@ -1,65 +0,0 @@ -# Design Doc: NCCL support in Paddle Fluid - -## Abstract - -This Design Doc refers to the NCCL feature in paddle. We propose an approach to support NCCL library both on a single machine and multiple machines. We wrapper the NCCL primitives `Broadcast`, `Allreduce`, `Reduce` as operators to utilize Multi-GPU powers in one script. - - -## Motivation - -[NCCL](https://developer.nvidia.com/nccl) is a NVIDIA library support Multi-GPU communicating and optimized for NVIDIA GPUs, it provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, that can achieve high bandwidth over PCIe and NVLink high-speed interconnect. With NCCL library, we can easily accelerate the training in parallel. - -- Pros -1. easily plug-in with [NCCL2](https://developer.nvidia.com/nccl) library. -1. 
high performance in NVIDIA GPUs. -1. MPI like primitives, which have low learning cost for users. - -- Cons -1. Only design for NVIDIA GPUs, not a general multi-device solution. -1. Although NCCL1 is opensourced under BSD license, but NCCL2 is not opensourced anymore. - -At the beginning of training, the framework needs to distribute the same parameters to every GPU, and merge the gradients at any time user interests. - -As a result, during training, we need the operations of peer to peer copy between different GPUs, aggregating gradients/parameters from GPUs, and broadcasting parameters to GPUs. Every GPU only need to run the operator with correct place information. - -Besides, it needs interfaces to synchronize model update with each different GPU Cards. - -## Implementation - -As mentioned above, we wrap the NCCL routines as several kinds of operators. Need to note that NCCL need to create Communicator between gpu at the beginning, so there is a NCCLInit operator created. - -### Transpiler - -To be compatible with [parameter server design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md), the transpiler compiles the user defined operation graph into sub-graphs to be executed on different devices. - -1. The user-defined model will be a single device program - -2. Broadcast/Reduce operators between GPUs will be inserted into the program, even for the multi-node, may insert the `Send`, `Recv` operator. - - *Broadcast, AllReduce in a single machine. And Broadcast, AllReduce, [Send, Recv](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md#graph-converter) in multiple machines* - - - -After compiling, the graph as shows - - - -Operators are added to the sub-graphs. Every GPU assigned a role of `rank0`, `rank1` etc. - -- **Broadcast**. Broadcast operator distribute initialized parameter to all the GPUs from the GPU who owns it. e.g. from`rank0` GPU. -- **AllReduce**. 
The AllReduce operator synchronizes parameters/gradients between GPUs. AllReduce is implemented with a ring-based communication method, which avoids a bottleneck on any single GPU. - -Note that the AllReduce operator forces the GPUs to synchronize at that point. Whether the whole training process runs in asynchronous or synchronous mode depends on where the AllReduce point sits in the graph. - -As shown in the picture, each GPU computes the gradient of `W`; an `AllReduce` operator follows, which accumulates `dW` over the full batch of data; each GPU then runs the optimization step individually and applies the gradient to its own `W`. - -- **AllReduce** - Note that our AllReduce operator is a ring-based AllReduce implementation. If we used the NCCL2 AllReduce primitive, every GPU would optimize over the full batch of data, wasting (n-1) GPUs' worth of compute. In addition, the NCCL2 built-in AllReduce only uses the communication resources during synchronization, and the gradient update is a subsequent phase. In fact, we can amortize the time cost of the gradient update into the communication phase. The process is: -1. Every parameter has its root card. That card is responsible for aggregating the gradients from the GPUs. -2. The whole model's parameters are hashed to different root cards to balance the load between GPUs. -3. Each logically neighboring card starts sending the parameter to the next one. After one round, the parameter's root card has aggregated the full gradients. -4. The root card then optimizes the parameter. -5. The root card sends its optimized result to its neighbor, and that neighbor sends the parameter on to the next one. -6. The synchronization round finishes. - -The total time cost is 2 * (n-1) * per-parameter-send-time, so we reach the goal of amortizing the update time into the communication phase.
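
The six-step round above can be sketched in plain Python. This is a simulation under stated assumptions, not the NCCL or Paddle implementation: `root_card` and `ring_sync` are hypothetical names, the CRC32-based hashing rule is only one possible way to spread parameters over root cards, and each "send" is modeled as a counter increment rather than a real device-to-device copy.

```python
import zlib


def root_card(param_name, num_gpus):
    """Hypothetical hashing rule: map a parameter name to its root card."""
    return zlib.crc32(param_name.encode("utf-8")) % num_gpus


def ring_sync(param, grads, root, lr):
    """Simulate one synchronization round for a single parameter.

    grads[i] is the gradient computed on GPU i. Returns the per-GPU
    parameter copies after the round and the number of ring sends used.
    """
    n = len(grads)
    sends = 0

    # Reduce phase: the card after the root starts, and every following
    # card adds its own gradient to the partial sum it receives.
    card = (root + 1) % n
    partial = grads[card]
    while card != root:
        card = (card + 1) % n
        partial += grads[card]
        sends += 1

    # The root card now holds the full gradient and optimizes the parameter
    # (plain SGD is assumed here for illustration).
    updated = param - lr * partial

    # Broadcast phase: the optimized value travels once around the ring.
    copies = [None] * n
    copies[root] = updated
    card = root
    for _ in range(n - 1):
        nxt = (card + 1) % n
        copies[nxt] = copies[card]
        card = nxt
        sends += 1
    return copies, sends


if __name__ == "__main__":
    copies, sends = ring_sync(1.0, [1.0, 2.0, 3.0, 4.0], root=2, lr=0.1)
    print(copies, sends)
```

With `n` GPUs the simulation uses `n - 1` sends to reduce and `n - 1` sends to broadcast, which matches the 2 * (n-1) * per-parameter-send-time estimate.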
diff --git a/develop/doc/_sources/design/parallel_do.md.txt b/develop/doc/_sources/design/parallel_do.md.txt deleted file mode 100644 index 42bd136f825986d94fafaeaa5f58edb02848a74c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/parallel_do.md.txt +++ /dev/null @@ -1,163 +0,0 @@ -# Design Doc: Parallel_Do in PaddlePaddle - -In PaddlePaddle, we use parallel_do primitive to represent multithread data parallel processing. - -## Design overview - -The definition of a parallel_do op looks like the following - -```c++ -AddInput(kInputs, "Inputs needed to be split onto different devices").AsDuplicable(); -AddInput(kParameters, "Parameters are duplicated over different devices") - .AsDuplicable(); -AddInput(kPlaces, "Devices used for parallel processing"); -AddOutput(kOutputs, "Outputs needed to be merged from different devices").AsDuplicable(); -AddOutput(kParallelScopes, - "Scopes for all local variables in forward pass. One scope for each device"); -AddAttr(kParallelBlock, - "List of operaters to be executed in parallel"); -``` - -A vanilla implementation of parallel_do can be shown as the following (`|` means single thread and -`||||` means multiple threads) - -``` -In the forward pass - | Split input onto different devices - | Copy parameter onto different devices - |||| Compute forward pass in parallel - | Merge output from different devices - -In the backward pass - | Split output@grad onto different devices - |||| Compute backward pass in parallel - | accumulate param@grad from different devices to the first device - | Merge input@grad from different devices -  | Copy param@grad to the place of parallel_do_op -``` - -This implementation allows to write mixed device program like this - -```python -W1 = fluid.tensor(size=[100,20], parameter=true) -W2 = fluid.tensor(size=[20,15], parameter=true) - -data = layers.data() - -gpu_places = layers.get_place(use_gpu=True) -# parallel processing on multiple GPUs -pd = ParallelDo(gpu_places) -with 
pd.do(input=data): - prediction = softmax(fc(fc(data, W1), W2)) - write_output(prediction) -prediction = pd() -loss = cross_entropy(prediction, label) -``` - -And the programDesc are like the following - -``` -# start_program will be run by executor(CPUPlace), all w1, w2 will be allocated on CPU -start_program -{ - vars: w1, w2 - ops: init(w1), init(w2) -} - -main_program -{ -block0 { - vars: data, places, w1, w2, w1_grad, w2_grad, - ops: data, get_place, parallel_do(block1), - parallel_do_grad(block2), - sgd(w2, w2_grad), - sgd(w1, w1_grad) -} -block1 { # the forward pass - parent_block: 0 - vars: data, h1, h2, loss - ops: fc, fc, softmax -} -block2 { # the backward pass - parent_block: 1 - vars: data_grad, h1_grad, h2_grad, loss_gard, local_w1_grad, local_w2_grad - ops: softmax_grad, - fc_grad - fc_grad -} -} -``` - -## Performance Imporvement - -There are serial places we can make this parallel_do faster. - -### forward: split input onto different devices - -If the input of the parallel_do is independent from any prior opeartors, we can avoid this step by -prefetching the input onto different devices in a seperate background thread. And the python code -looks like this. -```python -pd = ParallelDo(gpu_places) -with pd.do(): -    feature = get_data_from_prefetch_queue(gpu_places) - prediction = my_net(feature) - write_output(activation) -``` - -### forward: Copy parameter to onto different devices - -We can avoid this step by making each device have a copy of the parameter. This requires: - -1. `fluid.default_start_up_program()` to be run on all devices -1. In the backward, allreduce param@grad at different devices, this requires - 1. `backward.py` add `allreduce` operators at parallel_do_grad - 1. `allreduce` operators need to be called in async mode to achieve maximum throughput -1. apply gradients related op(i.e. 
cliping, normalization, decay, sgd) on different devices in parallel - -By doing so, we also avoided "backward: accumulate param@grad from different devices to the first device". -And the ProgramDesc looks like the following - -``` -# w1, w2 will be allocated on all GPUs -start_program -{ -block0 { - parallel_do(block1) -} -block1 { - parent_block: 0 - vars: w1, w2 - ops: init(w1), init(w2) -} -} - -main_program -{ -block0 { - vars: data, places, w1, w2 - ops: data, get_place, parallel_do(block1), - parallel_do_grad(block2), # append_backward - parallel_do(block3) # append_optimization - -} -block1 { - parent_block: 0 - vars: data, h1, h2, loss - ops: fc, fc, softmax -} -block2 { - parent_block: 1 - vars: data_grad, h1_grad, h2_grad, loss_gard, w1_grad, w2_grad - ops: softmax_grad, - fc_grad, allreduce(places, scopes, w1_grad), - fc_grad, allreduce(places, scopes, w2_grad) -} -block3 { - parent_block: 0 - vars: lr - ops: sgd(w2, w2_grad), - sgd(w1, w1_grad) -} -} -``` diff --git a/develop/doc/_sources/design/parameter_average.md.txt b/develop/doc/_sources/design/parameter_average.md.txt deleted file mode 100644 index 2c4edee9fe31d502ea62b9fe5c8757c0a4c5e79f..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/parameter_average.md.txt +++ /dev/null @@ -1,72 +0,0 @@ -# Averaging Parameter in PaddlePaddle - -## Why Averaging -In a large scale machine learning setup where the size of the training data is huge, it could take us a large number of iterations over the training data before we can achieve the optimal values of parameters of our model. Looking at the problem setup, it is desirable if we can obtain the optimal values of parameters by going through the data in as few passes as we can. 
- -Polyak and Juditsky (1992) showed that the test performance of a simple average of the parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset. - -Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). In ASGD, the running average of the parameters obtained by SGD is used as the estimator for the optimal parameter values. The averaging is done as follows: - -
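
A minimal statement of the averaging, assuming the standard ASGD running average over the SGD iterates (the symbols $\theta_i$ for the parameters after step $i$ and $\bar{\theta}_t$ for their average are notation assumed here):

$$
\bar{\theta}_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i
$$

Under the same assumption, the average can be maintained incrementally as $\bar{\theta}_t = \bar{\theta}_{t-1} + \frac{1}{t}\left(\theta_t - \bar{\theta}_{t-1}\right)$, so no past iterates need to be stored.
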
- -We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above. - -### How to perform Parameter Averaging in PaddlePaddle - -Parameter Averaging in PaddlePaddle works in the following way during training : -1. It will take in an instance of a normal optimizer as an input, e.g. RMSPropOptimizer -2. The optimizer itself is responsible for updating the parameters. -3. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself: - 1. In concept, the values of this copy are the average of the values of the parameters in the most recent N batches. - 2. However, saving all the N instances of the parameters in memory is not feasible. - 3. Therefore, an approximation algorithm is used. - -Hence, overall we have have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved. - -During the testing/ saving the model phase, we perform the following steps: -1. Perform the delayed operations. -2. Save current values of the parameters to a temporary variable. -3. Replace the values of the parameters with the averaged values. -4. Perform testing and/or save the parameters. -5. Restore the values of the parameters once done. - -### How to implement Averaging of Parameter in PaddlePaddle - -We can add the ParameterAverageOptimizer op to the graph through Python API. Using this approach, we manually add this op to the graph and direct the output of the optimizer op to this op during training. - - **Advantages**: - - Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op. - - Makes it easy for the users to customize and extend the framework. - - **Disadvantages**: - - Implementation requires re-writing the averaging methodology in Python. 
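
The windowed averaging and the save/replace/restore steps used at test time can be sketched as follows. This is an illustration under stated assumptions, not the proposed operator: `WindowedAverager` and `averaged_parameters` are hypothetical names, parameters are modeled as a flat list of floats, the two-buffer scheme is just one possible approximation of "average over the most recent N batches", and the "delayed operations" step is not modeled.

```python
from contextlib import contextmanager


class WindowedAverager:
    """Approximate the average of recent parameter values with two sums."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.cur_sum = None    # running sum for the window being filled
        self.cur_count = 0
        self.prev_sum = None   # sum of the last completed window
        self.prev_count = 0

    def accumulate(self, params):
        """Call once per batch, after the wrapped optimizer has updated params."""
        if self.cur_sum is None:
            self.cur_sum = [0.0] * len(params)
        self.cur_sum = [s + p for s, p in zip(self.cur_sum, params)]
        self.cur_count += 1
        if self.cur_count == self.window_size:
            # Window is full: move it to the buffer and start a fresh one.
            self.prev_sum, self.prev_count = self.cur_sum, self.cur_count
            self.cur_sum, self.cur_count = [0.0] * len(params), 0

    def average(self):
        total = self.prev_count + self.cur_count
        if total == 0:
            raise ValueError("no parameter values accumulated yet")
        prev = self.prev_sum or [0.0] * len(self.cur_sum)
        return [(a + b) / total for a, b in zip(prev, self.cur_sum)]


@contextmanager
def averaged_parameters(params, averager):
    """Temporarily replace params with their averaged values, then restore."""
    backup = list(params)              # save the current values
    params[:] = averager.average()     # replace with the averaged values
    try:
        yield params                   # test and/or save the model here
    finally:
        params[:] = backup             # restore the training values


if __name__ == "__main__":
    averager = WindowedAverager(window_size=2)
    for value in ([1.0], [3.0], [5.0]):
        averager.accumulate(value)
    params = [5.0]
    with averaged_parameters(params, averager):
        print("testing with", params)
    print("training resumes with", params)
```

Keeping only sums and counts means memory use does not grow with the window size, which is the point of using an approximation instead of storing all N parameter snapshots.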
- -### Low-Level implementation - -In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input: -- the optimizer -- the window_size to keep the updates - -The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU. - -The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API. - -### Python API implementation for ParameterAverageOptimizer - -Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following: -- Any optimizer (RMSProp , AdaGrad etc.) -- A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. Move the averaged value to a buffer when window is full to avoid loss of precision. - -Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. 
As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions. -We will have a wrapper written in Python that will support the functionality and implement the actual core computation in C++ core as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.cc) - -#### Creation of the ParameterAverageOptimizer operator -There are two ways for creating the ParameterAverageOptimizer op: -1. We create the op immediately while building the computation graph. -2. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added. - -The proposal is to add the op immediately while building the computation graph. - -#### High-level API - -In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions. diff --git a/develop/doc/_sources/design/parameters_in_cpp.md.txt b/develop/doc/_sources/design/parameters_in_cpp.md.txt deleted file mode 100644 index a7ac3f17c44ca94a669a8f1e283b291bceb42317..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/parameters_in_cpp.md.txt +++ /dev/null @@ -1,41 +0,0 @@ -# Design Doc: The C++ Class `Parameters` - -`Parameters` is a concept we designed in PaddlePaddle V2 API. `Parameters` is a container of parameters, which makes PaddlePaddle capable of sharing parameter between topologies. We described usages of `Parameter` in [api.md](./api.md). - -We used Python to implement Parameters when designing V2 API before. 
There are several defects for the current implementation: -* We just use `memcpy` to share Parameters between topologies, but this is very inefficient. -* We did not support sharing Parameters while training. We just trigger `memcpy` when start training. - -It is necessary that we implement Parameters in CPP side. However, it could result a code refactoring for PaddlePaddle, because PaddlePaddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current PaddlePaddle implementation, there are three concepts associated with `Parameters`: - -1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`. -It is evident that we should use `paddle::Parameter` when developing `Parameters`. -However, the `Parameter` class contains many functions and does not have a clear interface. -It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, `randomize/zero`. -When we developing `Parameters`, we only use `create/store Parameter` functionality. -We should extract functionalities of Parameter into many classes to clean PaddlePaddle CPP implementation. - -2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`. -We should pass `Parameters` to `paddle::GradientMachine` when `forward/backward` to avoid `memcpy` between topologies. -Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would perform on multi-GPUs and multi-CPUs. -`Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device. - -3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle. -So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD). - - -The step by step approach for implementation Parameters in PaddlePaddle C++ core is listed below. 
Each step should be a PR and could be merged into PaddlePaddle one by one. - -1. Clean `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters. - -2. Implementation a `Parameters` class. It just stores the `paddle::Parameter` inside. Make `GradientMachine` uses `Parameters` as a class member. - -3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies. -Because we need share `Parameters` between topologies, it is `Parameters`'s response to exchange Parameters between GPUs. -`GradientMachine` should not handle how to exchange Parameters because `GradientMachine` only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one `Parameters`. - * We should use a global function to exchange Parameters between GPUs, not a member function in `Parameters`. The `MultiGradientMachine` invoke this function, which uses `Parameters` as this function inputs. - * The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler. - -4. Make `Parameters` as an argument for `forward/backward` function, not a data member for `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies. - -5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end of this code refactoring, we could change `ParameterUpdater` directly uses `Parameters` to make `ParameterUpdater`'s implementation clear. 
diff --git a/develop/doc/_sources/design/profiler.md.txt b/develop/doc/_sources/design/profiler.md.txt deleted file mode 100644 index b20b5efdc1f1f10ce7cec835adcc6fb374ed4e20..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/profiler.md.txt +++ /dev/null @@ -1,97 +0,0 @@ -## Introduction - -There are many performance analysis tools for [different programming languages and different software frameworks](https://en.wikipedia.org/wiki/List_of_performance_analysis_tools). For most popular deep learning frameworks, they use several programming languages and adapt to heterogeneous platforms. Similar to most of the deep learning frameworks, PaddlePaddle also uses C++, CUDA and Python as the basic programming languages to adapt to run on CPU and GPU devices. The [`nvprof` tools](http://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview) is usually used to analyse the CUDA program. We have [a document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/optimization/cpu_profiling.md) to profile CPU and Python program by [yep](https://pypi.python.org/pypi/yep) and [Google's perftools](https://github.com/google/pprof) to profile only the CPU and Python program. But for [PaddlePaddle fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), the operator is the basic computing unit. The developers usually want to collect the time of each operator and locate bottlenecks. The `nvprof` usually collect the timeline of CUDA-related activities on both CPU and GPU, including kernel execution, memory transfers, memory set and CUDA API calls and events or metrics for CUDA kernels. And the `yep` and `Google's perftools` can't collect the timeline for CUDA program. All these tools can't collect time in the operator level. So we design this profiling tool. - -## Architecture - -The work flow for most task is as follows. Each operator will run many times in the all iterations. 
So the profiler must collect the total time of each operator across iterations. Developers may also want to collect more detailed time spans inside an operator, or to record time spans elsewhere, so the profiler must support nested time spans. To speed up training, all deep learning frameworks support parallel computing, including multiple threads on CPU and multiple GPUs, so the profiler must be able to collect the timeline for each thread. In addition, the profiler itself consumes resources, so developers must be able to enable and disable it easily. Finally, the profiler should present a human-readable report. - -```python -for i in range(M): # M is the number of iterations - for op in operator_lists: # `operator_lists` contains all the operators in the network. - op.run() -``` - -In summary, the profiler should have the following features: - -- records time spans in a loop. -- supports nested time spans. -- supports multiple threads/multiple GPUs. -- can be enabled and disabled by users. - -But how do we record the time for a mixed C++ and CUDA program? There are many C++ APIs to get the current calendar time in a host program. On the GPU, however, CUDA kernels may execute concurrently if they are in different [streams](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#streams), and CUDA kernels run asynchronously with respect to the host program unless the host synchronizes after launching them. CUDA provides [events](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#events) to monitor the device and perform accurate timing. Inspired by PyTorch and CUDA events, we also design and apply events to record the timeline, and then summarize and present statistics based on these events. - -The overall flow is shown in the following figure. - -
- -### Event - -In the above work flow, a pair of events is needed before and after each piece of code whose time is collected, so an event has a flag to mark whether it is a starting event or an ending event. Besides these two kinds of event, sometimes only a marker with a text message is needed, for example, a marker to specify where profiling starts or ends. There are three kinds of event: - -```c++ -enum EventKind { - kMark, - kPushRange, - kPopRange}; -``` -- kMark: only a marker without a time range. -- kPushRange: marks the starting event of a time range. -- kPopRange: marks the ending event of a time range. - -For CPU code, the events only need to record the current time. For CUDA code, the [event management functions of CUDA](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT) are used. To cover many pieces of code, event lists are used to record each piece. - -```c++ -class Event { - public: - // The DeviceContext is used to get current CUDA stream. - Event(EventKind kind, std::string name, uint32_t thread_id, - const platform::DeviceContext* dev_ctx = nullptr); - double CpuElapsedUs(const Event& e) const; - double CudaElapsedUs(const Event& e) const; - - private: - EventKind kind_; - std::string name_; - uint32_t thread_id_; - int64_t cpu_ns_; -#ifdef PADDLE_WITH_CUDA - cudaEvent_t event_ = nullptr; - int device_ = -1; -#endif -}; - -struct EventList { - std::forward_list<std::vector<Event>> event_blocks; -}; -``` - -As mentioned above, there is no need to record the timeline when the profiler is disabled, so there is a global state to enable or disable the profiler. - -```c++ -enum ProfilerState { - kDisabled, - kCPU, - kCUDA -}; -ProfilerState g_state; -``` -- kDisabled: the disabled state. -- kCPU: CPU profiling state. -- kCUDA: GPU profiling state. - -A pair of starting and ending events is pushed to the event lists in the constructor and destructor of `RecordEvent`, so the timeline is recorded for the code that runs within the lifetime of a `RecordEvent` object.
- -```c++ -struct RecordEvent { - explicit RecordEvent(const std::string name, - platform::DeviceContext* dev_ctx = nullptr) { - if (kState == ProfilerState::kDisabled) return; - // push the starting event to the event lists. - } - ~RecordEvent() { - if (kState == ProfilerState::kDisabled) return; - // push the ending event to the event lists. - } -}; -``` diff --git a/develop/doc/_sources/design/program.md.txt b/develop/doc/_sources/design/program.md.txt deleted file mode 100644 index bd2456787c4e336d357a65255a8274a7c9e465cc..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/program.md.txt +++ /dev/null @@ -1,139 +0,0 @@ -# Design Doc: PaddlePaddle Programs - -## Compile and Execution - -A PaddlePaddle program consists of two parts -- the first generates a `ProgramDesc` protobuf message that describes the program, and the second runs this message using a C++ class `Executor`. - -A simple example PaddlePaddle program can be found in [graph.md](./graph.md): - -```python -x = layer.data("images") -l = layer.data("label") -y = layer.fc(x) -cost = layer.mse(y, l) -optimize(cost) -train(cost, reader=mnist.train()) -``` - -The first five lines of the following PaddlePaddle program generates, or, compiles, the `ProgramDesc` message. The last line runs it. - -## Programs and Blocks - -The basic structure of a PaddlePaddle program is some nested blocks, as a C++ or Java program. - -- program: some nested blocks -- [block](./block.md): - - some local variable definitions, and - - a sequence of operators - -The concept of block comes from usual programs. 
For example, the following C++ program has three blocks: - -```c++ -int main() { // block 0 - int i = 0; - if (i < 10) { // block 1 - for (int j = 0; j < 10; j++) { // block 2 - } - } - return 0; -} -``` - -The following PaddlePaddle program has three blocks: - -```python -import paddle as pd // block 0 - -x = minibatch([10, 20, 30]) # shape=[None, 1] -y = var(1) # shape=[1], value=1 -z = minibatch([10, 20, 30]) # shape=[None, 1] -cond = larger_than(x, 15) # [false, true, true] - -ie = pd.ifelse() -with ie.true_block(): // block 1 - d = pd.layer.add_scalar(x, y) - ie.output(d, pd.layer.softmax(d)) -with ie.false_block(): // block 2 - d = pd.layer.fc(z) - ie.output(d, d+1) -o1, o2 = ie(cond) -``` - -## `BlockDesc` and `ProgramDesc` - -All protobuf messages are defined in `framework.proto`. - -`BlockDesc` is straight-forward -- it includes local variable definitions, `vars`, and a sequence of operators, `ops`. - -```protobuf -message BlockDesc { - required int32 parent = 1; - repeated VarDesc vars = 2; - repeated OpDesc ops = 3; -} -``` - -The parent ID indicates the parent block so that operators in a block can refer to variables defined locally and also those defined in their ancestor blocks. - -All hierarchical blocks in a program are flattened and stored in an array. The block ID is the index of the block in this array. - -```protobuf -message ProgramDesc { - repeated BlockDesc blocks = 1; -} -``` - - -### Global Block - -The global block is the first one in the above array. - -## Operators that Use Blocks - -In the above example, the operator `IfElseOp` has two blocks -- the true branch and the false branch. - -The definition of `OpDesc` shows that an operator could have some attributes: - -```protobuf -message OpDesc { - AttrDesc attrs = 1; - ... -} -``` - -and an attribute could be of type block, which is, in fact, a block ID as described above: - -``` -message AttrDesc { - required string name = 1; - - enum AttrType { - INT = 1, - STRING = 2, - ... 
- BLOCK = ... - } - required AttrType type = 2; - - optional int32 block = 10; // when type == BLOCK - ... -} -``` - -## InferShape - -With this design, the InferShape function should take the following parameters: - -```c++ -void InferShape(int current_block, - int current_operator, - ProgramDesc* program // might change VarDesc values. - ) { - ... -} -``` - -where - -- `current_block` indices into `ProgramDesc::blocks`, -- `current_operator` indices into `BlockDesc::ops`. diff --git a/develop/doc/_sources/design/prune.md.txt b/develop/doc/_sources/design/prune.md.txt deleted file mode 100644 index 4a5cf10c79a554779137f0cce5494fdd96ef6b7a..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/prune.md.txt +++ /dev/null @@ -1,63 +0,0 @@ -# Prune - -## Motivation - -We want to support running inference, training and checkpointing in one `ProgramDesc`. We implement -`void Prune(const ProgramDesc* input, ProgramDesc* output)` function, which takes a `ProgramDesc` -and generate a pruned `ProgramDesc`. - -## Challenge - -Pruning need to support both variables and operators being evaluation targets. Consider the following -different situations. - -```python -# Case 1: run foward pass. -cost_np = session.run(target=cost) -# Case 2: run backward passing. -opts_np, _ = session.run(target=[cost, opt]) -# Case 3: run checkpointing -_ = session.run(target=checkpoint) -``` - -## Solution - -To support evaluation of operators, we add `is_target` field in the `OpDesc`. - -```c++ -message OpDesc { - required string type = 3; - repeated Var inputs = 1; - repeated Var outputs = 2; - repeated Attr attrs = 4; - optional bool is_target = 5 [ default = false ]; -}; -``` - -To support evaluation of variables, we add [fetch_op](https://github.com/PaddlePaddle/Paddle/pull/4599). -For each variable in the `target`, we insert a `fetch_op` into the `ProgramDesc` with `variable` being -`fetch_op`'s input. Then we also set `fetch_op` is a target. 
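
With targets marked this way, pruning amounts to a reverse sweep over the operators. The following is a minimal Python sketch of that idea, not the C++ `Prune` function itself: the `Op` record is a hypothetical stand-in for `OpDesc`, with inputs and outputs reduced to plain variable names.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for OpDesc.
Op = namedtuple("Op", ["type", "inputs", "outputs", "is_target"])


def prune(ops):
    """Keep only the operators that the targets depend on."""
    dependent_vars = set()
    kept = []
    # Walk backwards so that consumers are seen before their producers.
    for op in reversed(ops):
        needed = op.is_target or any(v in dependent_vars for v in op.outputs)
        if needed:
            # Everything this op reads must now be produced by an earlier op.
            dependent_vars.update(op.inputs)
            kept.append(op)
    kept.reverse()  # restore the original execution order
    return kept


if __name__ == "__main__":
    program = [
        Op("fc", ["x", "w"], ["h"], False),
        Op("mse", ["h", "label"], ["cost"], False),
        Op("sgd", ["w", "w_grad"], ["w"], False),
        Op("fetch", ["cost"], ["out"], True),  # fetch_op inserted for `cost`
    ]
    print([op.type for op in prune(program)])
```

In this example only the `fetch` operator is a target, so the `sgd` operator is dropped while `fc` and `mse` are kept because `cost` depends on them.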
- -### Algorithm - -If an operator needs to be run, it must fall into one of the following cases: - -1. It is the target. -2. Some other op depends on it, meaning its output is some other op's input. - -The first case can be checked by `op_desc.is_target()`. The second case can be implemented as - -```c++ -bool HasDependentVar(const OpDesc& op_desc, const std::set<std::string>& dependent_vars) { - for (auto& var : op_desc.outputs()) { - for (auto& argu : var.arguments()) { - if (dependent_vars.count(argu) != 0) { - return true; - } - } - } - return false; -} -``` - -The whole algorithm can then be implemented as in the following [code](https://github.com/tonyyang-svail/Paddle/blob/prune_impl/paddle/framework/prune.cc). diff --git a/develop/doc/_sources/design/python_api.md.txt b/develop/doc/_sources/design/python_api.md.txt deleted file mode 100644 index 73f6d7b90c7dca0d48109cf3d28d5f7cd56b5c0b..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/python_api.md.txt +++ /dev/null @@ -1,304 +0,0 @@ -# Design Doc: Python API - -Due to the refactorization of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program. - -| Python classes | Protobuf messages | -| --- | --- | -| Program | ProgramDesc | -| Block | BlockDesc | -| Operator | OpDesc | -| Variable | VarDesc | - -Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages. - -## Core Concepts - -### Program - -A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. The `BlockDesc`s in a `ProgramDesc` can have a tree-like hierarchical structure. However, the `ProgramDesc` only stores a flattened array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. 
For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks. - -Whenever we create a block, we need to set its parent block to the current block, hence the Python class `Program` needs to maintain a data member `current_block`. - -```python -class Program(objects): - def __init__(self): - self.desc = core.NewProgram() # a C++ ProgramDesc pointer. - self.blocks = vector() - self.blocks.append(Block(self, -1)) # the global block - self.current_block = 0 # initialized to the global block - - def global_block(): - return self.blocks[0] - - def current_block(): - return self.get_block(self.current_block) - - def rollback(): - self.current_block = self.current_block().parent_idx - - def create_block(): - new_block_idx = len(self.block) - self.blocks.append(Block(self, self.current_block)) - self.current_block = new_block_idx - return current_block() -``` - -`Program` is an accessor to the protobuf message `ProgramDesc`, which is created in C++ space, because the InferShape function is in C++, which manipulates `VarDesc` messages, which are in turn members of `BlockDesc`, which is a member of `ProgramDesc`. - -`Program` creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block. - -### Block - -A [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md) includes - -1. a map from variable names to an instance of the Python `Variable` class, and -1. a list of `Operator` instances. - -```python -class Block(objects): - def __init__(self, program, parent_idx): - self.desc = core.NewBlock(program.desc) - self.program = program - self.vars = map() - self.ops = vector() - self.parent_idx = parent_idx - - def create_var(self, ...): - return Variable(self, ...) - - def _create_global_var(self, ...): - program.global_block().create_var(...) - - def create_parameter(self, name, ...): - # Parameter is a subclass of variable. 
See Parameter section for details. - self.vars[name] = Parameter(self._create_global_var(...), ...) - return self.vars[name] - - def append_operator(self, ...): - self.ops.append(Operator(self, ...)) - - def prepend_operator(self, ...): # Parameter's ctor prepands initialize operators. - self.ops.prepend(Operator(self, ...)) -``` - -`create_parameter` is necessary because parameters are global variables, defined in the global block, but can be created in some sub-blocks. For example, an FC layer in the step block of an RNN operator. - -`prepend_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block. - -### Operator - -The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer the output shapes from the input shapes. - -```python -class Operator(object): - def __init__(self, - block, # Block - type, # string - inputs, # dict - outputs,# dict - attrs # dict - ): - self.desc = core.NewOpDesc(block.desc, type, inputs, outputs, attrs) - core.infer_shape(self.desc, inputs, outputs) - - def type(self): - return self.desc.type() -``` - -`Operator` creates the `OpDesc` message in C++ space, so that it can call the `InferShape` function, which is in C++. - -### Variable - -Operators take Variables as its inputs and outputs. - -```python -class Variable(object): - def __init__(self, - block=None, # Block - name=None, # string - shape, # tuple - dtype="float32", # string - lod_level=None # int - ): - if name is None: - name = unique_name_generator() - self.name = name - self.block = block - self.desc = core.NewVarDesc(block.desc, name, shape, lod_level) - self.writer = None -``` - -Please be aware of `self.writer`, that tracks operator who creates the variable. 
It possible that there are more than one operators who write a variable, but in Python space, each write to a variable is represented by a Variable class. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**. - -### Parameter - -A parameter is a global variable with an initializer (or load) operator. - -```python -class Parameter(Variable): - def __init__(self, - block=None, # Block - name=None, # string - shape, # tuple - dtype="float32", # string - lod_level=None # int - trainable, # bool - initialize_op_attrs, - optimize_op_attrs): - super(Parameter, self).__init__(block, name, shape, dtype, lod_level) - self.trainable = trainable - self.optimize_op_attrs = optimize_op_attrs - block.prepend(Operator(block, # Block - initialize_op_attrs['type'], # string - None, # no inputs - self, # output is the parameter - initialize_op_attrs) -``` - -When users create a parameter, they can call - -```python -program.create_parameter( - ..., - init_attr={ - type: "uniform_random", - min: -1.0, - max: 1.0, - }) -) -``` - -In above example, `init_attr.type` names an initialize operator. It can also name the load operator - -```python -init_attr={ - type: "load", - filename: "something.numpy", -} -``` - -`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message. - -## Layer Function - -A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers. - -Layer functions take `Variable` and configuration parameters as its input and return the output variable(s). - -For example, `FullyConnected` take one or more variable as its input. The input could be input data or another layer's output. 
There are many configuration options for a `FullyConnected` layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The `FullyConnected` layer will return an output variable. - - -### Necessity for reusing code between layer functions - -There are a lot of code that can be reused. Such as - -* Give the default value of configuration. e.g., default initialize strategy for parameters is uniform random with `min = -1.0`, `max = 1.0`. and default initialize strategy for bias is to fill zero. -* Append the activation operator. -* Create a temporary variable. -* Create parameter. -* Generate a unique name. -* Add a bias. -* ... - -A mechanism to reuse code between layer functions is necessary. It will be around [150 lines of code](https://github.com/PaddlePaddle/Paddle/pull/4724/files#diff-823b27e07e93914ada859232ae23f846R12) if we write a `FullyConnected` layer without any helper functions. - - - -### Comparision between global functions and helper class - -The `FullyConnected` layer will be as follow when we provide global functions: - -```python -def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None): - if name is None: - name = unique_name("fc") - input = multiple_input(input) - param_attr = default_param_attr(param_attr) - param_attr = multiple_param_attr(param_attr, len(input)) - - # mul - mul_results = [] - for ipt, attr in zip(input, param_attr): - shape = ipt.shape[1:] + [size] - w = g_program.global_block().create_parameter(shape, ipt.dtype, name, attr) - tmp = create_tmp_var(name) - g_program.current_block().append_op("mul", {ipt, w}, {tmp}) - mul_results.append(tmp) - - # add sum - ... - # add bias - ... - # add activation - ... - return out -``` - -We can provide many helpers functions for layer developers. However, there are several disadvantages for global helper functions: - -1. 
We need a namespace for these methods, then layer developers can quickly figure out what method they can use. -2. Global functions will force layer developers to pass its parameter time by time. - -So we provide a helper class, `LayerHelper`, to share code between layer functions. The `FullyConnected` Layer will be as follow. - -```python -def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None): - helper = LayerHelper(locals()) # pass all parameter to LayerHelper - - mul_results = [] - for ipt, param in helper.iter_multiple_input_and_param(): - w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype = ipt.dtype) - tmp = helper.create_tmp_variable() - helper.append_op('mul', {ipt, w}, {tmp}) - mul_results.append(tmp) - - pre_bias = helper.add_sum(mul_results) - pre_activation = helper.add_bias(pre_bias) - return helper.add_activation(pre_activation) -``` - -We not only use the fewer lines of code to write `fc_layer` but also make the code clearer to understand. At the same time, layer developers can figure out what function they can invoke by typing `helper.` in a python editor. - - -### Implementation of layer helper - -We just keep all parameters of a layer function as a dictionary in layer helper as a private data member. Every method of layer helper will look up the dictionary after it is invoked. In that way, we can implement a layer helper for all layer functions even some layer does not contain some operator. For example, The `activation` is used by the FullyConnected layer or convolution layers, but a cross-entropy layer does not use it. 
The example code of `add_activation` are: - -```python -class LayerHelper(object): - def __init__(self, **kwargs): # kwargs is short for `keyword arguments` - self.kwargs = kwargs - - def add_activation(self, input_var): - act = self.kwargs.get("act", None) # default value is None - if act is None: # do nothing if no act - return input_var - - tmp = self.create_tmp_var(self) - self.append_op(type=act, input=input_var, output=tmp) - return tmp -``` - -### Return value of layer functions - -The layer will return a Variable, which is also the output of an operator. However, outputs of a layer function have more attributes than an operator. There are parameter variables, and their gradient variables need to return. To return them is useful. For example, - -1. Users can debug the network by printing parameter gradients. -2. Users can append attributes to a parameter, such as, `param.stop_gradient=True` will make a parameter stop generate the gradient. We can fix the parameter value during training by using this attribute. - -However, it is good to return a Variable for layers, since all layers and operators use Variables as their parameters. We can just append a `param` field and a `grad` field for layer function since the Python is dynamic typing. - -The sample usage is - -```python -data = fluid.layers.data(...) -hidden = fluid.layers.fc(data, ...) -... - -executor.run(fetch_list=[hidden.param, hidden.param.grad], ...) -``` - - -## Optimizer - -[Optimizer Design Doc](./optimizer.md) diff --git a/develop/doc/_sources/design/reader/README.md.txt b/develop/doc/_sources/design/reader/README.md.txt deleted file mode 100644 index 2cd4b6225b61cf374458e40afabad7745f61ba71..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/reader/README.md.txt +++ /dev/null @@ -1,206 +0,0 @@ -# Python Data Reader Design Doc - -During the training and testing phases, PaddlePaddle programs need to read data. 
To help the users write code that performs reading input data, we define the following: - -- A *reader*: A function that reads data (from file, network, random number generator, etc) and yields the data items. -- A *reader creator*: A function that returns a reader function. -- A *reader decorator*: A function, which takes in one or more readers, and returns a reader. -- A *batch reader*: A function that reads data (from *reader*, file, network, random number generator, etc) and yields a batch of data items. - -and also provide a function which can convert a reader to a batch reader, frequently used reader creators and reader decorators. - -## Data Reader Interface - -*Data reader* doesn't have to be a function that reads and yields data items. It can just be any function without any parameters that creates an iterable (anything can be used in `for x in iterable`) as follows: - -``` -iterable = data_reader() -``` - -The item produced from the iterable should be a **single** entry of data and **not** a mini batch. The entry of data could be a single item or a tuple of items. Item should be of one of the [supported types](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., numpy 1d array of float32, int, list of int etc.) - -An example implementation for single item data reader creator is as follows: - -```python -def reader_creator_random_image(width, height): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height) - return reader -``` - -An example implementation for multiple item data reader creator is as follows: -```python -def reader_creator_random_image_and_label(width, height, label): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height), label - return reader -``` - -## Batch Reader Interface - -*Batch reader* can be any function without any parameters that creates an iterable (anything can be used in `for x in iterable`). 
The output of the iterable should be a batch (list) of data items. Each item inside the list should be a tuple. - -Here are some valid outputs: - -```python -# a mini batch of three data items. Each data item consist three columns of data, each of which is 1. -[(1, 1, 1), -(2, 2, 2), -(3, 3, 3)] - -# a mini batch of three data items, each data item is a list (single column). -[([1,1,1],), -([2,2,2],), -([3,3,3],)] -``` - -Please note that each item inside the list must be a tuple, below is an invalid output: -```python - # wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],). - # Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1], - # or three columns of data, each of which is 1. -[[1,1,1], -[2,2,2], -[3,3,3]] -``` - -It is easy to convert from a reader to a batch reader: - -```python -mnist_train = paddle.dataset.mnist.train() -mnist_train_batch_reader = paddle.batch(mnist_train, 128) -``` - -It is also straight forward to create a custom batch reader: - -```python -def custom_batch_reader(): - while True: - batch = [] - for i in xrange(128): - batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended. - yield batch - -mnist_random_image_batch_reader = custom_batch_reader -``` - -## Usage - -Following is how we can use the reader with PaddlePaddle: -The batch reader, a mapping from item(s) to data layer, the batch size and the number of total passes will be passed into `paddle.train` as follows: - -```python -# two data layer is created: -image_layer = paddle.layer.data("image", ...) -label_layer = paddle.layer.data("label", ...) - -# ... -batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128) -paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...) -``` - -## Data Reader Decorator - -The *Data reader decorator* takes in a single reader or multiple data readers and returns a new data reader. 
It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use `@` in the syntax. - -Since we have a strict interface for data readers (no parameters and return a single data item), a data reader can be used in a flexible way using data reader decorators. Following are a few examples: - -### Prefetch Data - -Since reading data may take some time and training can not proceed without data, it is generally a good idea to prefetch the data. - -Use `paddle.reader.buffered` to prefetch data: - -```python -buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100) -``` - -`buffered_reader` will try to buffer (prefetch) `100` data entries. - -### Compose Multiple Data Readers - -For example, if we want to use a source of real images (say reusing mnist dataset), and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661). - -We can do the following : - -```python -def reader_creator_random_image(width, height): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height) - return reader - -def reader_creator_bool(t): - def reader: - while True: - yield t - return reader - -true_reader = reader_creator_bool(True) -false_reader = reader_creator_bool(False) - -reader = paddle.reader.compose(paddle.dataset.mnist.train(), data_reader_creator_random_image(20, 20), true_reader, false_reader) -# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry. -# And we don't care about the second item at this time. -paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...) -``` - -### Shuffle - -Given the shuffle buffer size `n`, `paddle.reader.shuffle` returns a data reader that buffers `n` data entries and shuffles them before a data entry is read. 
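A buffered shuffle of this kind can be sketched as a reader decorator. The code below is a minimal sketch of the idea and not necessarily how `paddle.reader.shuffle` is implemented; the name `shuffle_reader` is made up for this illustration. It fills a buffer of `buf_size` entries, shuffles it, yields the entries, and repeats until the wrapped reader is exhausted.

```python
import random


def shuffle_reader(reader, buf_size):
    """Return a reader that yields the entries of `reader`, shuffled per buffer."""
    def shuffled():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush the last, possibly partial, buffer.
        if buf:
            random.shuffle(buf)
            for item in buf:
                yield item
    return shuffled
```

Because shuffling happens one buffer at a time, entries never move across buffer boundaries; a larger `buf_size` gives a more thorough shuffle at the cost of more memory.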
- -Example: -```python -reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512) -``` - -## Q & A - -### Why does a reader return only a single entry, and not a mini batch? - -Returning a single entry makes reusing existing data readers much easier (for example, if an existing reader returns 3 entries instead of a single entry, the training code will be more complicated because it needs to handle cases like a batch size of 2). - -We provide a function, `paddle.batch`, to turn a single-entry reader into a batch reader. - -### Why do we need a batch reader, isn't it sufficient to give the reader and batch_size as arguments during training? - -In most cases, it would be sufficient to give the reader and batch_size as arguments to the train method. However, sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. In these cases a custom batch reader gives the user that control. - -### Why use a dictionary instead of a list to provide mapping? - -Using a dictionary (`{"image":0, "label":1}`) instead of a list (`["image", "label"]`) gives the advantage that the user can easily reuse the items (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or even skip an item (e.g., using `{"image_a":0, "label":2}`). - -### How to create a custom data reader creator? 
- -```python -def image_reader_creator(image_path, label_path, n): - def reader(): - f = open(image_path) - l = open(label_path) - images = numpy.fromfile( - f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32') - images = images / 255.0 * 2.0 - 1.0 - labels = numpy.fromfile(l, 'ubyte', count=n).astype("int") - for i in xrange(n): - yield images[i, :], labels[i] # a single entry of data is created each time - f.close() - l.close() - return reader - -# images_reader_creator creates a reader -reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024) -paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...) -``` - -### How is `paddle.train` implemented - -An example implementation of paddle.train is: - -```python -def train(batch_reader, mapping, batch_size, total_pass): - for pass_idx in range(total_pass): - for mini_batch in batch_reader(): # this loop will never end in online learning. - do_forward_backward(mini_batch, mapping) -``` diff --git a/develop/doc/_sources/design/refactorization.md.txt b/develop/doc/_sources/design/refactorization.md.txt deleted file mode 100644 index f93d6155e1764386b01d2f0df3f141ab75cd55d4..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/refactorization.md.txt +++ /dev/null @@ -1,249 +0,0 @@ -# Design Doc: Refactorization Overview - -The goals of refactoring include: - -1. Making it easy for external contributors to write new elementary computation operations. -1. Making the codebase clean and readable. -1. Designing a new computation representation -- a computation graph of operators and variables. -1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs. - -## Computation Graphs - -1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs. - - 1. 
Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example. - -1. Users write Python programs to describe the graphs and run them (locally or remotely). - -1. A graph is composed of *variables* and *operators*. - -1. The description of graphs must be serializable/deserializable, so that: - - 1. It can be sent to the cloud for distributed execution, and - 1. It can be sent to clients for mobile or enterprise deployment. - -1. The Python program does two things - - 1. *Compilation* runs a Python program to generate a protobuf message representation of the graph and send it to - 1. the C++ library `libpaddle.so` for local execution, - 1. the master process of a distributed training job for training, or - 1. the server process of a Kubernetes serving job for distributed serving. - 1. *Execution* executes the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message. - -## Description and Realization of Computation Graph - -At compile time, the Python program generates a protobuf message representation of the graph, or a description of the graph. - -At runtime, the C++ program realizes the graph and runs it. 
- -| | Representation (protobuf messages) | Realization (C++ class objects) | -|---|---|---| -|Data|[VarDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107)|[Variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24)| -|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| -|Block|BlockDesc|Block| - -The word *graph* is interchangeable with *block* in this document. A graph consists of computation steps and local variables similar to a C++/Java program block, or a pair of parentheses(`{` and `}`). - -## Compilation and Execution - -1. Run a Python program to describe the graph. In particular, the Python application program does the following: - - 1. Create `VarDesc` to represent local/intermediate variables, - 1. Create operators and set attributes, - 1. Validate attribute values, - 1. Infer the type and the shape of variables, - 1. Plan memory-reuse for variables, - 1. Generate the backward graph - 1. Add optimization operators to the computation graph. - 1. Optionally, split the graph for distributed training. - -1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the Python program does the following: - - 1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, - 1. realize local variables defined in the BlockDesc message in the new scope, - 1. a scope is similar to the stack frame in programming languages, - - 1. Create an instance of class `Block`, in which, - 1. realize operators in the BlockDesc message, - - 1. Run the Block by calling - 1. `Block::Eval(vector* targets)` for forward and backward computations, or - 1. 
`Block::Eval(vector* targets)` for optimization. - - -## Intermediate Representation (IR) - -```text -Compile Time -> IR -> Runtime -``` - -### Benefits of IR - -- Optimization - ```text - Compile Time -> IR -> Optimized IR -> Runtime - ``` -- Automatically send partitioned IR to different nodes. - - Automatic Data Parallelism - ```text - Compile Time - |-> Single GPU IR - |-> [trainer-IR-0, trainer-IR-1, pserver-IR] - |-> Node-0 (runs trainer-IR-0) - |-> Node-1 (runs trainer-IR-1) - |-> Node-2 (runs pserver-IR) - ``` - - Automatic Model Parallelism (planned for future) - ---- - -# Operator/OpWithKernel/OpKernel - -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/49caf1fb70820fb4a6c217634317c9306f361f36/op_op_with_kern_class_diagram.dot) - ---- - -# Operator -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) - -* `Operator` is the fundamental building block of the user interface. - * Operator stores input/output variable names and attributes. - * The `InferShape` interface is used to infer the shape of the output variables based on the shapes of the input variables. - * Use `Run` to compute the `output` variables from the `input` variables. - ---- - -# OpWithKernel/Kernel - -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/9d7f4eba185cf41c8e2fbfb40ae21890dbddcd39/op_with_kernel.dot) - -* `OpWithKernel` inherits `Operator`. -* `OpWithKernel` contains a Kernel map. - * `OpWithKernel::Run` get device's kernel, and invoke `OpKernel::Compute`. - * `OpKernelKey` is the map key. Only device place now, but may be data type later. - ---- - -# Why separate Kernel and Operator - -* Separate GPU and CPU code. - * Make Paddle capable of running without GPU. 
-* Make one operator (which is a user interface) and create many implementations. - * For example, same multiplication op can have different implementations kernels such as FP16 kernel, FP32 kernel, MKL, eigen kernel. ---- - -# Libraries for Kernel development - -* `Eigen::Tensor` contains basic math and element-wise functions. - * Note that `Eigen::Tensor` has broadcast implementation. - * Limit the number of `tensor.device(dev) = ` in your code. -* `thrust::transform` and `std::transform`. - * `thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels. - * `thrust`, in addition, supports more complex APIs, like `scan`, `reduce`, `reduce_by_key`. -* Hand-writing `GPUKernel` and `CPU` code - * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.) ---- -# Operator Registration - -## Why is registration necessary? -We need a method to build mappings between Op type names and Op classes. - -## How is registration implemented? -Maintaining a map, whose key is the type name and the value is the corresponding Op constructor. - ---- -# The Registry Map - -### `OpInfoMap` - -`op_type(string)` -> `OpInfo` - -`OpInfo`: - -- **`creator`**: The Op constructor. -- **`grad_op_type`**: The type of the gradient Op. -- **`proto`**: The Op's Protobuf, including inputs, outputs and required attributes. -- **`checker`**: Used to check attributes. - ---- -# Related Concepts - -### Op_Maker -It's constructor takes `proto` and `checker`. They are completed during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)) - -### Register Macros -```cpp -REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) -REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) -``` - ---- -# Registration Process -1. 
Write an Op class and its gradient Op class, if required. -2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator. -3. Invoke the macro `REGISTER_OP`. This macro will - 1. Call the maker class to complete `proto` and `checker`. - 2. Use the completed `proto` and `checker` to add a new key-value pair to the `OpInfoMap`. - ---- -# Backward Module (1/2) -### Create Backward Operator -- Mapping from forward Op to backward Op -![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) - ---- -# Backward Module (2/2) -### Build Backward Network -- **Input**: a graph of forward operators -- **Output**: a graph of backward operators -- **Corner cases in construction** - - Shared Variables => insert an `Add` operator to combine gradients - - No Gradient => insert a `fill_zero_grad` operator - - Recursive NetOp => call `Backward` recursively - - RNN Op => recursively call `Backward` on stepnet - - ---- -# Scope, Variable, Tensor - -* `Tensor` is an n-dimensional array with a type. - * Only dims and data pointers are stored in `Tensor`. - * All operations on `Tensor` are written in `Operator` or global functions. - * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) -* `Variable` instances are the inputs and the outputs of an operator, not just `Tensor`. - * `step_scopes` in RNN is a variable and not a tensor. -* `Scope` is where variables are stored. - * map - * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope. 
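The parent lookup described in the last bullet can be sketched in Python. The class below is an illustration only: the method names `new_var` and `find_var` and the use of a plain dictionary are assumptions made for this sketch, not the C++ `Scope` interface.

```python
class Scope:
    """Hypothetical sketch of a hierarchical variable scope."""

    def __init__(self, parent=None):
        self.parent = parent  # enclosing scope, or None for the root
        self.vars = {}        # variables created in this scope only

    def new_var(self, name, value=None):
        """Create (or overwrite) a variable in this scope."""
        self.vars[name] = value
        return value

    def find_var(self, name):
        """Look up `name` here first, then in each ancestor scope."""
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None  # not found anywhere in the hierarchy
```

A local scope created with `Scope(parent=global_scope)` can read the parent's variables through `find_var`, while variables created in the local scope stay invisible to the parent, much like a stack frame.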
- ---- -# Block (in design) -## the difference between original RNNOp and Block -- As an operator is more intuitive than `RNNOp`, -- Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`, -- Fits the compile-time/ runtime separation design paradigm. - - During the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc` - - When graph executes, a Block with `BlockDesc` is passed. It then creates `Op` and `Var` instances and then invokes `Run`. - ---- -# Milestone -- Take Paddle/books as the main line, the requirement of the models motivates framework refactoring, -- Model migration - - Framework development gives **priority support** to model migration, for example, - - the MNIST demo needs a Python interface, - - the RNN models require the framework to support `LoDTensor`. - - Determine some timelines, - - Frequently used Ops need to be migrated first, - - Different models can be migrated in parallel. -- Improve the framework at the same time -- Accept imperfection, concentrate on solving the specific problem at the right price. - ---- -# Control the migration quality -- Compare the performance of migrated models with old ones. -- Follow the google C++ style guide. -- Build the automatic workflow of generating Python/C++ documentations. - - The documentation of layers and ops should be written inside the code. - - Take the documentation quality into account when submitting pull requests. - - Preview the documentations, read and improve them from a user's perspective. 
diff --git a/develop/doc/_sources/design/register_grad_op.md.txt b/develop/doc/_sources/design/register_grad_op.md.txt deleted file mode 100644 index 8d973eb53178c3e889c845144553a453e11f067c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/register_grad_op.md.txt +++ /dev/null @@ -1,92 +0,0 @@ -# Design Doc: Gradient Operators Registration - - -## The Problem Posed - -Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance. - -However, we noticed two problems with the current design: - -1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and insert the corresponding `OpDesc` messages into the `ProgramDesc` message. - -1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of the *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation. - -## The Current Implementation - -Instances of the C++ class `OpInfo` are stored in an associative map whose key is the operator type. The `grad_op_type_` field indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows: - -```cpp -struct OpInfo { - std::function<OperatorBase*(...)> creator_; - std::string grad_op_type_; - ... -}; - -map<string, OpInfo> OpInfoMap; - -OperatorBase* CreateGradientOperator(const OperatorBase& op) { - // Look up the gradient operator type first, then call its creator. - const std::string& grad_type = OpInfoMap.at(op.Type()).grad_op_type_; - return OpInfoMap.at(grad_type).creator_(...); -} -``` - -## Proposed Solution - -The mapping relationship between an operator and its gradient operators is a function.
The interface of this function is: - -```cpp -// (OpDesc) --> vector<OpDesc> -std::function<std::vector<OpDescBind>(const OpDescBind&)>; -``` - -The function takes an `OpDescBind` of the forward operator and returns one or more gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`. - -The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like - -```cpp -struct OpInfo { - std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> grad_op_maker_; - ... -}; -``` - -The `grad_op_maker_` is a `nullptr` if the operator does not have any associated gradient operators. - -We propose a base class called `GradOpDescMakerBase` to let operator developers generate gradient operators easily. The public interface of that class is - -```cpp -class GradOpDescMakerBase { -public: - GradOpDescMakerBase(const OpDescBind& ); - virtual std::vector<std::unique_ptr<OpDescBind>> operator()()const = 0; -}; -``` - -We can convert `GradOpDescMakerBase` to `std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>` by - -```cpp -using GradOpMaker = ...; -std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> func; -func = [] (const OpDescBind& fwd_op) { - GradOpMaker maker(fwd_op); - return maker(); -}; -``` - -We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forward operator. - -We should change the register macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so `REGISTER_OP` just registers one operator. If `REGISTER_OPERATOR` takes an `OpProtoAndCheckerMaker` and a `GradOpDescMaker`, we just list them in the same macro. This can be done with a macro that uses `__VA_ARGS__`.
- -The user interface should be - -```cpp -vector<OpDesc> MinusOpGradMaker(OpDesc) {...} -REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker); -// Developers can still manually implement gradient operator. -REGISTER_OPERATOR(minus_grad, MinusGradOp); -``` - -The interface of the current `REGISTER_OP` macro should not change. Internally, `REGISTER_OP` will invoke `REGISTER_OPERATOR` twice and generate the `GradOpDescMaker`. - -```cpp -REGISTER_OP(minus, MinusOp, MinusOpProtoAndCheckerMaker, minus_grad, MinusGradOp); -``` diff --git a/develop/doc/_sources/design/regularization.md.txt b/develop/doc/_sources/design/regularization.md.txt deleted file mode 100644 index 21280ac898feb4dd5e5a5d9e88d121e856850f0b..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/regularization.md.txt +++ /dev/null @@ -1,72 +0,0 @@ -# Regularization in PaddlePaddle - -## Introduction to Regularization -A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. A frequently faced problem is **overfitting**, where the model does not make reliable predictions on new, unseen data. **Regularization** is the process of introducing additional information in order to prevent overfitting. This is usually done by adding extra penalties to the loss function that restrict the parameter space that an optimization algorithm can explore. - -### Parameter Norm Penalties -The most common regularization approaches in deep learning are based on limiting the capacity of the models by adding a parameter norm penalty to the objective function `J`. This is given as follows: - -
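Assuming the standard formulation from the Deep Learning book referenced in this document, the regularized objective can be sketched as follows. The symbols `theta` (the parameters), `X` (the inputs) and `y` (the targets) are not named in the text and are assumptions here; `alpha` and `omega` are the hyperparameter and norm penalty the text describes.

```latex
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta),
\qquad \alpha \ge 0
```

Setting `alpha` to 0 recovers the unregularized objective `J`; larger values give the penalty term more weight.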
- -The parameter `alpha` is a hyperparameter that weights the relative contribution of the norm penalty term, `omega`, relative to the standard objective function `J`. - -The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows: - -##### L2 Regularization: -
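A common way to write the L2 norm penalty, assuming the conventional definition in which the penalty applies to the weight vector `w` (a symbol not named in the text), is:

```latex
\Omega(\theta) = \frac{1}{2} \lVert w \rVert_2^2 = \frac{1}{2} \sum_i w_i^2
```

Under this definition the gradient of the penalty with respect to `w` is simply `w`, which is why the L2 penalty is also known as weight decay.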
- -##### L1 Regularization -
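A common way to write the L1 norm penalty, assuming the conventional definition in which the penalty applies to the weight vector `w` (a symbol not named in the text), is:

```latex
\Omega(\theta) = \lVert w \rVert_1 = \sum_i \lvert w_i \rvert
```

Under this definition the subgradient of the penalty with respect to each weight is its sign, which tends to drive individual weights to exactly zero.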
- -A much more detailed mathematical background of regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html). - -## Regularization Survey - -A detailed survey of regularization in various deep learning frameworks can be found [here](https://github.com/PaddlePaddle/Paddle/wiki/Regularization-Survey). - -## Proposal for Regularization in PaddlePaddle - -### Low-Level implementation - -In the new design, we propose to create new operations for regularization. For now, we can add 2 ops that correspond to the most frequently used regularizations: -- L2_regularization_op -- L1_regularization_op - -These ops can be like any other ops with their own CPU/GPU implementations either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement their kernels using Eigen following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes other than L1 and L2 norm penalties. - -The idea of building ops for regularization is in sync with the refactored Paddle philosophy of using operators to represent any computation unit. The way these ops will be added to the computation graph, will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API. - -### Computation Graph - -Below is an example of a really simple feed forward neural network. - -
- -The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows: - -
-    -### Python API implementation for Regularization - -Using the low level ops, `L2_regularization_op` and `L1_regularization_op`, any user can add regularization to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support regularization. An example of such an API can be seen in [Keras](https://keras.io/regularizers/). As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since regularization is a property of parameters, it makes sense to create these in the layer functions. - -#### Creation of Regularization ops -There are two possibilities for creating the regularization ops: -1. We create these ops immediately while building the computation graph. -2. We add these ops in a lazy manner, just before the backward, similar to the way the optimization ops are added. - -The proposal is to add these ops in a lazy manner just before the backward pass. - -#### Storage of Regularization attributes - -Since we want to create the regularization ops in a lazy manner, the regularization attributes (type of regularization and weight of regularization penalty) can be stored as attributes of the [`Parameter`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/framework.py#L421) class. This is because regularization is a property of the parameters and storing regularization properties with Parameters also allows for shared parameters. - -#### High-level API - -In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for later right now. 
A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at Tensorflow in [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers). - - - - - - diff --git a/develop/doc/_sources/design/releasing_process.md.txt b/develop/doc/_sources/design/releasing_process.md.txt deleted file mode 100644 index b9787261092f1f27377886152cb1596d9ff54188..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/releasing_process.md.txt +++ /dev/null @@ -1,90 +0,0 @@ -# PaddlePaddle Release Process - -PaddlePaddle manages its branches with the git-flow branching model and numbers its versions according to [Semantic Versioning](http://semver.org/). - -Each new PaddlePaddle release follows this procedure: - -1. Create a new branch from `develop` named `release/[version]`, for example `release/0.10.0`. -1. Tag the new branch with `[version]rc.[patch number]`. The first tag is `0.10.0rc1`, the second is `0.10.0rc2`, and so on. -1. For the tagged commit, do the following: - * Use the Regression Test List as a checklist to verify the correctness of this release. - * If anything fails, record every failing case, fix all the bugs on the `release/[version]` branch, increment the patch number, and go back to step 2. - * Update the version information in `python/setup.py.in` and set the `istaged` field to `True`. - * Build the Python wheel packages for this version and publish them to PyPI. - * Because pypi.python.org currently enforces the [strict naming convention of PEP 513](https://www.python.org/dev/peps/pep-0513), the platform suffix of each wheel must be renamed before uploading with twine, for example from `linux_x86_64` to `manylinux1_x86_64`. - * The package names on PyPI are paddlepaddle and paddlepaddle_gpu. To upload the GPU package, set name: "paddlepaddle_gpu" in build/python/setup.py and rebuild the wheel: `python setup.py bdist_wheel`. - * To upload: - ``` - cd build/python - pip install twine - twine upload dist/[package to upload] - ``` - * Build the Docker release image for this version and publish it to DockerHub. If the build fails, fix the Docker image build problem, increment the patch number, and go back to step 2. -1. Once step 3 is complete, merge the `release/[version]` branch into `master` and tag the merge commit on `master` with `[version]`. Then merge `master` into `develop`. Finally, delete the `release/[version]` branch. -1. 
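Write the Release Note collaboratively. - - -Please note: - -* Once the `release/[version]` branch has been created, merging from `develop` into `release/[version]` is generally not allowed. This keeps the feature set of the `release/[version]` branch frozen, which makes it easier for testers to test PaddlePaddle's behavior. -* While the `release/[version]` branch exists, any bugfix branch must be merged into all three branches: `master`, `develop` and `release/[version]`. - -## Publishing wheel packages to PyPI - -Use [PaddlePaddle CI](https://paddleci.ngrok.io/project.html?projectId=Manylinux1&tab=projectOverview) -to build the binaries automatically. As shown in the figure below, select the version to publish (usually one CPU version and one GPU version) and click the "..." button to the right of "run" -to open the dialog shown below. In the second tab (Changes), select the branch to publish, here 0.11.0, and click the "Run Build" button. When the build finishes, -the three generated binaries, corresponding to CAPI, `cp27m` and `cp27mu`, can be found in the "Artifacts" drop-down on the same page. Then upload them -with the `twine` tool as described above. - - - -* Note: the CI environment uses the Docker images from https://github.com/PaddlePaddle/buildtools as its build environment in order to support more Linux - distributions. These images can also be used for manual builds, and can be downloaded from https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/ . -* PyPI does not allow overwriting an upload, so a wheel cannot be changed once its version has been published. The next wheel must carry a new version number before it can be uploaded. - -## Publishing Docker images - -After the PaddlePaddle CI wheel build above completes, it automatically pushes the Docker images to DockerHub, so publishing a Docker image only requires tagging the automatically pushed image with -the tag for the version: - -1. Go to https://hub.docker.com/r/paddlepaddle/paddle/tags/ and check that the latest tag was updated after the wheel build above finished. -1. Run `docker pull paddlepaddle/paddle:[latest tag]`, where latest tag can be latest, latest-gpu, etc. -1. Run `docker tag paddlepaddle/paddle:[latest tag] paddlepaddle/paddle:[version]` -1. 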
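Run `docker push paddlepaddle/paddle:[version]` - -## PaddlePaddle Branching Conventions - -PaddlePaddle development follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, with some adjustments to fit GitHub's features. - -* The main PaddlePaddle repository follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, in which: - * `master` is the stable branch. Every version on `master` has passed both unit tests and regression tests. - * `develop` is the development branch. Every version on `develop` has passed unit tests but not regression tests. - * `release/[version]` is a temporary branch created for each release. Code on this branch is undergoing regression testing. - -* Forks owned by other users do not need to follow the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model strictly, but every branch of every fork is effectively a feature branch. - * Developers are advised to keep the `develop` branch of their fork in sync with the `develop` branch of the main repository. - * Developers are advised to create their own feature branches from `develop` in their fork. - * When a feature branch is finished, submit a `Pull Request` to the main PaddlePaddle repository for code review. - * During review, developers can keep committing changes to their feature branch. - -* BugFix branches are also maintained in the developer's own fork. Unlike a feature branch, a BugFix branch must open a `Pull Request` against each of `master`, `develop` and, if it exists, `release/[version]` in the main repository. - -## PaddlePaddle Regression Test List - -This list describes the features that must be tested before each PaddlePaddle release. - -### All chapters of the PaddlePaddle Book - -Every PaddlePaddle release must first ensure that all chapters of the PaddlePaddle Book work correctly. This includes verifying that models train correctly both with the current `paddle_trainer` and with pure `Python` training. - -| | Getting Started | Recognize Digits | Image Classification | Word Vectors | Sentiment Analysis | Semantic Role Labeling | Machine Translation | Personalized Recommendation | -| --- | --- | --- | --- | --- | --- | --- | --- | --- | -| API.V2 + Docker + GPU | | | | | | | | | -| API.V2 + Docker + CPU | | | | | | | | | -| `paddle_trainer` + Docker + GPU | | | | | | | | | -| `paddle_trainer` + Docker + CPU | | | | | | | | | -| API.V2 + Ubuntu + GPU | | | | | | | | | -| API.V2 + Ubuntu + CPU | | | | | | | | | -| `paddle_trainer` + Ubuntu + GPU | | | | | | | | | -| `paddle_trainer` + Ubuntu + CPU | | | | | | | | | diff --git a/develop/doc/_sources/design/scope.md.txt b/develop/doc/_sources/design/scope.md.txt deleted file mode 100644 index 4da76eebb74abcd26ec2b8671399e6bc4fb58574..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/scope.md.txt +++ /dev/null @@ -1,124 +0,0 @@ -# Design of Scope in Paddle - -## Overview - -Scope is an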
important concept in programming languages: it defines a program region in which a set of bindings between names and entities applies. Within a specific scope, a valid name is uniquely associated with an entity, such as a variable. In another scope, the same name may refer to a different entity or to nothing at all. A scope therefore restricts the visibility and validity of names in a program. Hence **Scope** is introduced to PaddlePaddle to manage variables in context. Unlike the original abstract concept, however, Scope here is an object with two important attributes: - -- Scope is an association of a name to variable. -- Variables in a parent scope can be retrieved from local scope. - -A detailed explanation of these two attributes follows. - - -## Scope is an association of a name to variable. - -Scope is an association of a name to variable. All variables belong to `Scope`. You need to specify a scope to run a Net, i.e., `net.Run(&scope)`. One net can run in different scopes and update different variables in each scope. - - -1. Scope only contains a map of a name to variable. - - All parameters, data and states in a Net should be variables stored inside a scope. Each op should get the inputs and outputs for its computation, such as data buffers and state (momentum), from a scope. - -1. A variable can only be created by a Scope and can only be obtained from a Scope. Users cannot create or get a variable outside a scope. This is a constraint of our framework, and it keeps the framework simple and clear. - -1. Scope only contains methods that are used to create and get Variables. Scope does not contain Operators and has no information to run them. - `Net` is designed to drive the computation and Scope only contains a map of variables. There is no computation logic inside a `Scope`. Scope just handles the lifetime management of variables. - - `Create` is used to create a Variable by its name and add the mapping relation. - - `Get` is used to find a Variable by name. - -1. 
Every variable belongs to exactly one Scope. - - A variable cannot belong to several scopes. If you want to use variables from a parent scope, you can reach them through the `parent scope`. - -1. A Scope should destruct all Variables inside it when it is itself destructed. Users must never store a `Variable` pointer anywhere else. - - Because a Variable can only be obtained from a Scope, destroying a Scope must also destroy all the Variables in it. If a user stores a `Variable` pointer in a private data member or a global variable, the pointer becomes invalid when the associated `Scope` is destroyed. - -```cpp -class Scope { - public: - Variable* Var(const std::string& name); - const Variable* FindVar(const std::string& name) const; - - private: - std::unordered_map<std::string, std::unique_ptr<Variable>> vars_; -}; -``` - - -## Parent scope and local scope - -Just like [scope](https://en.wikipedia.org/wiki/Scope_(computer_science)) in programming languages, `Scope` in the neural network can also be a local scope. A local scope has two attributes. - -1. We can create local variables in a local scope. When that local scope is destroyed, all its local variables should also be destroyed. -2. Variables in a parent scope can be retrieved from the local scopes of that parent scope, i.e., when a user gets a variable from a scope, the scope first searches for the variable locally. If there is no such variable in the local scope, `scope` keeps searching in its parent, until the variable is found or there is no parent. - -```cpp -class Scope { - public: - Scope(const std::shared_ptr<Scope>& scope): parent_(scope) {} - - Variable* FindVar(const std::string& name) const { - auto it = vars_.find(name); - if (it != vars_.end()) { - return it->second.get(); - } else if (parent_ != nullptr) { - return parent_->FindVar(name); - } else { - return nullptr; - } - } - - private: - std::shared_ptr<Scope> parent_ {nullptr}; -}; -``` - -In the `Scope` class, there is a private data member called `parent_`. `parent_` is a smart pointer to its parent scope.
When a user gets a variable by its `name`, the `name` is first searched inside the current scope. If the variable cannot be found locally and the parent scope is not a `nullptr`, the variable is searched inside that parent scope. The default value of the `parent_` pointer is `nullptr`, which means that the scope is a global scope. - -A local scope is very useful when we implement a Recurrent Neural Network. Each timestep of an RNN should be a `Net`. The `Net` of each timestep (`StepNet` for short) should use an independent local scope, just as the variables in a while loop live inside a local scope in programming languages. By using a single `StepNet` and changing the local scope, we can implement an RNN easily. - -# Interface Design - -```cpp -class Variable { - private: - Variable() = default; - friend class Scope; -}; - -class Scope { - private: - Scope(const std::shared_ptr<Scope>& parent = nullptr); - - public: - static std::shared_ptr<Scope> Create(const std::shared_ptr<Scope>& parent = nullptr); - - // return nullptr if not found. - Variable* FindVar(const std::string& name) const; - - // return if already contains same name variable. - Variable* Var(const std::string& name); - - private: - std::shared_ptr<Scope> parent_; - std::unordered_map<std::string, std::unique_ptr<Variable>> vars_; -}; -``` -## Only scope can create a variable - -To ensure that only a scope can create a variable, we mark `Variable`'s constructor as private and make `Scope` a friend class of `Variable`. Then only `Var` can construct a `Variable`. - -## When scope destroyed, all variables inside this scope should be destroyed together - -The scope holds unique pointers to all of its variables. Users can call `FindVar` on a scope, but they should not hold the returned pointer as a member variable, because when the scope is destroyed, all variables inside it are destroyed together. - -## Sharing a parent scope - -Local scope contains a `parent_` pointer. It is a linked-list for scopes.
A `shared_ptr` is used because a parent scope must not be destroyed while one of its local scopes is still in use. - -Also, as the parent scope is a `shared_ptr`, we can only `Create()` a scope shared pointer. We cannot construct a scope variable directly, because it could not be passed to another scope as its `parent` pointer. - -## Orthogonal interface - -`FindVar` will return `nullptr` when `name` is not found. It can be used as `Contains` method. `Var` will return an `Error` when there is a name conflict locally. Combining `FindVar` and `Var`, we can implement `Var` easily. diff --git a/develop/doc/_sources/design/selected_rows.md.txt b/develop/doc/_sources/design/selected_rows.md.txt deleted file mode 100644 index 1a98839a957612b91b2276b58818623ecc62d1d5..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/selected_rows.md.txt +++ /dev/null @@ -1,74 +0,0 @@ -# Design Doc: Selected Rows - -`SelectedRows` is a sparse tensor data type designed to support `embedding` operators. The gradient of an embedding table is a sparse tensor: only a few of its rows hold non-zero values. It is straightforward to represent such a sparse tensor with the following data structure: - -```cpp -class SelectedRows { - private: - vector<int> rows_; - Tensor value_; - int height_; -}; -``` - -The field `height_` is the first dimension of `SelectedRows`. The `rows_` field holds the indices of the non-zero rows of `SelectedRows`. The `value_` field is an N-dim tensor of shape `[rows.size() /* NUM_ROWS */, ...]`, which supplies the values for each row. The dimension of `SelectedRows` satisfies `[height_] + value_.shape[1:]`. - -Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`. The `SelectedRows` representation would be: - -``` -x = SelectedRows { - rows = [73, 84], - value = [[1, 2], [3,4]] -} -``` - - -## SelectedRows in Protobuf - -`SelectedRows` is a type of `Variable`.
`VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimension of a `SelectedRows` is described at compile time, because `rows_` and `value_` depend on the training data. -So we use `TensorDesc` to unify `data_type` and `dims`. A `LodTensorDesc` contains a `TensorDesc` and `lod_level`. The description of `SelectedRows` is a Tensor description. - -```proto -message TensorDesc { - required DataType data_type = 1; - repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480] -} - -message LodTensorDesc { - required TensorDesc tensor = 1; - optional int32 lod_level = 2; -} - -message VarDesc { - required string name = 1; - enum VarType { - LOD_TENSOR = 0; - SELECTED_ROWS = 1; - } - required VarType type = 2; - optional LodTensorDesc lod_desc = 3; - optional TensorDesc selected_rows_desc = 4; - optional bool persistable = 5 [ default = false ]; -} -``` - -## InferShape for Selected Rows - -Just like `LoD` information, the `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or a `Dense` tensor. - -For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should look like the following: - -```cpp -void TableLookupGrad::InferShape(context) { - ... - context.SetDataType("Embedding.Grad", kSelectedRows); -} -``` - - -## Sparse Operators - -There are several operators that need to be written to support `SelectedRows`. These are: - -1. Operators which generate a `SelectedRows` gradient, e.g. the gradient of `TableLookupOp`. -2. Optimize operators which support a `SelectedRows` gradient, e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator. `OpWithKernel::Run` should select a suitable kernel for either a `dense` tensor or `SelectedRows`. 
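The representation and the single-`SGD` dispatch described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Paddle's implementation: the `to_dense` and `sgd_update` helpers are hypothetical names, values are nested lists rather than a `Tensor`, and only the 2-D case is handled.

```python
class SelectedRows:
    """Toy sparse-row container with the rows_/value_/height_ fields (2-D only)."""

    def __init__(self, rows, value, height):
        if len(rows) != len(value):
            raise ValueError("value must supply exactly one row per index in rows")
        self.rows = list(rows)                 # indices of the non-zero rows
        self.value = [list(r) for r in value]  # one value row per index
        self.height = height                   # first dimension of the full tensor

    def shape(self):
        # [height_] + value_.shape[1:], restricted here to two dimensions
        width = len(self.value[0]) if self.value else 0
        return [self.height, width]

    def to_dense(self):
        # Hypothetical helper: expand into a height x width list of lists.
        height, width = self.shape()
        dense = [[0] * width for _ in range(height)]
        for row_index, row_values in zip(self.rows, self.value):
            for col, v in enumerate(row_values):
                dense[row_index][col] += v  # accumulate if an index repeats
        return dense


def sgd_update(param, grad, lr):
    """Hypothetical single entry point that picks a sparse or dense update."""
    if isinstance(grad, SelectedRows):
        # Sparse path: touch only the rows that carry a gradient.
        for row_index, row_values in zip(grad.rows, grad.value):
            for col, g in enumerate(row_values):
                param[row_index][col] -= lr * g
    else:
        # Dense path: update every element.
        for i, grad_row in enumerate(grad):
            for col, g in enumerate(grad_row):
                param[i][col] -= lr * g
    return param


# The example from the text: only rows 73 and 84 carry values.
x = SelectedRows(rows=[73, 84], value=[[1, 2], [3, 4]], height=100)
dense = x.to_dense()
```

The sparse path visits two rows instead of one hundred, which is the reason for keeping embedding gradients in this form.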
diff --git a/develop/doc/_sources/design/simple_op_design.md.txt b/develop/doc/_sources/design/simple_op_design.md.txt deleted file mode 100644 index c7aeed7f9b4637e1c29d530f37b42d12500af82f..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/simple_op_design.md.txt +++ /dev/null @@ -1,202 +0,0 @@ -## Interaction between C++ and Python - -Users describe their networks with the Python API; however, the network construction actually happens in C++, so Protobuf is introduced to pass messages between Python and C++. - -The interaction between Python and C++ can be simplified to two steps: - -1. C++ tells Python how many Ops there are, and which parameters users need to provide to initialize a new Op. Python then builds an API for each Op at compile time. - -2. Users invoke the APIs built by Python and provide the necessary parameters. These parameters are sent to C++ to finish constructing the Op. - -### Message from C++ to Python - -We define a Protobuf message class `OpProto` to hold the information needed in the first step. What should an `OpProto` contain? This question is equivalent to “What information do we need to provide in order to build a Python API that is legal, user-oriented and able to describe a whole Op?” - -The following information is necessary: - -1. Op's name, and its simple comment. -2. Input and output variable number; each variable's name, type, and comment. -3. Op's attributes; each attribute includes name, type, comment, **default value** and **value range**. 
- -So `OpProto` can be defined as follows: - -```proto -enum AttrType { - INT = 1; - FLOAT = 2; - STRING = 3; - INTS = 4; - FLOATS = 5; - STRINGS = 6; -}; - -message AttrValue { - required AttrType type = 1; - optional int32 iv = 2; - optional float fv = 3; - optional string sv = 4; - repeated int32 ivs = 5; - repeated float fvs = 6; - repeated string svs = 7; -}; - -message AttrProto { - required string name = 1; - required string comment = 2; - required AttrType type = 3; -}; - -message VarProto { - required string name = 1; - required string comment = 2; - required bool is_tensor = 3; -}; - -message OpProto { - repeated VarProto inputs = 1; - repeated VarProto outputs = 2; - repeated AttrProto attrs = 3; - required string type = 4; - required string comment = 5; -}; -``` - -To generate Python code automatically: - -```python -def create_python_ops_creatation_functions(): - op_protos = paddle.framework.OpRegistry.get_all_op_proto() - for type_name in op_protos: - op_proto = op_protos[type_name] - def __impl__(**kwargs): # User must use key word args in Paddle API - inputs = [kwargs.get(ipt.name, "") for ipt in op_proto.inputs] - outputs = [kwargs.get(opt.name, "") for opt in op_proto.outputs] - attrs = [cast_to_op_attr(attr, kwargs.get(attr.name, None)) for attr in op_proto.attrs] - opdesc = (inputs, outputs, type_name, attrs) - return paddle.framework.OpRegistry.CreateOp(opdesc) - __impl__.__doc__ = create_doc_string(op_proto) - globals()[type_name] = __impl__ - -create_python_ops_creatation_functions() -``` - -### Message from Python to C++ - -To hold the information needed in the second step above, we define the Protobuf message class `OpDesc`. It holds the user-specified parameters that describe an Op. - -```proto -message OpDesc { - required string type = 1; - repeated string inputs = 2; - repeated string outputs = 3; - map<string, AttrValue> attrs = 4; -}; -``` - -## OpProto Register - -Every Op has its own `OpProto`. For convenience, we need to register them and record all their information.
For each `Op` class, we define a corresponding `OpMaker` class, whose constructor implements the building process of the `OpProto`. `OpMaker`'s constructor will be invoked by another function, `OpRegistry::RegisterOp()`. - -```cpp -class OpProtoMaker { -public: - OpProtoMaker(OpProto* proto): proto_(proto) {} -protected: - OpProto* proto_; - void AddInput(const std::string& name, const std::string& desc) {...} - void AddAttr(const std::string& name, const std::string& desc, TypeId type) {...} - void AddComment(const std::string& comment) { ... } -}; - -class OpRegistry { -public: - using OpCreator = std::function<OperatorBase*(const OpDesc&)>; - - template <typename OpType, typename OpMaker> - static void RegisterOp(const std::string& name) { - gCreators_[name] = [](const OpDesc& desc) { - return new OpType(desc); - }; - OpProto& opProto = gProtos_[name]; - OpMaker maker(&opProto); // the maker's constructor fills in opProto - } - - static map<string, OpCreator> gCreators_; - static map<string, OpProto> gProtos_; -}; - -template <typename OpType, typename OpMaker> -class OpRegister { - public: - OpRegister(std::string type) { - OpRegistry::RegisterOp<OpType, OpMaker>(type); - } -}; - -#define REGISTER_OP(op_class, op_maker_class, type_name) \ - class op_class##Register { \ - private: \ - const static OpRegister<op_class, op_maker_class> reg; \ - }; \ - const OpRegister<op_class, op_maker_class> op_class##Register::reg(#type_name); - -class CosineOp { -// ... -}; - -struct CosineOpProtoMaker : public OpProtoMaker { - CosineOpProtoMaker(OpProto* proto) : OpProtoMaker(proto) { - AddInput("input", "input of cosine op"); - AddAttr("scale", "scale of cosine op", float).Default(1.0).GreaterThan(0.0); - AddType("cos"); - AddComment("This is cos op"); - } -}; - -REGISTER_OP(CosineOp, CosineOpProtoMaker, cos); -``` - -In `REGISTER_OP(CosineOp, CosineOpProtoMaker, cos)`, we register not only `CosineOp` but also `CosineOpProto`. As fields of `CosineOpProto`, the default value and value range of `scale` are also registered here. - -## Python API - -Python APIs are divided into two types, high-level API and low-level API.
- -### High-Level API - -The high-level API is called by users directly, so its style should stay consistent with the existing V2 APIs. - -Here is a sample of how to define an fc layer: - -```python -hd = fc_layer(input=data, size=56, with_bias=True, activation="sigmoid") -``` - -`hd` is the output of `fc_layer` and it's a `variable`. It can be further sent into other layers as input. - -The definition of `fc_layer()`: - -```python -def fc_layer(input, size, with_bias, activation): - attr_map = {"size":size} - check_attrs(attr_map) - w = make_variable('w') - if with_bias: - b = make_variable('b') - else: - b = None - fc_output = make_variable('fc_output') - fc_op(input, w, b, fc_output, attr_map) - act_output = make_variable('sigmoid_output') - if activation == "sigmoid": - sigmoid_op(fc_output, act_output) - else: - # ... other activations - return act_output -``` - -### Low-Level API - -In the above sample, `fc_op` and `sigmoid_op` are low-level APIs. They build `OpDesc` and invoke the corresponding C++ code. - -*TODO* diff --git a/develop/doc/_sources/design/speech/deep_speech_2.md.txt b/develop/doc/_sources/design/speech/deep_speech_2.md.txt deleted file mode 100644 index cfdc4d6df04344c70d3334626bd38eca997c31ff..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/speech/deep_speech_2.md.txt +++ /dev/null @@ -1,168 +0,0 @@ -# DeepSpeech2 on PaddlePaddle: Design Doc - -We are planning to build Deep Speech 2 (DS2) \[[1](#references)\], a powerful Automatic Speech Recognition (ASR) engine, on PaddlePaddle. For the first-stage plan, we have the following short-term goals: - -- Release a basic distributed implementation of DS2 on PaddlePaddle. -- Contribute a chapter of Deep Speech to PaddlePaddle Book. - -Intensive system optimization and low-latency inference library (details in \[[1](#references)\]) are not yet covered in this first-stage plan.
- -## Table of Contents - -- [Tasks](#tasks) -- [Task Dependency](#task-dependency) -- [Design Details](#design-details) - - [Overview](#overview) - - [Row Convolution](#row-convolution) - - [Beam Search With CTC and LM](#beam-search-with-ctc-and-lm) -- [Future Work](#future-work) -- [References](#references) - -## Tasks - -We roughly break down the project into 14 tasks: - -1. Develop an **audio data provider**: - - JSON filelist generator. - - Audio file format transformer. - - Spectrogram feature extraction, power normalization etc. - - Batch data reader with SortaGrad. - - Data augmentation (optional). - - Prepare (one or more) public English data sets & baseline. -2. Create a **simplified DS2 model configuration**: - - With only fixed-length (by padding) audio sequences (otherwise need *Task 3*). - - With only bidirectional-GRU (otherwise need *Task 4*). - - With only greedy decoder (otherwise need *Task 5, 6*). -3. Add support for **variable-shaped** dense-vector (image) batches of input data. - - Update `DenseScanner` in `dataprovider_converter.py`, etc. -4. Develop a new **lookahead-row-convolution layer** (See \[[1](#references)\] for details): - - Lookahead convolution windows. - - Within-row convolution, without kernels shared across rows. -5. Build KenLM **language model** (5-gram) for beam search decoder: - - Use KenLM toolkit. - - Prepare the corpus & train the model. - - Create inference interfaces (for Task 6). -6. Develop a **beam search decoder** with CTC + LM + WORDCOUNT: - - Beam search with CTC. - - Beam search with external custom scorer (e.g. LM). - - Try to design a more general beam search interface. -7. Develop a **Word Error Rate evaluator**: - - Update `ctc_error_evaluator`(CER) to support WER. -8. Prepare internal dataset for Mandarin (optional): - - Dataset, baseline, evaluation details. - - Particular data preprocessing for Mandarin. - - Might need cooperating with the Speech Department. -9. 
Create **standard DS2 model configuration**: - - With variable-length audio sequences (need *Task 3*). - - With unidirectional-GRU + row-convolution (need *Task 4*). - - With CTC-LM beam search decoder (need *Task 5, 6*). -10. Make it run perfectly on **clusters**. -11. Experiments and **benchmarking** (for accuracy, not efficiency): - - With public English dataset. - - With internal (Baidu) Mandarin dataset (optional). -12. Time **profiling** and optimization. -13. Prepare **docs**. -14. Prepare PaddlePaddle **Book** chapter with a simplified version. - -## Task Dependency - -Tasks parallelizable within phases: - -Roadmap | Description | Parallelizable Tasks ------------ | :------------------------------------ | :-------------------- -Phase I | Simplified model & components | *Task 1* ~ *Task 8* -Phase II | Standard model & benchmarking & profiling | *Task 9* ~ *Task 12* -Phase III | Documentation | *Task 13* ~ *Task 14* - -An issue for each task will be created later. Contributions, discussions and comments are all highly appreciated and welcomed! - -## Design Details - -### Overview - -Traditional **ASR** (Automatic Speech Recognition) pipelines require great human effort devoted to elaborately tuning multiple hand-engineered components (e.g. audio feature design, acoustic model, pronunciation model and language model). **Deep Speech 2** (**DS2**) \[[1](#references)\], however, trains such ASR models in an end-to-end manner, replacing most intermediate modules with a single deep network architecture. By scaling up both the data and model sizes, DS2 achieves a very significant performance boost. - -Please read the Deep Speech 2 papers \[[1](#references),[2](#references)\] for more background knowledge. 
- -The classical DS2 network contains 15 layers (from bottom to top): - -- **Two** data layers (audio spectrogram, transcription text) -- **Three** 2D convolution layers -- **Seven** uni-directional simple-RNN layers -- **One** lookahead row convolution layer -- **One** fully-connected layer -- **One** CTC-loss layer - -
-
-Figure 1. Architecture of Deep Speech 2 Network. -
- -We don't have to stick to this 2-3-7-1-1-1 depth \[[2](#references)\]. Similar networks with different depths might also work well. In \[[1](#references)\], the authors use a different depth (e.g. 2-2-3-1-1-1) for the final experiments. - -Key ingredients about the layers: - -- **Data Layers**: - - Frame sequences data of audio **spectrogram** (with FFT). - - Token sequences data of **transcription** text (labels). - - These two types of sequences do not have the same lengths, thus a CTC-loss layer is required. -- **2D Convolution Layers**: - - Not only temporal convolution, but also **frequency convolution**. Like a 2D image convolution, but with a variable dimension (i.e. temporal dimension). - - With striding for only the first convolution layer. - - No pooling for all convolution layers. -- **Uni-directional RNNs** - - Uni-directional + row convolution: for low-latency inference. - - Bi-directional without row convolution: if we don't care about the inference latency. -- **Row convolution**: - - For looking only a few steps ahead into the future, instead of looking into a whole sequence as in bi-directional RNNs. - - Not necessary with bi-directional RNNs. - - "**Row**" means convolutions are done within each frequency dimension (row), and no convolution kernels are shared across rows. -- **Batch Normalization Layers**: - - Added to all above layers (except for data and loss layer). - - Sequence-wise normalization for RNNs: BatchNorm is only performed on the input-state projection and not the state-state projection, for efficiency. - - -Required Components | PaddlePaddle Support | Need to Develop -:------------------------------------- | :-------------------------------------- | :----------------------- -Data Layer I (Spectrogram) | Not supported yet. 
| TBD (Task 3) -Data Layer II (Transcription) | `paddle.data_type.integer_value_sequence` | - -2D Convolution Layer | `paddle.layer.image_conv_layer` | - -DataType Converter (vec2seq) | `paddle.layer.block_expand` | - -Bi-/Uni-directional RNNs | `paddle.layer.recurrent_group` | - -Row Convolution Layer | Not supported yet. | TBD (Task 4) -CTC-loss Layer | `paddle.layer.warp_ctc` | - -Batch Normalization Layer | `paddle.layer.batch_norm` | - -CTC-Beam search | Not supported yet. | TBD (Task 6) - -### Row Convolution - -TODO by Assignees - -### Beam Search with CTC and LM - -
-
-Figure 2. Algorithm for CTC Beam Search Decoder. -
- -- The **Beam Search Decoder** for DS2 CTC-trained network follows the similar approach in \[[3](#references)\] as shown in Figure 2, with two important modifications for the ambiguous parts: - - 1) in the iterative computation of probabilities, the assignment operation is changed to accumulation for one prefix may comes from different paths; - - 2) the if condition ```if l^+ not in A_prev then``` after probabilities' computation is deprecated for it is hard to understand and seems unnecessary. -- An **external scorer** would be passed into the decoder to evaluate a candidate prefix during decoding whenever a white space appended in English decoding and any character appended in Mandarin decoding. -- Such external scorer consists of language model, word count or any other custom scorers. -- The **language model** is built from Task 5, with parameters should be carefully tuned to achieve minimum WER/CER (c.f. Task 7) -- This decoder needs to perform with **high efficiency** for the convenience of parameters tuning and speech recognition in reality. - - -## Future Work - -- Efficiency Improvement -- Accuracy Improvement -- Low-latency Inference Library -- Large-scale benchmarking - -## References - -1. Dario Amodei, etc., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](http://proceedings.mlr.press/v48/amodei16.pdf). ICML 2016. -2. Dario Amodei, etc., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595). arXiv:1512.02595. -3. Awni Y. Hannun, etc. [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/abs/1408.2873). 
arXiv:1408.2873 diff --git a/develop/doc/_sources/design/support_new_device.md.txt b/develop/doc/_sources/design/support_new_device.md.txt deleted file mode 100644 index 8983df900460127fc130043c52373dab505363ba..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/support_new_device.md.txt +++ /dev/null @@ -1,240 +0,0 @@ -# Design Doc: Supporting new Device/Library - -## Background - -Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries in a flexible and efficient manner. - -On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example, Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library. - -On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware independent. - -So, how to support a new Device/Library in Fluid becomes a challenge. - - -## Basic: Integrate A New Device/Library - -For a general overview of fluid, please refer to the [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md). - -There are mainly three parts that we have to consider while integrating a new device/library: - -- Place and DeviceContext: indicate the device id and manage hardware resources - -- Memory and Tensor: malloc/free data on certain device - -- Math Functor and OpKernel: implement computing unit on certain devices/libraries - -### Place and DeviceContext - -Please note that device and computing library are not one-to-one corresponding. 
A device can have a lot of computing libraries and a computing library can also support several devices. - -#### Place -Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent the device memory where data is located. If we add another device, we have to add the corresponding `DevicePlace`. - -``` - | CPUPlace -Place --| CUDAPlace - | FPGAPlace -``` - -And `Place` is defined as follows: - -``` -typedef boost::variant Place; -``` - -#### DeviceContext - -Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different libraries, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`. - - -``` - /-> CPUDeviceContext -DeviceContext ----> CUDADeviceContext - \-> FPGADeviceContext -``` - -An example of Nvidia GPU is as follows: - -- DeviceContext - - -``` -class DeviceContext { - virtual Place GetPlace() const = 0; -}; -``` - - -- CUDADeviceContext - - -``` -class CUDADeviceContext : public DeviceContext { - Place GetPlace() const override { return place_; } -private: - CUDAPlace place_; - cudaStream_t stream_; - cublasHandle_t cublas_handle_; - std::unique_ptr eigen_device_; // binds with stream_ -}; -``` - -### Memory and Tensor - - -#### memory module - -Fluid provides the following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36): - -``` -template -void* Alloc(Place place, size_t size); - -template -void Free(Place place, void* ptr); - -template -size_t Used(Place place); -``` - -To implement these interfaces, we have to implement MemoryAllocator for different Devices. - - -#### Tensor - -[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in a specific Place. - -```cpp -class Tensor { - public: - /*! 
Return a pointer to mutable memory block. */ - template - inline T* data(); - - /** - * @brief Return a pointer to mutable memory block. - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(platform::Place place); - - /** - * @brief Return a pointer to mutable memory block. - * - * @param[in] dims The dimensions of the memory block. - * @param[in] place The place of the memory block. - * - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(DDim dims, platform::Place place); - - /*! Resize the dimensions of the memory block. */ - inline Tensor& Resize(const DDim& dims); - - /*! Return the dimensions of the memory block. */ - inline const DDim& dims() const; - - private: - /*! holds the memory block if allocated. */ - std::shared_ptr holder_; - - /*! points to dimensions of memory block. */ - DDim dim_; -}; -``` - -`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, using `Resize` to configurate its shape, and then call `mutuable_data` to allocate the actual memory. - -```cpp -paddle::framework::Tensor t; -paddle::platform::CPUPlace place; -// set size first -t.Resize({2, 3}); -// allocate memory on CPU later -t.mutable_data(place); -``` - - - -### Math Functor and OpKernel - -Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in operators/math directory as basic Functors. - -Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example: - -The interface is defined in the header file. 
- -``` -template -class MaxOutFunctor { - public: - void operator()(const DeviceContext& context, const framework::Tensor& input, - framework::Tensor* output, int groups); -}; -``` - -CPU implementation is in .cc file - -``` -template -class MaxOutFunctor { - public: - void operator()(const platform::CPUDeviceContext& context, - const framework::Tensor& input, framework::Tensor* output, - int groups) { - ... - } -}; -``` - -CUDA implementation is in .cu file - -``` -template -class MaxOutFunctor { - public: - void operator()(const platform::CUDADeviceContext& context, - const framework::Tensor& input, framework::Tensor* output, - int groups) { - ... - } -}; -``` - - -We first obtain the computing handle from a concrete DeviceContext and then compute on tensors. - -The implementation of `OpKernel` is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map. - -Fluid provides different register interfaces in op_registry.h - - -Let's take [Crop](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134) operator as an example: - -In .cc file: - -``` -REGISTER_OP_CPU_KERNEL(crop, ops::CropKernel); -REGISTER_OP_CPU_KERNEL( - crop_grad, ops::CropGradKernel); -``` - -In .cu file: - -``` -REGISTER_OP_CUDA_KERNEL(crop, ops::CropKernel); -REGISTER_OP_CUDA_KERNEL( - crop_grad, ops::CropGradKernel); -``` - - -## Advanced topics: How to switch between different Device/Library - -Generally, we will implement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not suitable on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run on GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library. 
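
The situation described above, where an operator such as crf only has a CPU kernel while most others also have a GPU kernel, can be pictured as a lookup keyed by operator and device with a fallback. The following is only a rough sketch of that idea and not how Fluid selects kernels; `KERNELS`, `register_kernel` and `pick_kernel` are hypothetical names introduced here for illustration.

```python
# Hypothetical registry: (op_type, place) -> kernel function.
# None of these names come from Fluid; they only illustrate the idea.
KERNELS = {}


def register_kernel(op_type, place, fn):
    """Record that `op_type` has a kernel for `place`."""
    KERNELS[(op_type, place)] = fn


def pick_kernel(op_type, preferred_place, fallback_place="CPU"):
    """Return (place, kernel), falling back when the preferred place has no kernel."""
    for place in (preferred_place, fallback_place):
        fn = KERNELS.get((op_type, place))
        if fn is not None:
            return place, fn
    raise KeyError("no kernel registered for %s" % op_type)


# A convolution has kernels on both devices; crf only has a CPU kernel.
register_kernel("conv2d", "CPU", lambda x: x)
register_kernel("conv2d", "CUDA", lambda x: x)
register_kernel("crf", "CPU", lambda x: x)

chosen = [pick_kernel(op, "CUDA")[0] for op in ("conv2d", "crf")]
print(chosen)
```

In this toy model, asking for CUDA yields the CUDA kernel for `conv2d` and silently falls back to CPU for `crf`. How a real framework handles such a switch is the subject of the docs linked below.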
- - -For more details, please refer to the following docs: - -- operator kernel type [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md) -- switch kernel [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md) diff --git a/develop/doc/_sources/design/switch.md.txt b/develop/doc/_sources/design/switch.md.txt deleted file mode 100644 index 827d0601c621e4a230de28e2baad8e196e69625e..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/switch.md.txt +++ /dev/null @@ -1,31 +0,0 @@ -### Design Doc: Switch - -### Background - -Many programming languages provide `switch` as a generalization of `if-elif-else`. We want to add it to Fluid. - -The following example shows the usage of `fluid.switch`. - -```python -a = fluid.Var(10) -b = fluid.Var(0) - -with switch() as switch: - with switch.case(fluid.less_equal(a, 10)): - fluid.print("Case 1") - with switch.case(fluid.larger(a, 0)): - fluid.print("Case 2") - with switch.default(): - fluid.print("Case 3") -``` - -### The Semantics - -1. A `switch` control-flow checks cases one-by-one. -1. The condition of each case is a scalar boolean value. This differs from the `fluid.if_else` control-flow, whose condition can be a vector of boolean values. -1. It runs the first matched case, or the default case if no case matches and a default is provided. -1. Once it matches a case, it runs the corresponding branch and only that branch. It is as if there were a C `break` statement at the end of each case. - -The above program should print "Case 1" and nothing else. - -The implementation of the backward pass of the `switch` control-flow is easier than that of `if_else`, because `switch` runs at most one branch, whereas `if_else` could run more than one branch. 
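
The first-match semantics listed above can be mimicked in plain Python. This is a minimal sketch and not the Fluid implementation; `run_switch` is a hypothetical helper, and unlike a graph-building API it evaluates every condition eagerly before the call.

```python
def run_switch(cases, default=None):
    """Run the first branch whose scalar condition is true, else the default.

    `cases` is a list of (condition, branch) pairs, where `condition` is a
    plain bool and `branch` is a zero-argument callable.
    """
    for condition, branch in cases:
        if condition:
            # Behaves as if each case ended with C's `break`:
            # later cases are never examined.
            return branch()
    if default is not None:
        return default()
    return None


a = 10
result = run_switch(
    cases=[
        (a <= 10, lambda: "Case 1"),
        (a > 0, lambda: "Case 2"),  # also true, but never reached
    ],
    default=lambda: "Case 3",
)
print(result)  # prints "Case 1"
```

Both conditions hold for `a = 10`, yet only the first branch runs, which matches the claim that the example prints only "Case 1".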
diff --git a/develop/doc/_sources/design/tensor_array.md.txt b/develop/doc/_sources/design/tensor_array.md.txt deleted file mode 100644 index 37e4f7b90f94fa3eb015e733999cd84c96b2239c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/tensor_array.md.txt +++ /dev/null @@ -1,271 +0,0 @@ -# Design for TensorArray -This design doc presents the necessity of a new C++ class `TensorArray`. -In addition to the very simple C++ implementation - -```c++ -class TensorArray { - public: - explicit TensorArray(const LoDTensor&); - explicit TensorArray(size_t size); - - private: - vector values_; -}; -``` - -We also need to expose it to PaddlePaddle's Python API, -because users would want to use it with our very flexible `WhileLoop` operator. -An example of an RNN based on dynamic operators is - -```python -input = pd.data(...) -num_steps = Var(12) - -TensorArray states(size=num_steps) -TensorArray step_inputs(unstack_from=input) -TensorArray step_outputs(size=num_steps) - -W = Tensor(...) -U = Tensor(...) -default_state = some_op() - -step = Var(1) - -wloop = paddle.create_whileloop(loop_vars=[step]) -with wloop.frame(): - wloop.break_if(pd.equal(step, num_steps)) - pre_state = states.read(step-1, default_state) - step_input = step_inputs.read(step) - state = pd.sigmoid(pd.matmul(U, pre_state) + pd.matmul(W, step_input)) - states.write(step, state) - step_outputs.write(step, state) # output state - step.update(step+1) - -output = step_outputs.stack() -``` - -## Background -Steps are one of the core concepts of RNN. In each time step of an RNN, there should be several input segments, states, and output segments; all these components act like arrays. For example, calling `states[step_id]` will get the state in the `step_id`th time step. 
- -An RNN can be implemented with the following pseudocode - -```c++ -Array states; -Array input_segments; -Array output_segments; -Parameter W, U; - -step = 1 -seq_len = 12 -while_loop { - if (step == seq_len) break; - states[step] = sigmoid(W * states[step-1] + U * input_segments[step]); - output_segments[step] = states[step] // take state as output - step++; -} -``` -According to the [RNN roadmap](https://github.com/PaddlePaddle/Paddle/issues/4561), there are several different RNNs that PaddlePaddle will eventually support. - -Currently, the basic RNN implementation supported by PaddlePaddle is the `recurrent_op` which takes tensors as input and splits them into `input_segments`. - - -Since a tensor cannot store variable-length sequences directly, PaddlePaddle implements the tensor with level of details (`LoDTensor` for short). -Segmenting the `LoDTensor` is much more complicated than splitting a tensor, that makes it necessary to refactor the `recurrent_op` with `LoDTensor` segmenting support. - -As the next step in RNN support, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences. - -The implementation is similar to `recurrent_op`. -The key difference is the way **the original input `LoDTensors` and outupts are split to get the `input_segments` and the `output_segments`.** - - -Though it can't be built over `recurrent_op` or `dynamic_recurrent_op` directly, -the logic behind splitting a tensor or a LoD tensor into `input_segments` remains the same. - -## Why `TensorArray` -The logic behind splitting the inputs to segments, states and outputs is similar and can be shared in a seperate module. - -The array of `states`, `input_segments` and `output_segments` would be exposed to users when writing a dynamic RNN model similar to the above pseudo codes. - -So there should be an array-like container, which can store the segments of a tensor or LoD tensor. 
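
The array-like container argued for above can be sketched in a few lines of plain Python. This is a toy illustration under simplifying assumptions: tensors are nested lists, there is no LoD support, and `ToyTensorArray` and its methods are hypothetical names rather than the proposed C++ or Python API.

```python
class ToyTensorArray:
    """Toy array of per-step slices; tensors are plain nested lists here."""

    def __init__(self, size=0):
        self.values = [None] * size

    @classmethod
    def unstack(cls, tensor):
        """Split `tensor` along its first dimension, one entry per time step."""
        ta = cls(len(tensor))
        for i, row in enumerate(tensor):
            ta.write(i, row)
        return ta

    def write(self, index, value):
        """Store `value` at `index`, growing the array if needed."""
        if index >= len(self.values):
            self.values.extend([None] * (index + 1 - len(self.values)))
        self.values[index] = value

    def read(self, index, default=None):
        """Return the value at `index`, or `default` if it is missing."""
        if 0 <= index < len(self.values) and self.values[index] is not None:
            return self.values[index]
        return default

    def stack(self):
        """Pack all steps back into one nested list."""
        return list(self.values)

    def size(self):
        return len(self.values)


# A running-sum "RNN": state[t] = state[t-1] + input[t].
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
step_inputs = ToyTensorArray.unstack(inputs)
states = ToyTensorArray(step_inputs.size())
for step in range(step_inputs.size()):
    prev = states.read(step - 1, default=[0.0, 0.0])
    cur = [p + x for p, x in zip(prev, step_inputs.read(step))]
    states.write(step, cur)
output = states.stack()
print(output)
```

The same `read`/`write` calls serve the inputs and the states, which is the sharing of segment logic that the text argues for.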
- -**This container can store an array of tensors and provides several methods to split a tensor or a LoD tensor** . -This is where the notion of `TensorArray` comes from. - -## Introduce TensorArray to uniform all the three RNNs -TensorArray as a new concept is borrowed from TensorFlow, -it is meant to be used with dynamic iteration primitives such as `while_loop` and `map_fn`. - -This concept can be used to support our new design of dynamic operations, and help to refactor some existing variant-sentence-related layers, -such as `recurrent_op`, `RecurrentGradientMachine`. - -In [our design for dynamic RNN](https://github.com/PaddlePaddle/Paddle/pull/4401), -`TensorArray` is used to segment inputs and store states in all time steps. -By providing some methods similar to a C++ array, -the definition of some state-based dynamic models such as RNN can be more natural and highly flexible. - -## Dynamic-operations on TensorArray - -`TensorArray` will be used directly when defining dynamic models, so some operators listed below should be implemented - -```python -# several helper operators for TensorArray -def tensor_array_stack(ta, tensor): - ''' - get a tensor array `ta`, return a packed `tensor`. - ''' - pass - -def tensor_array_unstack(tensor, ta): - ''' - get a `tensor`, unstack it and get a tensor array `ta`. - ''' - pass - -def tensor_array_write(ta, index, tensor, data_shared): - ''' - get a `tensor` and a scalar tensor `index`, write `tensor` into index-th - value of the tensor array `ta`. - `data_shared` is an attribute that specifies whether to copy or reference the tensors. - ''' - pass - -def tensor_array_read(ta, index, tensor): - ''' - get a tensor array `ta`, a scalar tensor `index`, read the index-th value of - `ta` and return as the `tensor`. - ''' - pass - -def tensor_array_size(ta, tensor): - ''' - get a tensor array `ta`, return the size of `ta` and return as the scalar `tensor`. 
- ''' - pass -``` - -It is trivial for users to use so many low-level operators, so some helper methods should be proposed in python wrapper to make `TensorArray` easier to use, -for example - -```python -class TensorArray: - def __init__(self, name): - self.name = name - self.desc = TensorArrayDesc() - - def stack(self, name=None): - ''' - Pack the values in a `TensorArray` into a tensor with rank one higher - than each tensor in `values`. - `stack` can be used to split tensor into time steps for RNN or whileloop. - - @name: str - the name of the variable to output. - ''' - tensor = Var(name) - tensor_array_stack(self.name, tensor) - return tensor - - def unstack(self, input): - ''' - Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors. - `unstack` can be used to concatenate all the time steps for RNN or whileloop. - - @input: str - the name of input tensor - ''' - tensor_array_unstack(tensor, self.name) - - def write(self, index, value, data_shared=True): - ''' - Write value into index of the TensorArray. - If `data_shared` is set to True, than the index-th value in TensorArray will - be shared with the tensor passed in. - - @index: str - name of a scalar tensor - @value: str - name of a tensor - @data_shared: bool - ''' - tensor_array_write(self.name, index, value, data_shared) - - def read(self, index, output): - ''' - Read the value at location `index` in the `TensorArray`. - - @index: str - name of a scalar tensor - @output: - name of a output variable - ''' - tensor_array_read(self.name, index, output) - - - def size(self, output): - ''' - Return the number of values. - - @output: str - name of a scalar tensor - ''' - tensor_array_size(self.name, output) -``` - -## LoDTensor-related Supports -The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes varience-length sequences as input, and output sequences too. 
- -Since each step of RNN can only take a tensor-represented batch of data as input, -some preprocess should be taken on the inputs such as sorting the sentences by their length in descending order and cut each word and pack to new batches. - -Such cut-like operations can be embedded into `TensorArray` as general methods called `unpack` and `pack`, -these two operations are similar to `stack` and `unstack` except that they operate on variable-length sequences formated as a LoD tensor rather than a tensor. - -Some definitions are like - -```python -def unpack(level): - ''' - Split LodTensor in some `level` and generate batches, if set `sort_by_length`, - will sort by length. - - Returns: - - a new `TensorArray`, whose values are LodTensors and represents batches - of data. - - an int32 Tensor, which stores the map from the new batch's indices to - original LoDTensor - ''' - pass - -def pack(level, indices_map): - ''' - Recover the original LoD-arranged LoDTensor with the values in a `TensorArray` - and `level` and `indices_map`. 
- ''' - pass -``` - -With these two methods, an RNN that supports variable-length sentences can be implemented like - -```c++ -// sentence_input is the variable-length data -LoDTensor sentence_input(xxx); -TensorArray ta; -Tensor indice_map; -Tensor boot_state = xxx; // to initialize rnn's first state -TensorArray::unpack(sentence_input, 1/*level*/, true/*sort_by_length*/, &ta, &indice_map); -TensorArray step_outputs; -TensorArray states; - -for (int step = 0; step < ta.size(); step++) { - auto state = states.read(step); - // rnnstep is a function which acts like a step of RNN - auto step_input = ta.read(step); - auto step_output = rnnstep(step_input, state); - step_outputs.write(step, step_output, true/*data_shared*/); -} - -// rnn_output is the final output of an rnn -LoDTensor rnn_output = ta.pack(ta, indice_map); -``` -The code above shows that by embedding the LoDTensor-related preprocessing operations into `TensorArray`, -the implementation of an RNN that supports variable-length sentences is far more concise than `RecurrentGradientMachine`, because the latter mixes all the code together, making it hard to read and extend. diff --git a/develop/doc/_sources/design/var_desc.md.txt b/develop/doc/_sources/design/var_desc.md.txt deleted file mode 100644 index 6a45af1995463402ba9c65ddb51c6c8bb107f99e..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/design/var_desc.md.txt +++ /dev/null @@ -1,81 +0,0 @@ -## Background -PaddlePaddle divides the description of neural network computation into two stages: compile time and runtime. At compile time, the neural network computation is described as a `ProgramDesc` whereas at runtime an `Executor` interprets the `ProgramDesc` to compute the operations. - -PaddlePaddle uses proto messages to describe the compile-time program because: - -1. The computation program description must be serializable and saved in a file. -1. During distributed training, the serialized program will be sent to multiple workers. 
It should also be possible to break the program into different components, each of which can be executed on a different worker. - -The computation `Program` consists of nested `Blocks`. Each `Block` will consist of data(i.e. `Variable`) and `Operations`. The concept to represent them is in the table below. - -| |compile time|runtime| -|---|---|---| -|Data|VarDesc(proto)|Variable(cpp)| -|Operation|OpDesc(proto)|Operator(cpp)| - - -## Definition of VarType - -A VarDesc should have a name, type and whether or not it is persistable. The are different kinds of variable types supported in PaddlePaddle, apart from the POD_Types like: `LOD_TENSOR`, `SELECTED_ROWS`, `FEED_MINIBATCH`, `FETCH_LIST`, `STEP_SCOPES`, `LOD_RANK_TABLE`, `LOD_TENSOR_ARRAY`, `PLACE_LIST`, `READER` and `CHANNEL`. These are declared inside `VarType`. A `VarDesc` then looks as the following: - -```proto -message VarDesc { - required string name = 1; - required VarType type = 2; - optional bool persistable = 3 [ default = false ]; -} -``` - -## Definition of TensorDesc - -```proto -message TensorDesc { - // Should only be PODType. Is enforced in C++ - required Type data_type = 1; - repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480] -} -``` - -The `Type` here comes from the enum defined inside of `VarType` : - -```proto -enum Type { - // Pod Types - BOOL = 0; - INT16 = 1; - INT32 = 2; - INT64 = 3; - FP16 = 4; - FP32 = 5; - FP64 = 6; - - // Other types that may need additional descriptions - LOD_TENSOR = 7; - SELECTED_ROWS = 8; - FEED_MINIBATCH = 9; - FETCH_LIST = 10; - STEP_SCOPES = 11; - LOD_RANK_TABLE = 12; - LOD_TENSOR_ARRAY = 13; - PLACE_LIST = 14; - READER = 15; - CHANNEL = 16; -} -``` - -A TensorDesc describes `SelectedRows` and `LoDTensor`. For details of `SelectedRows`, please reference [`SelectedRows`](./selected_rows.md). 
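
The comment in `TensorDesc` says that an unknown dimension is saved as `-1`. As a rough illustration of what that convention implies, the sketch below mirrors the two fields in a Python dataclass. `TensorDescSketch` and its `matches` helper are hypothetical and are not part of the proto definition or of PaddlePaddle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TensorDescSketch:
    """Illustrative mirror of the TensorDesc message (not generated code)."""

    data_type: str                                  # e.g. "FP32"
    dims: List[int] = field(default_factory=list)   # -1 marks an unknown size

    def matches(self, shape):
        """Check a concrete shape against dims, treating -1 as a wildcard."""
        if len(shape) != len(self.dims):
            return False
        return all(d == -1 or d == s for d, s in zip(self.dims, shape))


# [UNK, 640, 480] is stored as [-1, 640, 480].
desc = TensorDescSketch("FP32", [-1, 640, 480])
print(desc.matches((32, 640, 480)), desc.matches((32, 640, 481)))
```

Any batch size satisfies the first dimension, while the remaining dimensions must agree exactly.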
- -## Definition of LodTensorDesc - -```proto -message LoDTensorDesc { - required TensorDesc tensor = 1; - optional int32 lod_level = 2 [ default = 0 ]; -} -``` - -A LoDTensorDesc contains a tensor and a lod_level. - -## Definition of Variable in Python - -For Variable in Python, please reference [`Python API`](./python_api.md). diff --git a/develop/doc/_sources/dev/index_en.rst.txt b/develop/doc/_sources/dev/index_en.rst.txt index 5fdc30a2d688a6a269b4972c2591af60893e7dfb..549f5fa9aace7eb699d229e5f61fe10ae4ed4d66 100644 --- a/develop/doc/_sources/dev/index_en.rst.txt +++ b/develop/doc/_sources/dev/index_en.rst.txt @@ -6,3 +6,4 @@ Development contribute_to_paddle_en.md write_docs_en.rst + new_layer_en.rst diff --git a/develop/doc/_sources/dev/new_op_en.md.txt b/develop/doc/_sources/dev/new_op_en.md.txt deleted file mode 100644 index da8b1bdd1082e439456daf25e9b3a1e8eb534375..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/dev/new_op_en.md.txt +++ /dev/null @@ -1,336 +0,0 @@ -# How to write a new operator - - - [Background](#background) - - [Implementing C++ Types](#implementing-c-types) - - [Defining ProtoMaker](#defining-protomaker) - - [Defining Operator](#defining-operator) - - [Defining OpKernel](#defining-opkernel) - - [Registering Operator and OpKernel](#registering-operator-and-opkernel) - - [Compilation](#compilation) - - [Python Binding](#python-binding) - - [Unit Tests](#unit-tests) - - [Testing Forward Operators](#testing-forward-operators) - - [Testing Backward Operators](#testing-backward-operators) - - [Compiling and Running](#compiling-and-running) - - [Remarks](#remarks) -## Background - -Here are the base types needed. For details, please refer to the design docs. - -- `class OpProtoAndCheckerMaker`: Describes an Operator's input, output, attributes and description, mainly used to interface with Python API. -- `framework::OperatorBase`: Operator (Op)base class. -- `framework::OpKernel`: Base class for Op computation kernel. 
-- `framework::OperatorWithKernel`: Inherited from OperatorBase, describing an operator with computation kernels. - - -Operators can be categorized into two groups: operators with kernel(s) and operators without kernel(s). An operator with kernel(s) inherits from `OperatorWithKernel` while one without kernel(s) inherits from `OperatorBase`. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information: - - - Information | Where is it defined --------------- | :---------------------- -OpProtoMaker definition | `.cc` files. A backward Op does not need an OpProtoMaker interface. -Op definition | `.cc` files -Kernel implementation | The kernel methods shared between CPU and CUDA are defined in `.h` files. CPU-specific kernels live in `.cc` files, while CUDA-specific kernels are implemented in `.cu` files. -Registering the Op | Ops are registered in `.cc` files. For Kernel registration, `.cc` files contain the CPU implementation, while `.cu` files contain the CUDA implementation. - - -New Operator implementations are added to the directory [paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators), with file names in the format `*_op.h` (if applicable), `*_op.cc`, `*_op.cu` (if applicable). **The system will use the naming scheme to automatically build operators and their corresponding Python extensions.** - - -Let's take the matrix multiplication operator, [MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc), as an example to introduce the writing of an Operator with Kernel. - - -## Implementing C++ Types - - -### Defining ProtoMaker - -Matrix Multiplication can be written as $Out = X * Y$, meaning that the operation consists of two inputs and one output. 
- -First, define `ProtoMaker` to describe the Operator's input, output, and additional comments: - -```cpp -class MulOpMaker : public framework::OpProtoAndCheckerMaker { - public: - MulOpMaker(OpProto *proto, OpAttrChecker *op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "(Tensor), 2D tensor of size (M x K)"); - AddInput("Y", "(Tensor), 2D tensor of size (K x N)"); - AddOutput("Out", "(Tensor), 2D tensor of size (M x N)"); - AddComment(R"DOC( -Two Element Mul Operator. -The equation is: Out = X * Y -)DOC"); - } -}; -``` - -[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43)is inherited from`framework::OpProtoAndCheckerMaker`, consisting of 2 variables in the constructor: - - - `framework::OpProto` stores Operator input and variable attribute, used for generating Python API interfaces. - - `framework::OpAttrChecker` is used to validate variable attributes. - -The constructor utilizes `AddInput`, `AddOutput`, and `AddComment`, so that the corresponding information will be added to `OpProto`. - -The code above adds two inputs `X` and `Y` to `MulOp`, an output `Out`, and their corresponding descriptions, in accordance to Paddle's [naming convention](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md). 
- - -An additional example [`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37) is implemented as follows: - -```cpp -template -class ScaleOpMaker : public framework::OpProtoAndCheckerMaker { - public: - ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "The input tensor of scale operator.").NotInGradient(); - AddOutput("Out", "The output tensor of scale operator.").NotInGradient(); - AddComment(R"DOC(Scale operator -The equation is: Out = scale*X -)DOC"); - AddAttr("scale", "scale of scale operator.").SetDefault(1.0); - } -}; -``` - -There are two changes in this example: - -- `AddInput("X","...").NotInGradient()` expresses that input `X` is not involved in `ScaleOp`'s corresponding computation. If an input to an operator is not participating in back-propagation, please explicitly set `.NotInGradient()`. - -- `AddAttr("scale", "...").SetDefault(1.0);` adds `scale`constant as an attribute, and sets the default value to 1.0. - - -### Defining Operator - -The following code defines the interface for MulOp: - -```cpp -class MulOp : public framework::OperatorWithKernel { - public: - using framework::OperatorWithKernel::OperatorWithKernel; - - protected: - void InferShape(const framework::InferShapeContext &ctx) const override { - auto dim0 = ctx.Input("X")->dims(); - auto dim1 = ctx.Input("Y")->dims(); - PADDLE_ENFORCE_EQ(dim0.size(), 2, - "input X(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("X")); - PADDLE_ENFORCE_EQ(dim1.size(), 2, - "input Y(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("Y")); - PADDLE_ENFORCE_EQ( - dim0[1], dim1[0], - "First matrix's width must be equal with second matrix's height."); - ctx.Output("Out")->Resize({dim0[0], dim1[1]}); - } -}; -``` - -[`MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L22) is inherited from `OperatorWithKernel`. 
Its `public` member - -```cpp -using framework::OperatorWithKernel::OperatorWithKernel; -``` - -expresses an operator constructor using base class `OperatorWithKernel`, alternatively written as - -```cpp -MulOp(const std::string &type, const framework::VariableNameMap &inputs, - const framework::VariableNameMap &outputs, - const framework::AttributeMap &attrs) - : OperatorWithKernel(type, inputs, outputs, attrs) {} -``` - -`InferShape` interface needs to be re-written.`InferShape` is a constant method and cannot modify Op's member variables, its constant member `const framework::InferShapeContext &ctx` can be used to extract input, output, and attributes. It functions to - - - 1). validate and error out early: it checks input data dimensions and types. - - 2). configures the tensor shape in the output. - -Usually `OpProtoMaker` and `Op`'s type definitions are written in `.cc` files, which also include the registration methods introduced later. - -### Defining OpKernel - -`MulKernel` inherits `framework::OpKernel`, which includes the following templates: - -- `typename DeviceContext` denotes device context type. When different devices, namely the CPUDeviceContext and the CUDADeviceContext, share the same kernel, this template needs to be added. If they don't share kernels, this must not be added. An example of a non-sharing kernel is [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43). - -- `typename T` denotes data type, such as `float` or `double`. - -`MulKernel` types need to rewrite the interface for `Compute`. - -- `Compute` takes one input parameter: `const framework::ExecutionContext& context`. -- Compared with `InferShapeContext`, `ExecutionContext` includes device types, and can similarly extract input, output, and attribute variables. -- `Compute` implements the computation logics of an `OpKernel`. 
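The two duties of `InferShape` described above (validate early, then configure the output shape) can be sketched outside of Paddle. The following is a minimal Python illustration only, not Paddle's API: the function name `infer_mul_shape` and the use of plain tuples for dimensions are assumptions made for this example.

```python
def infer_mul_shape(x_dims, y_dims):
    """Validate the input shapes of Out = X * Y and return the shape of Out.

    Hypothetical helper: it mirrors the checks in MulOp::InferShape but uses
    plain tuples instead of Paddle's DDim type.
    """
    # Duty 1: validate and error out early.
    if len(x_dims) != 2:
        raise ValueError("input X should be a tensor with 2 dims, a matrix")
    if len(y_dims) != 2:
        raise ValueError("input Y should be a tensor with 2 dims, a matrix")
    if x_dims[1] != y_dims[0]:
        raise ValueError("width of X must equal height of Y")
    # Duty 2: configure the shape of the output.
    return (x_dims[0], y_dims[1])


out_dims = infer_mul_shape((32, 84), (84, 100))
print(out_dims)
```

Keeping the shape logic free of any computation is what allows it to run before memory is allocated; the kernel's `Compute` can then assume the shapes are already consistent.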
- -`MulKernel`'s implementation of `Compute` is as follows: - - ```cpp - template - class MulKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Input("Y"); - auto* Z = context.Output("Out"); - Z->mutable_data(context.GetPlace()); - auto& device_context = context.template device_context(); - math::matmul(*X, false, *Y, false, 1, Z, 0, device_context); - } - }; - ``` - -Note that **different devices (CPU, CUDA)share one Op definition; whether or not they share the same `OpKernel` depends on whether `Compute` calls functions can support both devices.** - -`MulOp`'s CPU and CUDA share the same `Kernel`. A non-sharing `OpKernel` example can be seen in [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43). - -To ease the writing of `OpKernel` compute, and for reusing code cross-device, [`Eigen-unsupported Tensor`](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default) module is used to implement `Compute` interface. To learn about how the Eigen library is used in PaddlePaddle, please see [usage document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md). - - -This concludes the forward implementation of an operator. Next its operation and kernel need to be registered in a `.cc` file. - -The definition of its corresponding backward operator, if applicable, is similar to that of an forward operator. **Note that a backward operator does not include a `ProtoMaker`**. - -### Registering Operator and OpKernel - -- In `.cc` files, register forward and backward operator classes and the CPU kernel. 
- - ```cpp - namespace ops = paddle::operators; - REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); - - REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel); - REGISTER_OP_CPU_KERNEL(mul_grad, - ops::MulGradKernel); - ``` - - In that code block, - - - `REGISTER_OP` registers the `ops::MulOp` class, type named `mul`, its type `ProtoMaker` is `ops::MulOpMaker`, registering `ops::MulOpGrad` as `mul_grad`. - - `REGISTER_OP_WITHOUT_GRADIENT` registers an operator without gradient. - - - `REGISTER_OP_CPU_KERNEL` registers `ops::MulKernel` class and specialized template types `paddle::platform::CPUPlace` and `float`, which also registers `ops::MulGradKernel`. - - -- Registering CUDA Kernel in `.cu` files - - Note that if CUDA Kernel is implemented using the `Eigen unsupported` module, then on top of `.cu`, a macro definition `#define EIGEN_USE_GPU` is needed, such as - - ```cpp - // if use Eigen unsupported module before include head files - #define EIGEN_USE_GPU - - namespace ops = paddle::operators; - REGISTER_OP_CUDA_KERNEL(mul, ops::MulKernel); - REGISTER_OP_CUDA_KERNEL(mul_grad, - ops::MulGradKernel); - ``` - -### Compilation - -Run the following commands to compile. - -``` -# maybe you need to rerun cmake -make mul_op -``` - -## Python Binding - -The system will automatically bind to Python and link it to a generated library. - -## Unit Tests - -Unit tests for an operator include - -1. comparing a forward operator's implementations on different devices, - -2. comparing a backward operator's implementation on different devices, and - -3. a scaling test for the backward operator. - -Here, we introduce the [unit tests for `MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py). - -### Testing Forward Operators - -A forward operator unit test inherits `unittest.TestCase` and defines metaclass `__metaclass__ = OpTestMeta`. More concrete tests are performed in `OpTestMeta`. 
Testing a forward operator requires the following: - -1. Defining input, output and relevant attributes in `setUp` method. - -2. Generating random input data. - -3. Implementing the same computation logic in a Python script. - -4. Call check gradient function to check the backward operator. - - ```python - import unittest - import numpy as np - from op_test import OpTest - - - class TestMulOp(OpTest): - def setUp(self): - self.op_type = "mul" - self.inputs = { - 'X': np.random.random((32, 84)).astype("float32"), - 'Y': np.random.random((84, 100)).astype("float32") - } - self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} - - def test_check_output(self): - self.check_output() - - def test_check_grad_normal(self): - self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5) - - def test_check_grad_ingore_x(self): - self.check_grad( - ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X")) - - def test_check_grad_ingore_y(self): - self.check_grad( - ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y')) - ``` -Get its output, and compare it with the forward operator's own output. - -The code above first loads required packages. In addition, we have - -- `self.op_type = "mul" ` defines the type that is identical to what the operator's registered type. -- `self.inputs` defines input, with type `numpy.array` and initializes it. -- `self.outputs` defines output and completes the same operator computation in the Python script, and returns its result from the Python script. - -### Testing Backward Operators - -Some key points in checking gradient above include: - -- `test_normal` calls `check_grad` to validate scaling tests' correctness and stability through numeric methods. - - The first variable `["X", "Y"]` appoints `X` and `Y` to be scale tested. - - The second variable `"Out"` points to the network's final output target `Out`. - - The third variable `max_relative_error` points to the maximum relative tolerance error during scaling tests. 
-- `test_check_grad_ingore_x` and `test_check_grad_ingore_y` branches test the cases where there is only one scaling input. - -### Compiling and Running - - -Any new unit testing file of the format `test_*.py` added to the directory `python/paddle/v2/framework/tests` is automatically added to the project to compile. - -Note that **unlike the compile test for Ops, running unit tests requires compiling the entire project** and requires compiling with the flag `WITH_TESTING` on, i.e., `cmake paddle_dir -DWITH_TESTING=ON`. - -After successfully compiling the project, run the following command to run unit tests: - -```bash -make test ARGS="-R test_mul_op -V" -``` - -Or, - -```bash -ctest -R test_mul_op -``` - -## Remarks - -- Every `*_op.h` (if applicable), `*_op.cc`, and `*_op.cu` (if applicable) must be created for a unique Op. Compiling will fail if multiple operators are included per file. -- The type with which an operator is registered needs to be identical to the Op's name. Registering `REGISTER_OP(B, ...)` in `A_op.cc` will cause unit testing failures. -- If the operator does not implement a CUDA kernel, please refrain from creating an empty `*_op.cu` file, or else unit tests will fail. -- If multiple operators rely on some shared methods, a file NOT named `*_op.*` can be created to store them, such as `gather.h`. diff --git a/develop/doc/_sources/dev/new_op_kernel_en.md.txt b/develop/doc/_sources/dev/new_op_kernel_en.md.txt deleted file mode 100644 index 123df0a7ee4943c0b789ef9cfa6e0804d0fdd564..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/dev/new_op_kernel_en.md.txt +++ /dev/null @@ -1,121 +0,0 @@ -## Add Kernels for a New Device - -### Background - -PaddlePaddle Fluid has hundreds of operators. Each operator could have one or more kernels. 
A kernel is an implementation of the operator for a certain device, which could be a hardware device, e.g., the CUDA GPU, or a library that utilizes a device, e.g., Intel MKL that makes full use of the Xeon CPU. - -[This document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md) explains how to add an operator and its kernels. The kernels of an operator are indexed by a C++ type [`OpKernelType`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md). An operator chooses the right kernel at runtime. This selection mechanism is described [here](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md). - -### Write Kernels for A New Device - -#### Add A New Device - - For some historical reasons, we misuse the word *library* for *device*. For example, we call the device type the *library type*. An example is the header file [`library_type.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/library_type.h#L24). We will correct this ASAP. - -To register a new device, we need to add an enum value to `LibraryType`: - -``` -enum class LibraryType { - kPlain = 0, - kMKLDNN = 1, - kCUDNN = 2, -}; -``` - - -#### Add A New [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53) - -If you have a new kind of Device, you first need to add a new kind of [`Place`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53). 
For example `CUDAPlace`: - -```cpp -struct CUDAPlace { - CUDAPlace() : CUDAPlace(0) {} - explicit CUDAPlace(int d) : device(d) {} - - inline int GetDeviceId() const { return device; } - // needed for variant equality comparison - inline bool operator==(const CUDAPlace &o) const { - return device == o.device; - } - inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); } - - int device; -}; - -typedef boost::variant Place; -``` - -#### Add [device context]((https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37)) -After a new kind of Device is added, you should add a corresponding [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37) for it. - -```cpp -class DeviceContext { - public: - virtual ~DeviceContext() {} - virtual Place GetPlace() const = 0; - - virtual void Wait() const {} -}; -``` - -#### Implement new [OpKernel](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L351) for your Device. - -A detailed documentation can be found in [`new_op_and_kernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md) - -```cpp -class OpKernelBase { - public: - /** - * ExecutionContext is the only parameter of Kernel Run function. - * Run will get input/output variables, state such as momentum and - * device resource such as CUDA stream, cublas handle, etc. from - * ExecutionContext. User should construct it before run the Operator. - */ - - virtual void Compute(const ExecutionContext& context) const = 0; - - virtual ~OpKernelBase() = default; -}; - -template -class OpKernel : public OpKernelBase { - public: - using ELEMENT_TYPE = T; -}; -``` - - -#### Register the OpKernel to framework - -After writing the components described above, we should register the kernel to the framework. - -We use `REGISTER_OP_KERNEL` to do the registration. 
- -```cpp -REGISTER_OP_KERNEL( - op_type, - library_type, - place_type, - kernel0, kernel1, ...) -``` - -`kernel0` and `kernel1` are kernels that have the same `op_type`, `library_type`, and `place_type` but different `data_types`. - -Take [`conv2d`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/conv_cudnn_op.cu.cc#L318) as an example: - - ```cpp - REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace, - paddle::operators::GemmConvKernel, - paddle::operators::GemmConvKernel); - - REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace, - paddle::operators::CUDNNConvOpKernel, - paddle::operators::CUDNNConvOpKernel); - ``` - -In the code above: - - - `conv2d` is the type/name of the operator - - `CUDNN/CPU` is `library` - - `paddle::platform::CUDAPlace/CPUPlace` is `place` - - template parameter `float/double` on `CUDNNConvOpKernel` is `data_type`. diff --git a/develop/doc/_sources/dev/use_eigen_en.md.txt b/develop/doc/_sources/dev/use_eigen_en.md.txt deleted file mode 100644 index e169106e12f5d62696f1f0e7163562793b32c18c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/dev/use_eigen_en.md.txt +++ /dev/null @@ -1,146 +0,0 @@ -## How to use Eigen in Paddle - -Essentially, a neural network is a compute graph. The data needed for the computation is stored in `Tensor`s and its computation procedure is described by `Operator`s. An `Operator` calls the `Compute` interface in its corresponding `OpKernel` and operates on the `Tensor`. - - -### Eigen Tensor Module - -The Eigen Tensor module supports powerful element-wise computation. In addition, a piece of code written using it can be run on both the CPU and the GPU. - -Note that Eigen Tensor is still being actively developed, so its test coverage is incomplete and its documentation may be sparse. 
- -For details on Eigen Tensor module, please see [doc 1](https://github.com/RLovelett/eigen/blob/master/unsupported/Eigen/CXX11/src/Tensor/README.md) and [doc 2](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md). - - -### paddle::framework::Tensor - -Paddle Tensor's is defined in the framework directory with the following interface: - -```cpp -class Tensor { - public: - /*! Return a pointer to mutable memory block. */ - template - inline T* data(); - - /** - * @brief Return a pointer to mutable memory block. - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(platform::Place place); - - /** - * @brief Return a pointer to mutable memory block. - * - * @param[in] dims The dimensions of the memory block. - * @param[in] place The place of the memory block. - * - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(DDim dims, platform::Place place); - - /*! Resize the dimensions of the memory block. */ - inline Tensor& Resize(const DDim& dims); - - /*! Return the dimensions of the memory block. */ - inline const DDim& dims() const; - - private: - /*! holds the memory block if allocated. */ - std::shared_ptr holder_; - - /*! points to dimensions of memory block. */ - DDim dim_; -}; -``` - -`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, using `Resize` to configure its shape, and then call `mutuable_data` to allocate the actual memory. - -```cpp -paddle::framework::Tensor t; -paddle::platform::CPUPlace place; -// set size first -t.Resize({2, 3}); -// allocate memory on CPU later -t.mutable_data(place); -``` - -### paddle::framework::Tensor Usage -`AddOp` demonstrates Tensor's usage. - -- InferShape - -When computing a neural network's compute graph, first call every `Operator`'s `InferShape` method, and use `Resize` to configure the size of the output tensor. 
- -```cpp -void InferShape(const framework::InferShapeContext &ctx) const override { - PADDLE_ENFORCE_EQ(ctx.Input("X")->dims(), - ctx.Input("Y")->dims(), - "Two input of Add Op's dimension must be same."); - ctx.Output("Out")->Resize(ctx.Input("X")->dims()); -} -``` - - -- Run - -```cpp -void Compute(const framework::ExecutionContext& context) const override { - auto* input0 = context.Input("X"); - auto* input1 = context.Input("Y"); - auto* output = context.Output("Out"); - - output->mutable_data(context.GetPlace()); - - auto x = EigenVector::Flatten(*input0); - auto y = EigenVector::Flatten(*input1); - auto z = EigenVector::Flatten(*output); - - auto place = context.GetEigenDevice(); - - z.device(place) = x + y; -} -``` - - -### Conversion from paddle::framework::Tensor to EigenTensor - -As shown above, in actual computation, we need to transform the input and output `Tensor`s into formats Eigen supports. We show some functions in [eigen.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen.h) to implement the transformation from `paddle::framework::Tensor` to `EigenTensor/EigenMatrix/EigenVector/EigenScalar`. - -Using EigenTensor as an example: - -```cpp -Tensor t; -float* p = t.mutable_data(make_ddim({1, 2, 3}), platform::CPUPlace()); -for (int i = 0; i < 1 * 2 * 3; i++) { - p[i] = static_cast(i); -} - -EigenTensor::Type et = EigenTensor::From(t); -``` - -`From` is an interfacing method provided by the EigenTensor template, which implements the transformation from a `paddle::framework::Tensor` object to an EigenTensor. Since `rank` is a template parameter, it needs to be explicitly specified at the time of the transformation. - -In Eigen, tensors with different ranks are different types, with `Vector` being a rank-1 instance. Note that `EigenVector::From` transforms a 1-dimensional Paddle tensor into a 1-dimensional Eigen tensor, while `EigenVector::Flatten` reshapes a Paddle tensor of any rank and flattens it into a 1-dimensional Eigen tensor. 
Both resulting tensors are still of type `EigenVector`. - -For more transformations, see the [unit tests](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen_test.cc) in the `eigen_test.cc` file. - - - -### Implementing Computation - -During computation, the EigenTensor on the left-hand side of an assignment must be given the device through its `device` interface. Note that computation between EigenTensors only changes the data stored in the Tensor and does not change any of the shape information associated with the Tensor. - -```cpp -auto x = EigenVector::Flatten(*input0); -auto y = EigenVector::Flatten(*input1); -auto z = EigenVector::Flatten(*output); -auto place = context.GetEigenDevice(); -z.device(place) = x + y; -``` - -In this code segment, input0/input1/output can be Tensors of arbitrary dimension. We are calling Flatten from EigenVector, transforming a tensor of any dimension into a 1-dimensional EigenVector. After completing computation, input0/input1/output will retain the same shape information, and they can be resized using the `Resize` interface. - -Because the Eigen Tensor module is under-documented, please refer to `OpKernel`'s computation code in TensorFlow's [kernel module documentation](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels). diff --git a/develop/doc/_sources/howto/cluster/fluid_cluster_train_en.md.txt b/develop/doc/_sources/howto/cluster/fluid_cluster_train_en.md.txt deleted file mode 100644 index b4465e8269c2e1603c02404ea33f8c4572e76442..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/howto/cluster/fluid_cluster_train_en.md.txt +++ /dev/null @@ -1,153 +0,0 @@ -# Fluid Distributed Training - -## Introduction - -In this article, we'll explain how to configure and run distributed training jobs with PaddlePaddle Fluid in a bare metal cluster. - -## Preparations - -### Getting the cluster ready - -Prepare the compute nodes in the cluster. 
Each node can be of any specification that runs PaddlePaddle and must have a unique IP address assigned to it. Make sure the nodes can communicate with each other. - -### Have PaddlePaddle installed - -PaddlePaddle must be installed on all nodes. If you have GPU cards on your nodes, be sure to properly install drivers and CUDA libraries. - -The PaddlePaddle build and installation guide can be found [here](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/build_and_install/index_en.html). - -In addition to the above, the `cmake` command should be run with the option `WITH_DISTRIBUTE` set to on. An example bare minimum `cmake` command would look as follows: - -``` bash -cmake .. -DWITH_DOC=OFF -DWITH_GPU=OFF -DWITH_DISTRIBUTE=ON -DWITH_SWIG_PY=ON -DWITH_PYTHON=ON -``` - -### Update the training script - -#### Non-cluster training script - -Let's take the first chapter of [Deep Learning 101](http://www.paddlepaddle.org/docs/develop/book/01.fit_a_line/index.html), "fit a line", as an example. 
- -The non-cluster version of this demo with fluid API is as follows: - -``` python -import paddle.v2 as paddle -import paddle.fluid as fluid - -x = fluid.layers.data(name='x', shape=[13], dtype='float32') -y_predict = fluid.layers.fc(input=x, size=1, act=None) -y = fluid.layers.data(name='y', shape=[1], dtype='float32') - -cost = fluid.layers.square_error_cost(input=y_predict, label=y) -avg_cost = fluid.layers.mean(x=cost) - -sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001) -sgd_optimizer.minimize(avg_cost) - -BATCH_SIZE = 20 - -train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.uci_housing.train(), buf_size=500), - batch_size=BATCH_SIZE) - -place = fluid.CPUPlace() -feeder = fluid.DataFeeder(place=place, feed_list=[x, y]) -exe = fluid.Executor(place) - -exe.run(fluid.default_startup_program()) - -PASS_NUM = 100 -for pass_id in range(PASS_NUM): - fluid.io.save_persistables(exe, "./fit_a_line.model/") - fluid.io.load_persistables(exe, "./fit_a_line.model/") - for data in train_reader(): - avg_loss_value, = exe.run(fluid.default_main_program(), - feed=feeder.feed(data), - fetch_list=[avg_cost]) - - if avg_loss_value[0] < 10.0: - exit(0) # if avg cost less than 10.0, we think our code is good. -exit(1) -``` - -We created a simple fully-connected neural network training program and handed it to the fluid executor to run for 100 passes. - -Now let's try to convert it to a distributed version to run on a cluster. - -#### Introducing parameter server - -As we can see from the non-cluster version of training script, there is only one role in the script: the trainer, that performs the computing as well as holds the parameters. In cluster training, since multi-trainers are working on the same task, they need one centralized place to hold and distribute parameters. This centralized place is called the Parameter Server in PaddlePaddle. 
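The division of labor described above can be illustrated with a toy example. This is a minimal pure-Python sketch of the general parameter server idea, not Fluid's implementation: the `ParameterServer` class, its `pull`/`push` methods, and the tiny dataset are all hypothetical names and values chosen for illustration.

```python
class ParameterServer:
    """Toy centralized store for a single scalar parameter (hypothetical)."""

    def __init__(self, w, learning_rate):
        self.w = w
        self.learning_rate = learning_rate

    def pull(self):
        # Trainers fetch the current parameter value.
        return self.w

    def push(self, gradients):
        # Average the gradients sent by all trainers, then apply one SGD step.
        mean_grad = sum(gradients) / len(gradients)
        self.w -= self.learning_rate * mean_grad


def trainer_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


# Data generated from y = 2x, split across two trainers.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]

server = ParameterServer(w=0.0, learning_rate=0.05)
for step in range(200):
    w = server.pull()
    grads = [trainer_gradient(w, shard) for shard in shards]
    server.push(grads)

print(server.w)
```

Each trainer only ever sees its own shard, yet both update the same parameter because the server is the single place where the value lives. In Fluid the optimize step also runs on the server side, which is the role `push` plays in this sketch.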
- -![parameter server architecture](src/trainer.png) - -Parameter Server in fluid not only holds the parameters but is also assigned with a part of the program. Trainers communicate with parameter servers via send/receive OPs. For more technical details, please refer to [this document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/dist_refactor/distributed_architecture.md). - -Now we need to create programs for both: trainers and parameter servers, the question is how? - -#### Slice the program - -Fluid provides a tool called "Distributed Transpiler" that automatically converts the non-cluster program into cluster program. - -The idea behind this tool is to find the optimize OPs and gradient parameters, slice the program into 2 pieces and connect them with send/receive OP. - -Optimize OPs and gradient parameters can be found from the return values of optimizer's minimize function. - -To put them together: - -``` python -... #define the program, cost, and create sgd optimizer - -optimize_ops, params_grads = sgd_optimizer.minimize(avg_cost) #get optimize OPs and gradient parameters - -t = fluid.DistributeTranspiler() # create the transpiler instance -# slice the program into 2 pieces with optimizer_ops and gradient parameters list, as well as pserver_endpoints, which is a comma separated list of [IP:PORT] and number of trainers -t.transpile(optimize_ops, params_grads, pservers=pserver_endpoints, trainers=2) - -... #create executor - -# in pserver, run this -#current_endpoint here means current pserver IP:PORT you wish to run on -pserver_prog = t.get_pserver_program(current_endpoint) -pserver_startup = t.get_startup_program(current_endpoint, pserver_prog) -exe.run(pserver_startup) -exe.run(pserver_prog) - -# in trainer, run this -... 
# define data reader -exe.run(fluid.default_startup_program()) -for pass_id in range(100): - for data in train_reader(): - exe.run(t.get_trainer_program()) - - -``` - -### E2E demo - -The complete demo is available [here](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book_distribute/notest_dist_fit_a_line.py). -First `cd` into the folder that contains the `python` files. In this case: - -```bash -cd /paddle/python/paddle/fluid/tests/book_distribute -``` - -On the parameter server node, run the following in the command line: - -``` bash -PSERVERS=192.168.1.2:6174 SERVER_ENDPOINT=192.168.1.2:6174 TRAINING_ROLE=PSERVER python notest_dist_fit_a_line.py -``` - -*Please note that we assume your parameter server runs at 192.168.1.2:6174.* - -Wait until the prompt `Server listening on 192.168.1.2:6174` appears. - -Then, on 2 of your trainer nodes, run this: - -``` bash -PSERVERS=192.168.1.2:6174 SERVER_ENDPOINT=192.168.1.2:6174 TRAINING_ROLE=TRAINER python notest_dist_fit_a_line.py -``` - -*You need to run this command on 2 nodes because the script sets the trainer count to 2. You can change this setting on line 50.* - -Now you have 2 trainers and 1 parameter server up and running. 
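The commands above pass the cluster layout through the environment variables `PSERVERS`, `SERVER_ENDPOINT`, and `TRAINING_ROLE`. The demo script itself is not reproduced here, so the following is only a sketch of how a script might read such variables; the helper `read_cluster_config` and its return format are assumptions, not part of the demo.

```python
import os


def read_cluster_config(env=None):
    """Parse the role and endpoints from environment-style variables.

    Hypothetical helper: only the variable names come from the commands above.
    """
    env = os.environ if env is None else env
    role = env.get("TRAINING_ROLE", "TRAINER")
    if role not in ("PSERVER", "TRAINER"):
        raise ValueError("TRAINING_ROLE must be PSERVER or TRAINER")
    # PSERVERS is a comma separated list of IP:PORT endpoints.
    pservers = [ep for ep in env.get("PSERVERS", "").split(",") if ep]
    return {
        "role": role,
        "pservers": pservers,
        "current_endpoint": env.get("SERVER_ENDPOINT", ""),
    }


config = read_cluster_config({
    "PSERVERS": "192.168.1.2:6174",
    "SERVER_ENDPOINT": "192.168.1.2:6174",
    "TRAINING_ROLE": "PSERVER",
})
print(config)
```

Passing a plain dictionary instead of `os.environ` keeps the parsing testable without touching the real process environment.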
diff --git a/develop/doc/_sources/howto/index_en.rst.txt b/develop/doc/_sources/howto/index_en.rst.txt index ae8b86f75b5de770312fb2fdc46db490a18e5ff6..2079be766f2d8e6d63ca11dccd98f80613309ceb 100644 --- a/develop/doc/_sources/howto/index_en.rst.txt +++ b/develop/doc/_sources/howto/index_en.rst.txt @@ -6,5 +6,6 @@ HOW TO cmd_parameter/index_en.rst cluster/index_en.rst + capi/index_en.rst rnn/index_en.rst optimization/gpu_profiling_en.rst diff --git a/develop/doc/_sources/howto/optimization/cpu_profiling_en.md.txt b/develop/doc/_sources/howto/optimization/cpu_profiling_en.md.txt deleted file mode 100644 index 01e5fddf61547f9fc86ef18a6f2e2ac508d22dbb..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/howto/optimization/cpu_profiling_en.md.txt +++ /dev/null @@ -1,196 +0,0 @@ -This tutorial introduces techniques we use to profile and tune the -CPU performance of PaddlePaddle. We will use Python packages -`cProfile` and `yep`, and Google's `perftools`. - -Profiling is the process that reveals performance bottlenecks, -which could be very different from what's in the developers' minds. -Performance tuning is done to fix these bottlenecks. Performance optimization -repeats the steps of profiling and tuning alternately. - -PaddlePaddle users program AI applications by calling the Python API, which calls -into `libpaddle.so`, written in C++. In this tutorial, we focus on -the profiling and tuning of - -1. the Python code and -1. the mixture of Python and C++ code. - -## Profiling the Python Code - -### Generate the Performance Profiling File - -We can use the Python standard -package, [`cProfile`](https://docs.python.org/2/library/profile.html), -to generate a Python profiling file. For example: - -```bash -python -m cProfile -o profile.out main.py -``` - -where `main.py` is the program we are going to profile, `-o` specifies -the output file. Without `-o`, `cProfile` writes to standard -output. 
- -### Look into the Profiling File - -`cProfile` generates `profile.out` after `main.py` completes. We can -use [`cprofilev`](https://github.com/ymichael/cprofilev) to look into -the details: - -```bash -cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py -``` - -where `-a` specifies the HTTP IP, `-p` specifies the port, `-f` -specifies the profiling file, and `main.py` is the source file. - -Open a Web browser and point it to the local IP and the specified -port; we will see output like the following: - -``` - ncalls tottime percall cumtime percall filename:lineno(function) - 1 0.284 0.284 29.514 29.514 main.py:1() - 4696 0.128 0.000 15.748 0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/executor.py:20(run) - 4696 12.040 0.003 12.040 0.003 {built-in method run} - 1 0.144 0.144 6.534 6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14() -``` - -where each line corresponds to a Python function, and the meaning of -each column is as follows: - -| column | meaning | -| --- | --- | -| ncalls | the number of calls into a function | -| tottime | the total execution time of the function, not including the execution time of other functions called by the function | -| percall | tottime divided by ncalls | -| cumtime | the total execution time of the function, including the execution time of other functions being called | -| percall | cumtime divided by ncalls | -| filename:lineno(function) | where the function is defined | - -### Identify Performance Bottlenecks - -Usually, `tottime` and the related `percall` time are what we want to -focus on. 
We can sort above profiling file by tottime: - -```text - 4696 12.040 0.003 12.040 0.003 {built-in method run} - 300005 0.874 0.000 1.681 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader) - 107991 0.676 0.000 1.519 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:219(__init__) - 4697 0.626 0.000 2.291 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp) - 1 0.618 0.618 0.618 0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/__init__.py:1() -``` - -We can see that the most time-consuming function is the `built-in -method run`, which is a C++ function in `libpaddle.so`. We will -explain how to profile C++ code in the next section. At this -moment, let's look into the third function `sync_with_cpp`, which is a -Python function. We can click it to understand more about it: - -``` -Called By: - - Ordered by: internal time - List reduced from 4497 to 2 due to restriction <'sync_with_cpp'> - -Function was called by... - ncalls tottime cumtime -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp) <- 4697 0.626 2.291 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp) -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp) <- 4696 0.019 2.316 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:487(clone) - 1 0.000 0.001 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:534(append_backward) - - -Called: - - Ordered by: internal time - List reduced from 4497 to 2 due to restriction <'sync_with_cpp'> -``` - -The lists of the callers of `sync_with_cpp` might help us understand -how to improve the function definition. 
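The sorting and caller lookup described above can also be done without the `cprofilev` Web UI, using the standard-library `pstats` module. The following is a minimal sketch, not part of the original tutorial: it assumes Python 3, and `square`, `busy_work`, `profile_to_file`, `top_by_tottime`, and `callers_of` are hypothetical names that stand in for the real workload in `main.py`.

```python
import cProfile
import io
import os
import pstats
import tempfile


def square(i):
    """Hypothetical hot function; stands in for something like sync_with_cpp."""
    return i * i


def busy_work(n):
    """Hypothetical workload; stands in for the body of main.py."""
    total = 0
    for i in range(n):
        total += square(i)
    return total


def profile_to_file(path, n=20000):
    """Profile busy_work and dump the stats, like `python -m cProfile -o`."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()
    profiler.dump_stats(path)
    return path


def top_by_tottime(path, limit=5):
    """Return the `limit` entries with the largest tottime as text."""
    buf = io.StringIO()
    pstats.Stats(path, stream=buf).sort_stats("tottime").print_stats(limit)
    return buf.getvalue()


def callers_of(path, pattern):
    """Return the callers of every function whose name matches `pattern`."""
    buf = io.StringIO()
    pstats.Stats(path, stream=buf).sort_stats("tottime").print_callers(pattern)
    return buf.getvalue()


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "profile.out")
    profile_to_file(out)
    print(top_by_tottime(out))
    print(callers_of(out, "square"))
```

The same two helper calls should work on a `profile.out` produced by `python -m cProfile -o profile.out main.py`, provided it is read with the same Python version that wrote it.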
- -## Profiling Python and C++ Code - -### Generate the Profiling File - -To profile a mixture of Python and C++ code, we can use a Python -package, `yep`, that can work with Google's `perftools`, which is a -commonly-used profiler for C/C++ code. - -In Ubuntu systems, we can install `yep` and `perftools` by running the -following commands: - -```bash -apt update -apt install libgoogle-perftools-dev -pip install yep -``` - -Then we can run the following command - -```bash -python -m yep -v main.py -``` - -to generate the profiling file. The default filename is -`main.py.prof`. - -Please be aware of the `-v` command line option, which prints the -analysis results after generating the profiling file. By examining the -printed result, we can tell whether we stripped debug -information from `libpaddle.so` at build time. The following hints -help make sure that the analysis results are readable: - -1. Use GCC command line option `-g` when building `libpaddle.so` so as to - include the debug information. The standard building system of - PaddlePaddle is CMake, so you might want to set - `CMAKE_BUILD_TYPE=RelWithDebInfo`. - -1. Use GCC command line option `-O2` or `-O3` to generate optimized - binary code. It doesn't make sense to profile `libpaddle.so` - without optimization, because it would run slowly anyway. - -1. Profile the single-threaded binary file before the - multi-threading version, because the latter often generates tangled - profiling analysis results. You might want to set environment - variable `OMP_NUM_THREADS=1` to prevent OpenMP from automatically - starting multiple threads. - -### Examining the Profiling File - -The tool we use to examine the profiling file generated by -`perftools` is [`pprof`](https://github.com/google/pprof), which -provides a Web-based GUI like `cprofilev`.
- -We can rely on the standard Go toolchain to retrieve the source code -of `pprof` and build it: - -```bash -go get github.com/google/pprof -``` - -Then we can use it to profile `main.py.prof` generated in the previous -section: - -```bash -pprof -http=0.0.0.0:3213 `which python` ./main.py.prof -``` - -Where `-http` specifies the IP and port of the HTTP service. -Directing our Web browser to the service, we would see something like -the following: - -![result](./pprof_1.png) - -### Identifying the Performance Bottlenecks - -Similar to how we work with `cprofilev`, we'd focus on `tottime` and -`cumtime`. - -![kernel_perf](./pprof_2.png) - -We can see that the execution time of multiplication and the computing -of the gradient of multiplication takes 2% to 4% of the total running -time, and `MomentumOp` takes about 17%. Obviously, we'd want to -optimize `MomentumOp`. - -`pprof` would mark performance critical parts of the program in -red. It's a good idea to follow the hints. diff --git a/develop/doc/_sources/howto/optimization/gpu_profiling_en.rst.txt b/develop/doc/_sources/howto/optimization/gpu_profiling_en.rst.txt index ed208ceaf7af0c5aab88fd4fcb18fa96b8c9ff38..50adb7da24906515cb5977db565e9f8a76599fef 100644 --- a/develop/doc/_sources/howto/optimization/gpu_profiling_en.rst.txt +++ b/develop/doc/_sources/howto/optimization/gpu_profiling_en.rst.txt @@ -54,7 +54,7 @@ In this tutorial, we will focus on nvprof and nvvp. :code:`test_GpuProfiler` from :code:`paddle/math/tests` directory will be used to evaluate above profilers. -.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp +.. literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :linenos: @@ -80,7 +80,7 @@ As a simple example, consider the following: 1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines). - .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp + .. 
literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :emphasize-lines: 8-12,14 @@ -127,7 +127,7 @@ To use this command line profiler **nvprof**, you can simply issue the following 1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines). - .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp + .. literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :emphasize-lines: 6-7 diff --git a/develop/doc/_sources/howto/read_source.md.txt b/develop/doc/_sources/howto/read_source.md.txt deleted file mode 100644 index edf46aff8c6cc9fc01d26c6453b3a8123238ef91..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/howto/read_source.md.txt +++ /dev/null @@ -1,67 +0,0 @@ -# PaddlePaddle Fluid Source Code Overview - -Examples: https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/tests/book - -Core: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework - -Operator: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators - -Memory: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory - -Platform: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform - -# Compile Time - -The following **defines** the NN. The definition goes into this [protocol buffer](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). - -```python -x = fluid.layers.data(name='x', shape=[13], dtype='float32') -y = fluid.layers.data(name='y', shape=[1], dtype='float32') - -y_predict = fluid.layers.fc(input=x, size=1, act=None) -cost = fluid.layers.square_error_cost(input=y_predict, label=y) -avg_cost = fluid.layers.mean(x=cost) - -sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001) -sgd_optimizer.minimize(avg_cost) -``` - -- Variables: `x`, `y`, `y_predict`, `cost` and `avg_cost`. 
[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/framework.py#) -- Layers: `fluid.layers.data`, `fluid.layers.fc` and `fluid.layers.mean` are layers. [Python](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/layers) - - Every Layer has one or more operators and variables/parameters - - All the operators are defined at [`paddle/operators/`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators). Other worth-looking files: - - Base class: [`paddle/framework/operator.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h) - - Operator Registration: [`paddle/framework/op_registry.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h) - - Operator Lookup: [`paddle/framework/op_info.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_info.h) -- Optimizer: `fluid.optimizer.SGD`. It does the following - - Add backward operators. [[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/backward.py)] - - Add optimizer operators. [[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/optimizer.py)] - -# Run Time - -The following **evaluates** the NN. Instantiates all the variables, operators. - -```python -place = fluid.CPUPlace() -feeder = fluid.DataFeeder(place=place, feed_list=[x, y]) -exe = fluid.Executor(place) - -# Allocate memory. Initialize Parameter. -exe.run(fluid.default_startup_program()) - -# Allocate memory. Do computation. -exe.run(fluid.default_main_program(), - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -- Place: `place`. one of CPU, GPU or FPGA. [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h) - - The device handle are at [paddle/platform/device_context.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h) -- Executor: `fluid.Executor(place)`. 
[[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/executor.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.cc)] - - Feeds the data: `feed=feeder.feed(data)` - - Evaluates all the operators - - Fetches the result: `fetch_list=[avg_cost]` -- Other worth looking files: - - Scope: [paddle/framework/scope.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/scope.h). Where all the variables live - - Variable: [paddle/framework/variable.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h). Where all the data (most likely tensors) live - - Tensor: [paddle/framework/tensor.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h). Where we allocate memory through [`paddle/memory/`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory) diff --git a/develop/doc/_sources/mobile/cross_compiling_for_android_en.md.txt b/develop/doc/_sources/mobile/cross_compiling_for_android_en.md.txt deleted file mode 100644 index 6af16fc114a2310e364023ec43cc3c64149af8f7..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/mobile/cross_compiling_for_android_en.md.txt +++ /dev/null @@ -1,189 +0,0 @@ -# Build PaddlePaddle for Android - -There are two approaches to build PaddlePaddle for Android: - -- [Cross-Compiling Using Docker](#cross-compiling-using-docker) -- [Cross-Compiling on Linux](#cross-compiling-on-linux) - -## Cross-Compiling Using Docker - -Docker-based cross-compiling is the recommended approach because Docker runs on all major operating systems, including Linux, Mac OS X, and Windows. - -### Build the Docker Image - -The following steps pack all the tools that we need to build PaddlePaddle into a Docker image. - -```bash -$ git clone https://github.com/PaddlePaddle/Paddle.git -$ cd Paddle -$ docker build -t paddle:dev-android . -f Dockerfile.android -``` - -Users can directly use the published Docker image. 
- -```bash -$ docker pull paddlepaddle/paddle:latest-dev-android -``` - -For users in China, we provide a faster mirror. - -```bash -$ docker pull docker.paddlepaddlehub.com/paddle:latest-dev-android -``` - -### Build the Inference Library - -We can run the Docker image we just created to build the inference library of PaddlePaddle for Android using the command below: - -```bash -$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" paddle:dev-android -``` - -The Docker image accepts two arguments `ANDROID_ABI` and `ANDROID_API`: - - -- - - - - - - - - - - - - - - - - - - - - - - -
| Argument | Optional Values | Default |
| --- | --- | --- |
| ANDROID_ABI | armeabi-v7a, arm64-v8a | armeabi-v7a |
| ANDROID_API | >= 16 | 21 |
- -The ARM-64 architecture (`arm64-v8a`) requires at least level 21 of Android API. - -The default entry-point of the Docker image, [`paddle/scripts/docker/build_android.sh`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build_android.sh) generates the [Android cross-compiling standalone toolchain](https://developer.android.com/ndk/guides/standalone_toolchain.html) based on the argument: `ANDROID_ABI` or `ANDROID_API`. For information about other configuration arguments, please continue reading. - -The above command generates and outputs the inference library in `$PWD/install_android` and puts third-party libraries in `$PWD/install_android/third_party`. - -## Cross-Compiling on Linux - -The Linux-base approach to cross-compile is to run steps in `Dockerfile.android` manually on a Linux x64 computer. - -### Setup the Environment - -To build for Android's, we need [Android NDK]( -https://developer.android.com/ndk/downloads/index.html): - -```bash -wget -q https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip -unzip -q android-ndk-r14b-linux-x86_64.zip -``` - -Android NDK includes everything we need to build the [*standalone toolchain*](https://developer.android.com/ndk/guides/standalone_toolchain.html), which in then used to build PaddlePaddle for Android. (We plan to remove the intermediate stage of building the standalone toolchain in the near future.) - -- To build the standalone toolchain for `armeabi-v7a` and Android API level 21: - -```bash -your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \ - --arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain -``` - - The generated standalone toolchain will be in `your/path/to/arm_standalone_toolchain`. 
- -- To build the standalone toolchain for `arm64-v8a` and Android API level 21: - -```bash -your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \ - --arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain -``` - - The generated standalone toolchain will be in `your/path/to/arm64_standalone_toolchain`. - -### Cross-Compiling Arguments - -CMake supports [choosing the toolchain](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). PaddlePaddle provides [`android.cmake`](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/android.cmake), which configures the Android cross-compiling toolchain for CMake. `android.cmake` is not required for CMake >= 3.7, which support Android cross-compiling. PaddlePaddle detects the CMake version, for those newer than 3.7, it uses [the official version](https://cmake.org/cmake/help/v3.7/manual/cmake-toolchains.7.html#cross-compiling). - -Some other CMake arguments you need to know: - -- `CMAKE_SYSTEM_NAME` must be `Android`. This tells PaddlePaddle's CMake system to cross-compile third-party dependencies. This also changes some other CMake arguments like `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`, `WITH_RDMA=OFF`, `WITH_MKL=OFF` and `WITH_GOLANG=OFF`. -- `WITH_C_API` must be `ON`, to build the C-based inference library for Android. -- `WITH_SWIG_PY` must be `OFF` because the Android platform doesn't support SWIG-based API. - -Some Android-specific arguments: - -- `ANDROID_STANDALONE_TOOLCHAIN`: the absolute path of the Android standalone toolchain, or the path relative to the CMake build directory. PaddlePaddle's CMake extensions would derive the cross-compiler, sysroot and Android API level from this argument. -- `ANDROID_TOOLCHAIN`: could be `gcc` or `clang`. The default value is `clang`. - - For CMake >= 3.7, it should anyway be `clang`. For older versions, it could be `gcc`. 
- - Android's official `clang` requires `glibc` >= 2.15. -- `ANDROID_ABI`: could be `armeabi-v7a` or `arm64-v8a`. The default value is `armeabi-v7a`. -- `ANDROID_NATIVE_API_LEVEL`: could be derived from the value of `ANDROID_STANDALONE_TOOLCHAIN`. -- `ANDROID_ARM_MODE`: - - could be `ON` or `OFF`, and defaults to `ON`, when `ANDROID_ABI=armeabi-v7a`; - - no need to specify when `ANDROID_ABI=arm64-v8a`. -- `ANDROID_ARM_NEON`: indicates whether to use NEON instructions. - - could be `ON` or `OFF`, and defaults to `ON`, when `ANDROID_ABI=armeabi-v7a`; - - no need to specify when `ANDROID_ABI=arm64-v8a`. - -Other useful arguments: - -- `USE_EIGEN_FOR_BLAS`: indicates whether to use Eigen. Could be `ON` or `OFF`, defaults to `OFF`. -- `HOST_C/CXX_COMPILER`: specifies the host compiler, which is used to build the host-specific protoc and target-specific OpenBLAS. It defaults to the value of the environment variable `CC/C++`, or `cc/c++`. - -Some frequent configurations for your reference: - -```bash -cmake -DCMAKE_SYSTEM_NAME=Android \ - -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \ - -DANDROID_ABI=armeabi-v7a \ - -DANDROID_ARM_NEON=ON \ - -DANDROID_ARM_MODE=ON \ - -DUSE_EIGEN_FOR_BLAS=ON \ - -DCMAKE_INSTALL_PREFIX=your/path/to/install \ - -DWITH_C_API=ON \ - -DWITH_SWIG_PY=OFF \ - .. -``` - -``` -cmake -DCMAKE_SYSTEM_NAME=Android \ - -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \ - -DANDROID_ABI=arm64-v8a \ - -DUSE_EIGEN_FOR_BLAS=OFF \ - -DCMAKE_INSTALL_PREFIX=your/path/to/install \ - -DWITH_C_API=ON \ - -DWITH_SWIG_PY=OFF \ - .. -``` - - -There are some other arguments you might want to configure. - -- `CMAKE_BUILD_TYPE=MinSizeRel` minimizes the size of the library. -- `CMAKE_BUILD_TYPE=Release` optimizes the runtime performance.
- -Our own tip for performance optimization is to use clang and Eigen or OpenBLAS: - -- `CMAKE_BUILD_TYPE=Release` -- `ANDROID_TOOLCHAIN=clang` -- `USE_EIGEN_FOR_BLAS=ON` for `armeabi-v7a`, or `USE_EIGEN_FOR_BLAS=OFF` for `arm64-v8a`. - -### Build and Install - -After running `cmake`, we can run `make; make install` to build and install. - -Before building, you might want to remove the `third_party` and `build` directories including pre-built libraries for other architectures. - -After building, in the directory `CMAKE_INSTALL_PREFIX`, you will find three sub-directories: - -- `include`: the header files of the inference library, -- `lib`: the inference library built for various Android ABIs, -- `third_party`: dependent third-party libraries built for Android. diff --git a/develop/doc/_sources/mobile/cross_compiling_for_ios_en.md.txt b/develop/doc/_sources/mobile/cross_compiling_for_ios_en.md.txt deleted file mode 100644 index 19bfe86c511c7e43b462f94c8cabba420b3007f1..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/mobile/cross_compiling_for_ios_en.md.txt +++ /dev/null @@ -1,120 +0,0 @@ -# Build PaddlePaddle for iOS - -This tutorial will walk you through cross-compiling the PaddlePaddle library for iOS from source on macOS. - -## Preparation - -Apple provides Xcode, the cross-compiling toolchain and IDE for iOS development. Download it from the App Store or [here](https://developer.apple.com/cn/xcode/). To verify your installation, run the following command: - -```bash -$ xcodebuild -version -Xcode 9.0 -Build version 9A235 -``` - -## Cross-compiling configurations - -PaddlePaddle provides a cross-compiling toolchain configuration file, [cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake), which has some default settings for frequently used compilers.
- -There are some mandatory environment variables need to be set before cross compiling PaddlePaddle for iOS: - -- `CMAKE_SYSTEM_NAME`, CMake compiling target platform name, has to be `iOS`. PaddlePaddle CMake will compile all the third party dependencies and enforce some parameters (`WITH_C_API=ON`, `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`,`WITH_RDMA=OFF`) when this variable is set with value `iOS`. - -- `WITH_C_API`, Whether to compile inference C-API library, has to be `ON`, since C-API is the only supported interface for inferencing in iOS. -- `WITH_SWIG_PY`, has to be `OFF`. It's not supported to inference or train via swig in iOS. - -Optional environment variables for iOS are: - -- `IOS_PLATFORM`, either `OS` (default) or `SIMULATOR`. - - `OS`, build targets ARM-based physical devices like iPhone or iPad. - - `SIMULATOR`, build targets x86 architecture simulators. -- `IOS_ARCH`, target architecture. By default, all architecture types will be compiled. If you need to specify the architecture to compile for, please find valid values for different `IOS_PLATFORM` settings from the table below: - - - - - - - - - - - - - - - - - - - - - - -
| IOS_PLATFORM | IOS_ARCH |
| --- | --- |
| OS | armv7, armv7s, arm64 |
| SIMULATOR | i386, x86_64 |
- -- `IOS_DEPLOYMENT_TARGET`, the minimum iOS version to deploy to, `7.0` by default. -- `IOS_ENABLE_BITCODE`, whether to enable [Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3). Values can be `ON/OFF`, `ON` by default. -- `IOS_USE_VECLIB_FOR_BLAS`, whether to use the [vecLib](https://developer.apple.com/documentation/accelerate/veclib) framework for BLAS computing. Values can be `ON/OFF`, `OFF` by default. -- `IOS_DEVELOPMENT_ROOT`, the path to the `Developer` directory, can be explicitly set with your `/path/to/platform/Developer`. If left blank, PaddlePaddle will automatically pick the Xcode corresponding `platform`'s `Developer` directory based on your `IOS_PLATFORM` value. -- `IOS_SDK_ROOT`, the path to the `SDK` root, can be explicitly set with your `/path/to/platform/Developer/SDKs/SDK`. If left blank, PaddlePaddle will pick the latest SDK in the directory of `IOS_DEVELOPMENT_ROOT`. - -Other settings: - -- `USE_EIGEN_FOR_BLAS`, whether to use Eigen for matrix computing. Effective when `IOS_USE_VECLIB_FOR_BLAS=OFF`. Values can be `ON/OFF`, `OFF` by default. -- `HOST_C/CXX_COMPILER`, host C/C++ compiler. Uses value from environment variable `CC/CXX` by default or `cc/c++` if `CC/CXX` doesn't exist. - -Some typical CMake configurations: - -```bash -cmake -DCMAKE_SYSTEM_NAME=iOS \ - -DIOS_PLATFORM=OS \ - -DIOS_ARCH="armv7;arm64" \ - -DIOS_ENABLE_BITCODE=ON \ - -DIOS_USE_VECLIB_FOR_BLAS=ON \ - -DCMAKE_INSTALL_PREFIX=your/path/to/install \ - -DWITH_C_API=ON \ - -DWITH_TESTING=OFF \ - -DWITH_SWIG_PY=OFF \ - .. -``` - -```bash -cmake -DCMAKE_SYSTEM_NAME=iOS \ - -DIOS_PLATFORM=SIMULATOR \ - -DIOS_ARCH="x86_64" \ - -DIOS_USE_VECLIB_FOR_BLAS=ON \ - -DCMAKE_INSTALL_PREFIX=your/path/to/install \ - -DWITH_C_API=ON \ - -DWITH_TESTING=OFF \ - -DWITH_SWIG_PY=OFF \ - .. -``` - -You can set other compiling parameters for your own needs. For example,
if you are trying to minimize the library size, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; or if performance is your concern, set `CMAKE_BUILD_TYPE` to `Release`. You can even manipulate the PaddlePaddle compiling procedure by manually setting `CMAKE_C/CXX_FLAGS` values. - -**TIPS for better performance**: - -- set `CMAKE_BUILD_TYPE` to `Release` -- set `IOS_USE_VECLIB_FOR_BLAS` to `ON` - -## Build and install - -After running CMake, run the following commands. PaddlePaddle will download and compile the third-party dependencies, then compile and install the PaddlePaddle inference library. - -``` -$ make -$ make install -``` - -Please note: if you compiled PaddlePaddle in the source directory for other platforms, remove the `third_party` and `build` directories within the source with `rm -rf` to ensure that all the third-party dependencies and PaddlePaddle are newly compiled with the current CMake configuration. - -The `your/path/to/install` directory will have the following directories after `make install`: - -- `include`, contains all the C-API header files. -- `lib`, contains the PaddlePaddle C-API static library. -- `third_party` contains all the third-party libraries. - -Please note: if the PaddlePaddle library needs to support both physical devices and simulators, you will need to compile for each separately, then merge the results into a fat library with `lipo`. - -Now you will have the PaddlePaddle library compiled and installed; the fat library can be used in deep learning related iOS apps. Please refer to the C-API documentation for usage guides.
diff --git a/develop/doc/_sources/mobile/cross_compiling_for_raspberry_en.md.txt b/develop/doc/_sources/mobile/cross_compiling_for_raspberry_en.md.txt deleted file mode 100644 index 3c1a5950ff9553bb725d5a96e3fdf2e5e9f6f95c..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/mobile/cross_compiling_for_raspberry_en.md.txt +++ /dev/null @@ -1,62 +0,0 @@ -# Build PaddlePaddle for Raspberry Pi - -You may use any of the following two approaches to build the inference library of PaddlePaddle for Raspberry Pi: - -1. Build using SSH: Log in to a Raspberry Pi using SSH and build the library. The required development tools and third-party dependencies are listed in here: [`/Dockerfile`](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile). - -1. Cross-compile: We talk about how to cross-compile PaddlePaddle for Raspberry Pi on a Linux/x64 machine, in more detail in this article. - -## The Cross-Compiling Toolchain - -Step 1. Clone the Github repo by running the following command. - -```bash -git clone https://github.com/raspberrypi/tools.git -``` - -Step 2. Use the pre-built cross-compiler found in `./tools/tree/master/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64`. To run it on a Linux computer, glibc version >= 2.14 is needed. - -## CMake Arguments - -CMake supports [cross-compiling](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). All CMake configuration arguments required for the cross-compilation for Raspberry Pi can be found in [`cmake/cross_compiling/raspberry_pi.cmake`](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/raspberry_pi.cmake). - -Some important arguments that need to be set: - -- `CMAKE_SYSTEM_NAME`: The target platform. Must be `RPi`. - -- `RPI_TOOLCHAIN`: The absolute path of the cross-compiling toolchain. - -- `RPI_ARM_NEON`: Use ARM NEON Intrinsics. This is a required argument and set default to `ON`. 
- -- `HOST_C/CXX_COMPILER`: The C/C++ compiler for the host. It is used to build tools that run on the host, for example, protoc. - -A commonly-used CMake configuration is as follows: - -``` -cmake -DCMAKE_SYSTEM_NAME=RPi \ - -DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \ - -DRPI_ARM_NEON=ON \ - -DCMAKE_INSTALL_PREFIX=your/path/to/install \ - -DWITH_GPU=OFF \ - -DWITH_C_API=ON \ - -DWITH_PYTHON=OFF \ - -DWITH_SWIG_PY=OFF \ - .. -``` - -To build the inference library, please set the argument WITH\_C\_API to ON: `WITH_C_API=ON`. - -You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`. - -## Build and Install - -The following commands build the inference library of PaddlePaddle for Raspberry Pi and third-party dependencies. - -```bash -make -make install -``` - - The intermediate files will be stored in `build`. Third-party libraries will be located in `build/third_party`. If you have already built it for other platforms like Android or iOS, you may want to clear these directories by running the command: `rm -rf build`. - -The inference library will be in `your/path/to/install/lib`, with related header files in `your/path/to/install/include`. diff --git a/develop/doc/_sources/survey/cluster_bootstrapping_tools.md.txt b/develop/doc/_sources/survey/cluster_bootstrapping_tools.md.txt deleted file mode 100644 index 1cd9962700bb49866f1ed6987abc28b27888a23f..0000000000000000000000000000000000000000 --- a/develop/doc/_sources/survey/cluster_bootstrapping_tools.md.txt +++ /dev/null @@ -1,71 +0,0 @@ -# Cluster bootstrapping tool survey -## Abstract -In order to bring up a cluster from bare metal machines to a fully functional Kubernetes cluster for PaddlePaddle to run, we need to utilize some tools.
 Here we are going to compare [Sextant](https://github.com/k8sp/sextant) and [Tectonic installer](https://github.com/coreos/tectonic-installer) - -## Basic assumptions -Here are some basic assumptions before we move on to details -1. You are an administrator of a bare metal machine cluster, which means: - * you have full control of each of the machines. - * you have full control of the network which the machines are connected to. -2. Machines can be booted from the network with PXE or iPXE -3. You understand the [general procedure to bring up a cluster](#appendix-general-procedure-to-bring-up-a-cluster) - -If all of the above items hold for your cluster, then keep reading. - -## Comparing Sextant and Tectonic installer -### Sextant -Sextant is an end-to-end solution to bring up a bare metal cluster to a fully functional k8s cluster. It integrates DHCP, name service, PXE, cloud-config-service, and docker registry services altogether. - -#### Pros -1. End-to-end: basically all the admin needs to do is to configure the cluster.yaml and power on the cluster. -2. Offline cluster configuration: working with Sextant has 2 phases, config time and deploy time. At config time, the admin's machine needs internet connectivity to download some images, etc. But at deploy time, it's completely OK to go offline since all dependencies are ready after config time. -3. docker registry integrated. -4. GPU machines are taken care of. - -### Cons -1. k8s API server is not deployed with high availability in mind by default. -2. No grouping support. -3. No API interface, a one-off service. - - -### Tectonic installer -First of all, Tectonic is not free: it requires a coreos.com account as a step of installation, and a free user can only create fewer than 10 nodes.
- -Tectonic is a suite of software which wraps around k8s and provides more utilities for DevOps. -Tectonic installer, as its name suggests, installs Tectonic to a bare metal cluster, which means it's not totally an equivalent of Sextant. At the "booting a cluster" part, it mostly utilizes [Matchbox](https://github.com/coreos/matchbox), which is a general cluster bootstrapper. - -Matchbox's approach is similar to Sextant's. - -### Pros -1. supports grouping machines. -2. supports running provisioning service in rkt. (not a big deal though). -3. supports http/gRPC API interface. -4. supports multi-template. - -### Cons -1. Not an e2e solution to bring up a cluster, need a lot of extra work and other software. -2. [Not fully supporting](https://github.com/coreos/matchbox/issues/550) CentOS deployment yet. - -## Conclusion -Sextant is a better solution overall for deploying paddle cloud to a bare metal cluster. It would be great if Sextant could also 1) deploy the k8s API server with high availability by default; 2) not be designed as a one-off service. - - - -## Appendix: General procedure to bring up a cluster -It's impractical for a cluster admin to manually install the OS and applications into cluster nodes one by one; here is what an admin would do in the cloud industry: -1. set up a bootstrap machine with a static IP in the cluster, which has the following services: - * DHCP: assigns IP addresses for the rest of the nodes. - * name service: to map a node name to an IP - * PXE related services: the booting related info will be delivered to newly booted machines as their IP is assigned via DHCP service, PXE service will provide further booting and installing info and image with TFTP and http protocol. - * cluster config service: this is for providing cluster nodes with OS config via http - * optional docker registry: a built-in docker registry makes the whole cluster independent from connecting to the internet, and speeds up software distribution. -2.
New node powers on, it will - * broadcast the request for an IP address - * DHCP server assigns the IP address, and deliver the PXE booting related info to the node. - * cluster node will request config files with booting info delivered with DHCP via the TFTP service, and in most of the cases, the config file will point to a http service for the booting image. - * Since PXE is configured with initrd, it will utilize the cloud config service and do further installations like coreOS or K8s installations. - * then restart the node. - -For further understanding, following 2 links from Matchbox are some good readings: -* [Machine lifecycle](https://github.com/coreos/matchbox/blob/master/Documentation/machine-lifecycle.md) -* [PXE booting](https://github.com/coreos/matchbox/blob/master/Documentation/network-booting.md) diff --git a/develop/doc/build_and_install/build_from_source_en.html b/develop/doc/build_and_install/build_from_source_en.html index 0b204e71fa52248ef4f30fa895faa12385a30b13..b5205b56252fe75ec1e6f07f7cb997bccb4b2503 100644 --- a/develop/doc/build_and_install/build_from_source_en.html +++ b/develop/doc/build_and_install/build_from_source_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || []; +
  • C-API Prediction Library +
  • RNN Models
    • RNN Configuration
    • Recurrent Group Tutorial
    • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
    • Development
    • FAQ
        diff --git a/develop/doc/build_and_install/docker_install_en.html b/develop/doc/build_and_install/docker_install_en.html index 9b671607407a895c500e12b8aa0449fc45fa20b2..bdb1c639082007f360904af960397189de39fa56 100644 --- a/develop/doc/build_and_install/docker_install_en.html +++ b/develop/doc/build_and_install/docker_install_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
    • +
    • C-API Prediction Library +
    • RNN Models
      • RNN Configuration
      • Recurrent Group Tutorial
      • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
      • Development
      • FAQ
          diff --git a/develop/doc/build_and_install/index_en.html b/develop/doc/build_and_install/index_en.html index 100f92196231525bf57a8a01450e32d1cca8ce1a..cb8a791304f237be5d833ef727b5b4f92a9e902c 100644 --- a/develop/doc/build_and_install/index_en.html +++ b/develop/doc/build_and_install/index_en.html @@ -123,6 +123,12 @@ var _hmt = _hmt || [];
      • +
      • C-API Prediction Library +
      • RNN Models
        • RNN Configuration
        • Recurrent Group Tutorial
        • @@ -136,6 +142,7 @@ var _hmt = _hmt || [];
        • Development
        • FAQ
            diff --git a/develop/doc/build_and_install/pip_install_en.html b/develop/doc/build_and_install/pip_install_en.html index d46ee40b08e489f880a91715a1580fb6506b6cca..d2c6ea45ee02a8ec51c114d757d0ff883915794c 100644 --- a/develop/doc/build_and_install/pip_install_en.html +++ b/develop/doc/build_and_install/pip_install_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
        • +
        • C-API Prediction Library +
        • RNN Models
          • RNN Configuration
          • Recurrent Group Tutorial
          • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
          • Development
          • FAQ
              diff --git a/develop/doc/design/api.html b/develop/doc/design/api.html deleted file mode 100644 index ce7ec999cae4a3086315234b267b40420d6039c9..0000000000000000000000000000000000000000 --- a/develop/doc/design/api.html +++ /dev/null @@ -1,487 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              PaddlePaddle Design Doc

              -
              -

              Ingredients

              -

              Our design principle is to start from the essence: how can we -allow users to express and solve their problems as neural networks? -Some essential concepts that our API has to provide include:

              -
                -
              1. A topology is an expression of layers.
              2. -
              3. A layer could be any kind of computation, including cost.
              4. -
              5. Some layers have parameters, some don’t. Most costs don’t have -parameters.
              6. -
              7. In some topologies, layers share parameters. For -example, -the network for training a ranking model.
              8. -
              9. At programming time, users specify topologies and possible sharing -of parameters. PaddlePaddle can figure out and create parameters -required (and possibly shared) by one or more topologies.
              10. -
              -
              -
              -

              Starting from Examples

              -

              As a summary -of -our discussion, -let us present two examples here:

              -
              -

              Example 1. Sharing Parameters between Layers

              -

              We use -the -3-branch ranking model -in this example. For your convenience, I copy and paste the model’s -topology as follows:

              -
              A -> f -\
              -Q -> f --> cost
              -B -> f -/
              -
              -
              -

              The following program trains the topology including the cost, and then -uses the sub-network in the trained topology for inference:

              -
              def f(input_layer):
              -    e = paddle.layer.embedding(input_layer, parameter_name="embedding")
              -    o = paddle.layer.softmax(e, parameter_name="semantic")
              -    return o
              -
              -# Create 3 topologies (subnets). They share parameters because all
              -# corresponding layers have the same parameter names.
              -fA = f(paddle.layer.data(input_name="A"))
              -fB = f(paddle.layer.data(input_name="B"))
              -fQ = f(paddle.layer.data(input_name="Q"))
              -
              -topology = paddle.layer.less_than(
              -               paddle.layer.cross_entropy(fA, fQ),
              -               paddle.layer.cross_entropy(fB, fQ))
              -
              -# Derive parameters required in topology and create them in model.
              -parameters = paddle.parameters.create(topology)
              -
              -# Estimate parameters used in topology from data.
              -paddle.train(topology, parameters, reader=read_ranking_model_data)
              -
              -# Inference using fA (or fB or fQ, as they share their parameters).
              -[testA, testB, testQ] = read_ranking_model_data()
              -print "The semantic vector of testA: ", paddle.infer(fA, parameters, testA)
              -
              -
              -
              -
              -

              Example 2. Sharing Parameters between “Models”

              -

              We use GAN in -this example. In the following example program, d0 and d1 -correspond to the two networks in the following figure:

              -

              -
              def G(input_layer):
              -    # over-simplified example as G has only one layer:
              -    return paddle.layer.fc(input_layer, parameter_name="G")
              -
              -def D(input_layer):
              -    # again, over-simplified:
              -    return paddle.layer.fc(input_layer, parameter_name="D")
              -
              -# Construct the first topology, which contains both D and G.
              -# By learning this topology, we update parameters of G.
              -d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))
              -
              -# Construct a second topology d1, which contains only D. By
              -# training this topology, we update parameters of D.  Note
              -# that d1 shares parameters with d0.
              -d1 = paddle.layer.should_be_true(D(paddle.layer.data()))
              -
              -# Create parameters from a list of multiple topologies (models) for
              -# the chance to share parameters between these topologies.
              -parameters = paddle.parameters.create([d0, d1])
              -
              -# Iterative training of GAN.
              -for ...:
              -    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
              -    train(d1, parameters, reader=read_from_realistic_images)
              -
              -# Use d1 for inference:
              -print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
              -
              -
              -
              -
              -

              Summarization

              -

              The two programs above reveal some important design concerns:

              -
                -
              1. Users describe a topology as an expression of layers. Every layer -has a parameter name. If the users don’t specify it explicitly, it’s automatically generated as a unique name. By -specifying the parameter name, users can specify the sharing of -parameters between layers and even between topologies.
              2. -
              3. paddle.parameters.create figures out parameters required by one -or more topologies from parameter names of layers. It creates these -parameters and returns a ParameterSet object, which is in essence -a map from parameter names to parameters.
              4. -
              5. At training and inference time, paddle.train and paddle.infer -require both a topology and the parameter set that holds the parameters of that topology. There are several reasons:
                  -
                1. This prevents users from forgetting to call -paddle.parameters.create.
                2. -
                3. paddle.train needs to know which parameter set to update.
                4. -
                5. Users could load another (pre-trained) parameter set and use it -with a topology in paddle.infer.
                6. -
                -
              6. -
              7. By specifying the immutable_parameters parameter of -paddle.train, we can forbid the update of these parameters.
              8. -
              -
              -
              -
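The name-based sharing summarized above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Layer`, `collect_parameter_shapes`, and `create_parameter_set` are hypothetical stand-ins invented for this illustration and are not the PaddlePaddle API, and parameters are represented as zero-filled nested lists rather than tensors.

```python
class Layer(object):
    """Hypothetical stand-in for a layer: keeps only a parameter name, a shape and its inputs."""

    def __init__(self, parameter_name, shape, inputs=()):
        self.parameter_name = parameter_name
        self.shape = shape
        self.inputs = tuple(inputs)


def collect_parameter_shapes(layer, shapes):
    """Walk one topology and record every parameter name exactly once."""
    for upstream in layer.inputs:
        collect_parameter_shapes(upstream, shapes)
    known_shape = shapes.setdefault(layer.parameter_name, layer.shape)
    if known_shape != layer.shape:
        raise ValueError("shared parameter %r has conflicting shapes" % layer.parameter_name)
    return shapes


def create_parameter_set(topologies):
    """Build a name -> parameter map covering all given topologies."""
    shapes = {}
    for topology in topologies:
        collect_parameter_shapes(topology, shapes)
    # Each parameter is a zero-filled nested list standing in for a real tensor.
    return {name: [[0.0] * cols for _ in range(rows)]
            for name, (rows, cols) in shapes.items()}


# Mirror the GAN example: d0 contains D(G(...)), d1 contains only D.
d0 = Layer("D", (4, 1), inputs=[Layer("G", (4, 4))])
d1 = Layer("D", (4, 1))
parameter_set = create_parameter_set([d0, d1])
```

Because both topologies name their discriminator parameter "D", the resulting map holds a single "D" entry that both of them would read and update, which is the sharing behaviour the design relies on.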
              -

              Reader

              -

              Not all programming frameworks allow users to define I/O functions. -An example is Google MapReduce, which can only read from text, -SSTable, and RecordIO files. Hadoop MapReduce allows users to define -readers and writers by deriving from base classes Reader and -Writer. The former is less flexible but also less error-prone. We -decide to provide the flexibility to users to define their readers.

              -

              There are some open questions here:

              -
                -
              1. Should a reader return a Python dictionary?
              2. -
              3. How to map multiple outputs from a reader to multiple data layers?
              4. -
              5. How to easily compose some existing readers to read more data and -feed a topology with more data layers?
              6. -
              -
              -
              -

              Training

              -

              The recommended way to train a model is to call paddle.train, -which simply calls paddle.trainer.Default, a global variable of -type paddle.trainer.SGD. Equivalently, we can do

              -
              opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
              -opt.train(topology, parameters, reader=read, ...)
              -
              -
              -
              -

              Updater

              -

              Please be aware that a trainer can accept an updater as its data -member, where an updater is a class derived from -paddle.trainer.Updater. This is to make it easier to customize -trainers, as discussed -here.

              -
              -
              -

              Event Handler

              -

              paddle.train and paddle.trainer.XXX.train take an optional -parameter event_handler, which should be either None or a function -that handles some events:

              -
                -
              1. BeginTraining
              2. -
              3. EndTraining
              4. -
              5. BeginIteration
              6. -
              7. EndIteration
              8. -
              9. BeginPass
              10. -
              11. EndPass
              12. -
              -

              where EndPass is sent if and only if the reader yields -end_pass=True.

              -

              An example as follows:

              -
              def event_handler(event):
              -    if isinstance(event, paddle.event.EndIteration):
              -        print paddle.test(...)
              -
              -paddle.train(topology, parameters, reader, event_handler)
              -
              -
              -
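The event flow described above can be sketched with a toy training loop. Everything here is an assumption made for illustration: the event classes and `toy_train` are hypothetical, the "training step" is just an average over the batch, and BeginPass is omitted for brevity. The one rule taken from the text is that EndPass is emitted only when the reader yields `end_pass=True`.

```python
class BeginTraining(object):
    pass


class EndTraining(object):
    pass


class BeginIteration(object):
    def __init__(self, batch_id):
        self.batch_id = batch_id


class EndIteration(object):
    def __init__(self, batch_id, cost):
        self.batch_id = batch_id
        self.cost = cost


class EndPass(object):
    pass


def toy_train(reader, event_handler=None):
    """Hypothetical trainer loop that only demonstrates event dispatch."""
    handler = event_handler if event_handler is not None else (lambda event: None)
    handler(BeginTraining())
    for batch_id, (batch, end_pass) in enumerate(reader()):
        handler(BeginIteration(batch_id))
        cost = sum(batch) / float(len(batch))  # placeholder for a real training step
        handler(EndIteration(batch_id, cost))
        if end_pass:
            handler(EndPass())
    handler(EndTraining())


def reader():
    # Each item is (batch, end_pass); the last batch closes the pass.
    yield [1.0, 3.0], False
    yield [2.0, 4.0], True


event_log = []
costs = []


def event_handler(event):
    event_log.append(type(event).__name__)
    if isinstance(event, EndIteration):
        costs.append(event.cost)


toy_train(reader, event_handler)
```

A handler written this way can stay small: it inspects the event type with `isinstance` and ignores everything it does not care about.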

              If we are writing a PaddlePaddle program in and for IPython/Jupyter, -we can use matplotlib in the event handler to plot a curve of -cost/error versus iterations, as shown -here.

              -
              -
              -

              Distributed Training

              -

              If a user wants to do distributed training on a cluster, s/he should -call paddle.dist_train and provide access tokens to the cluster as -a parameter.

              -

              For example, if the user has a TLS certificate that allows him or her to -access a Kubernetes cluster, s/he should be able to call

              -
              paddle.dist_train(model,
              -                  trainer=paddle.trainer.SGD(...,
              -                                             paddle.updater.Adam(...)),
              -                  reader=read,
              -                  k8s_user="yi",
              -                  k8s_token="kube_cluster_tls.pem",
              -                  k8s_job="hello",
              -                  num_parameter_servers=15)
              -
              -
              -

              The pseudo code of paddle.dist_train is as follows:

              -
              def dist_train(topology, parameters, trainer, reader, ...):
              -    if os.getenv("KUBERNETES_SERVICE_HOST") is None:
              -        image_name = k8s_user + '/' + k8s_job
              -        docker_build(image_name)
              -        docker_push()
              -        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
              -    else:
              -        rank = kube_list_containers_in_job_and_return_current_containers_rank()
              -        if rank == 0:
              -            master()
              -        elif rank < 15:
              -            parameter_server()
              -        else:
              -            trainer.train(topology, parameters, reader=reader)
              -
              -
              -

              Please be aware that if a process is running on the Kubernetes -cluster, it will have some environment variables pre-defined.

              -

              If dist_train doesn’t see these environment variables, it knows -that it’s running on the user’s personal computer, and it should work as a -launcher. Otherwise, it knows that it’s running on the cluster and -needs to figure out its role as either the master, or a trainer, or a -parameter server.

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/auto_gradient_check.html b/develop/doc/design/auto_gradient_check.html deleted file mode 100644 index ed0cc19b019fdcd7e3df2ada3d9f3e685472cf23..0000000000000000000000000000000000000000 --- a/develop/doc/design/auto_gradient_check.html +++ /dev/null @@ -1,423 +0,0 @@ - - - - - - - - - - - - - Auto Gradient Check Design — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Auto Gradient Check Design

              -
              -
              -

              Background:

              -
                -
              • Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges:
                  -
                1. The backpropagation formula should be correct with respect to the forward computation.
                2. -
                3. The implementation of the above should be correct in C++.
                4. -
                5. It is difficult to prepare unbiased test data.
                6. -
                -
              • -
              • Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator’s result. It has several advantages:
                  -
                1. Numerical gradient checker only needs the forward operator.
                2. -
                3. The user only needs to prepare the input data for forward Operator and not worry about the backward Operator.
                4. -
                -
              • -
              -
              -
              -

              Mathematical Theory

              -

              The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful.

              - -
              -
              -

              Numerical Gradient Implementation

              -
              -

              Python Interface

              -
              def get_numerical_gradient(op,
              -                         input_values,
              -                         output_name,
              -                         input_to_check,
              -                         delta=0.005,
              -                         local_scope=None):
              -    """
              -    Get Numerical Gradient for the input of an operator.
              -
              -    :param op: C++ operator instance, could be a network.
              -    :param input_values: The input variables. Should be a dictionary, whose key is
              -    variable name, and value is a numpy array.
              -    :param output_name: The final output variable name.
              -    :param input_to_check: The input variable with respect to which the gradient has to be computed.
              -    :param delta: The perturbation value for numerical gradient method. The
              -    smaller the delta, the more accurate the result. But if the delta is too
              -    small, it will suffer from the numerical stability problem.
              -    :param local_scope: The local scope used for get_numeric_gradient.
              -    :return: The gradient array in numpy format.
              -    """
              -
              -
              -
              -
              -

              Explanation:

              -
                -
              • Why do we need an output_name
                  -
                • An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable.
                • -
                -
              • -
              • Why do we need input_to_check
                  -
                • One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So get_numeric_gradient is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call get_numeric_gradient multiple times each with a different input.
                • -
                -
              • -
              -
              -
              -

              Core Algorithm Implementation

              -
                  # we only compute the gradient of one element at a time.
              -    # we use a for loop to compute the gradient of each element.
              -    for i in xrange(tensor_size):
              -        # get one input element using the index i.
              -        original = tensor_to_check.get_float_element(i)
              -
              -        # add delta to it, run the forward op and then
              -        # get the new value of the result tensor.
              -        x_pos = original + delta
              -        tensor_to_check.set_float_element(i, x_pos)
              -        y_pos = get_output()
              -
              -        # Subtract delta from this element, run the op again
              -        # and get the new value of the result tensor.
              -        x_neg = original - delta
              -        tensor_to_check.set_float_element(i, x_neg)
              -        y_neg = get_output()
              -
              -        # restore old value
              -        tensor_to_check.set_float_element(i, original)
              -
              -        # compute the gradient of this element and store
              -        # it into a numpy array.
              -        gradient_flat[i] = (y_pos - y_neg) / delta / 2
              -
              -    # reshape the gradient result to the shape of the source tensor.
              -    return gradient_flat.reshape(tensor_to_check.get_dims())
              -
              -
              -
              -
              -
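The central-difference loop above can be tried on a plain Python function. This is a minimal sketch and not the framework code: `numerical_gradient` and `sum_of_squares` are hypothetical names, and a flat list of floats stands in for the tensor and its `get_float_element` / `set_float_element` accessors.

```python
def numerical_gradient(f, x, delta=0.005):
    """Estimate df/dx[i] for every element of the list x by central differences."""
    grad = [0.0] * len(x)
    for i in range(len(x)):
        original = x[i]

        # Perturb one element upwards and evaluate the forward function.
        x[i] = original + delta
        y_pos = f(x)

        # Perturb the same element downwards and evaluate again.
        x[i] = original - delta
        y_neg = f(x)

        # Restore the old value before moving to the next element.
        x[i] = original

        grad[i] = (y_pos - y_neg) / (2.0 * delta)
    return grad


def sum_of_squares(x):
    """Toy forward function whose analytic gradient is 2 * x."""
    return sum(v * v for v in x)


point = [1.0, -2.0, 3.0]
estimated = numerical_gradient(sum_of_squares, point)
analytic = [2.0 * v for v in point]
```

For a quadratic function the central difference is exact up to floating-point rounding, so `estimated` and `analytic` agree closely here; for general functions the estimate carries an error that shrinks with `delta` until rounding starts to dominate.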
              -

              Auto Gradient Check Framework

              -

              Each Operator Kernel has three kinds of Gradient:

              -
                -
              1. Numerical gradient
              2. -
              3. CPU kernel gradient
              4. -
              5. GPU kernel gradient (if supported by the device)
              6. -
              -

              The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps:

              -
                -
              1. Calculate the numerical gradient
              2. -
              3. Calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient.
              4. -
              5. Calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient. (if supported)
              6. -
              -
              -

              Python Interface

              -
                  def check_grad(self,
              -                   forward_op,
              -                   input_vars,
              -                   inputs_to_check,
              -                   output_name,
              -                   no_grad_set=None,
              -                   only_cpu=False,
              -                   max_relative_error=0.005):
              -        """
              -        :param forward_op: used to create backward_op
              -        :param input_vars: numpy value of input variable. The following
              -          computation will use these variables.
              -        :param inputs_to_check: the input variable with respect to which the
              -          gradient will be computed.
              -        :param output_name: The final output variable name.
              -        :param max_relative_error: The relative tolerance parameter.
              -        :param no_grad_set: used to create backward ops
              -        :param only_cpu: only compute and check gradient on cpu kernel.
              -        :return:
              -        """
              -
              -
              -
              -
              -

              How to check if two numpy arrays are close enough?

              -

              If abs_numerical_grad is nearly zero, use the absolute error instead of the relative error for that element of numerical_grad.

              -
              numerical_grad = ...
              -operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor())
              -
              -abs_numerical_grad = numpy.abs(numerical_grad)
              -# if abs_numerical_grad is nearly zero, then use abs error for
              -# numeric_grad, instead of relative error.
              -abs_numerical_grad[abs_numerical_grad < 1e-3] = 1
              -
              -diff_mat = numpy.abs(numerical_grad - operator_grad) / abs_numerical_grad
              -max_diff = numpy.max(diff_mat)
              -
              -
              -
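The comparison rule above can be written out element by element. This is a sketch under stated assumptions: `max_relative_error` is a hypothetical helper, plain lists replace the numpy arrays, and the `1e-3` threshold is the one used in the snippet above.

```python
def max_relative_error(numerical_grad, operator_grad, zero_threshold=1e-3):
    """Largest per-element error, relative where possible and absolute near zero."""
    max_diff = 0.0
    for numerical, computed in zip(numerical_grad, operator_grad):
        denominator = abs(numerical)
        # Near-zero references would blow up a relative error,
        # so dividing by 1 turns the measure into an absolute error.
        if denominator < zero_threshold:
            denominator = 1.0
        max_diff = max(max_diff, abs(numerical - computed) / denominator)
    return max_diff


# First element is compared relatively, second one absolutely.
error = max_relative_error([2.0, 0.0], [2.01, 0.0005])
```

A gradient check would then pass when the returned value does not exceed the chosen `max_relative_error` tolerance.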
              -

              Notes:

              -

              The input data for the auto gradient checker should be chosen reasonably to avoid numerical stability problems.

              -
              - -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/backward.html b/develop/doc/design/backward.html deleted file mode 100644 index 94debbec11505eac6a2a50a56d0eb1345e22322c..0000000000000000000000000000000000000000 --- a/develop/doc/design/backward.html +++ /dev/null @@ -1,394 +0,0 @@ - - - - - - - - - - - - - Backward Building — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Backward Building

              -
              -

              Motivation

              -

              At present, most neural network models are trained with the backpropagation algorithm (known as BP). Technically, BP calculates the gradient of the loss function, then propagates it back through the network following the chain rule. However, when configuring the model structure, users do not need to define the backward part. So the framework needs a mechanism that can complete the model’s backward part automatically according to the given forward part.

              -

              When implementing a specific op, the developer is also asked to implement its backward version, called grad_op. A grad_op takes the gradients of its corresponding op’s outputs, and calculates the gradients of the op’s inputs. During the building of a model’s backward part, the framework creates each forward op’s grad_op, and then strings them together in the reverse order of the forward part. In this way, gradients spread from the end to the beginning of the model, in other words, from the loss to the parameters.

              -
              -
              -

              Challenges

              -

              The motivation of backward building is apparent. However, implementing it correctly is not so easy. In the Fluid design, a deep learning model is described by Program, Block, Op and Variable. The Block itself can be nested. This means that the ops and variables are scattered across different blocks rather than all being gathered in a single graph. Our backward building algorithm shall visit blocks in recursive order and be able to insert grad_ops and newly created variables into the right place.

              -
              -
              -

              Usage

              -

              Although the whole algorithm is comprised of many functions, only one is exposed as API:

              -
              def append_backward(loss, parameter_list=None, no_grad_set=None):
              -    """
              -    Append backward part to main_program
              -
              -    Args:
              -        loss(Variable): The variable generated by the cost function.
              -        parameter_list(list): Parameters that need to be updated by optimizers.
              -            If None, it means all parameters need to be updated.
              -
              -        no_grad_set(set): Variables that have no gradients in Block 0. 
              -            If None, the set will be generated inside the function and 
              -            contains all variables with `stop_gradient=True` from all blocks.
              -        
              -    Return:
              -        (list[Variable]): list of (parameters, gradients) pair.
              -    """
              -
              -
              -

              Invoking this API makes the framework append the backward part to the program that contains the loss. It takes three arguments. loss is the final loss value; it must be a scalar and is usually the output of the loss layer. It is also where the gradient is generated and backpropagation starts. parameter_list marks all parameters that need updating. If it’s None, all parameters will be updated by optimizers. no_grad_set marks variables without gradients. If all outputs of some grad_op are in no_grad_set, the grad_op will not be run.

              -

              This API will be invoked automatically before optimizer building. -As a result, in most cases, users do not need to invoke the API by themselves to append backward part.

              -
              -
              -

              Implementation

              -

              The implementation of the backward building algorithm is in the backward.py file. The whole algorithm can be divided into two independent parts: creating grad_ops and creating new variables.

              -
              -

              Creating grad_ops

              -

              The creation of grad_ops is implemented by:

              -
              def _append_backward_ops_(target,
              -                          block,
              -                          target_block,
              -                          no_grad_dict,
              -                          grad_to_var):
              -    """
              -    Create all grad ops, and insert them into given block
              -
              -    Args:
              -        target(Variable): the target variable of forward pass
              -        block(Block): the block where forward ops are
              -        target_block(Block): the block which is going to hold new generated grad ops
              -        no_grad_dict(dict): 
              -            key(int)  block index
              -            val(set) a set of variable names. These variables have no gradient
              -        grad_to_var(dict)(output argument):
              -            key(str): grad variable name
              -            val(str): corresponding forward variable name
              -    """
              -
              -
              -

              Given a block, the function traverses all ops in this block in reverse order, gets the corresponding grad_op from the C++ core via core.get_grad_op_desc(), then appends it to target_block.

              -

              However, some specific ops (e.g. while_op, if_else_op) can hold their own sub-blocks. Since these sub-blocks contain ops as well, the creation of grad_ops should be recursive.

              -

              During the reverse traversal, we check whether each op has an attribute named sub_block. If so, the op has a sub-block, and we need to deal with it first. After creating a new block whose parent is the one in the op's attribute, we invoke _append_backward_ops_() recursively, passing the new block as target_block and the one in the op's attribute as block. The following pseudo-code shows this process:

              -
              ******* pseudo-code ********
              -for op in reversed(block.ops):
              -    if op has an attribute named 'sub_block':
              -        Get the sub-block(`s_block`) from op's attribute.
              -        Create a new block(`grad_s_block`), whose parent is `s_block`.
              -        Invoke _append_backward_ops_(), with `block=s_block` and `target_block=grad_s_block`
              -    
              -    Invoke `core.get_grad_op_desc()` to get op's grad_op.
              -    Record in grad_to_var the mapping from each gradient name of the grad_op to its forward variable name.
              -    Assign grad_s_block (if there is one) to grad_op as its 'sub_block' attribute.
              -    Append grad_op to current target_block.
              -
              -
              -

              _append_backward_ops_() is first invoked by append_backward(), where both block and target_block are set to the root block (the block with index 0).
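
              The recursive traversal described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the code in backward.py: the `Op` and `Block` classes and the `make_grad_op` helper are hypothetical stand-ins, and `make_grad_op` only imitates what core.get_grad_op_desc() would return.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Op:
    type: str
    inputs: List[str]
    outputs: List[str]
    sub_block: Optional["Block"] = None


@dataclass
class Block:
    parent: Optional["Block"] = None
    ops: List[Op] = field(default_factory=list)


def make_grad_op(op: Op, grad_to_var: Dict[str, str]) -> Op:
    """Hypothetical stand-in for core.get_grad_op_desc()."""
    for name in op.inputs + op.outputs:
        grad_to_var[name + "@GRAD"] = name
    return Op(
        type=op.type + "_grad",
        inputs=[name + "@GRAD" for name in op.outputs],
        outputs=[name + "@GRAD" for name in op.inputs],
    )


def append_backward_ops(block: Block, target_block: Block,
                        grad_to_var: Dict[str, str]) -> None:
    # Snapshot the op list: block and target_block may be the same object.
    for op in reversed(list(block.ops)):
        grad_sub_block = None
        if op.sub_block is not None:
            # Deal with the sub-block first; its grad ops go into a new block.
            grad_sub_block = Block(parent=op.sub_block)
            append_backward_ops(op.sub_block, grad_sub_block, grad_to_var)
        grad_op = make_grad_op(op, grad_to_var)
        grad_op.sub_block = grad_sub_block
        target_block.ops.append(grad_op)


# Forward program: mul, then a while op with a two-op sub-block, then mean.
body = Block(ops=[Op("add", ["h", "x"], ["s"]), Op("sigmoid", ["s"], ["h"])])
root = Block(ops=[
    Op("mul", ["w", "in"], ["x"]),
    Op("while", ["x"], ["h"], sub_block=body),
    Op("mean", ["h"], ["loss"]),
])
body.parent = root

grad_to_var: Dict[str, str] = {}
# As in append_backward(), block and target_block are both the root block.
append_backward_ops(root, root, grad_to_var)
op_types = [op.type for op in root.ops]
```

              After the call, `op_types` lists the three forward ops followed by their grad ops in reverse order, and the `while_grad` op carries a new sub-block holding `sigmoid_grad` and `add_grad`.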

              -
              -
              -

              Corner Cases of grad_op Creating

              -

              The previous section shows the regular process of grad_op creation. In some corner cases, however, this algorithm alone does not produce the correct result, and additional handling is required. These additional steps run after the algorithm above and adjust the grad_ops it outputs.

              -
              -

              Shared Variables

              -

              If a variable is read by more than one op in the forward pass, its gradient is likely to be written by more than one grad_op in the backward pass. To make the gradient the sum of all these grad_ops' outputs rather than only the output of the last one to run, we assign each output to a temporary variable and then append a sum_op to add them up.

              -

              For debugging convenience, if the final gradient name is w@GRAD, its corresponding temporary variables are named w@GRAD@RENAME@0, w@GRAD@RENAME@1, and so on.

              -

              See function _addup_repetitive_outputs_ in backward.py for implementation details.
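
              A minimal sketch of the renaming idea follows. It assumes grad ops are plain `(type, inputs, outputs)` tuples and places the `sum` op right after the last writer; the function name and this placement are illustrative choices, and the real _addup_repetitive_outputs_ may differ in both.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GradOp = Tuple[str, List[str], List[str]]  # (type, inputs, outputs)


def addup_repetitive_outputs(grad_ops: List[GradOp]) -> List[GradOp]:
    # Count how many times each variable is written.
    writers: Dict[str, int] = defaultdict(int)
    for _, _, outputs in grad_ops:
        for name in outputs:
            writers[name] += 1

    renamed: Dict[str, List[str]] = defaultdict(list)
    result: List[GradOp] = []
    for op_type, inputs, outputs in grad_ops:
        new_outputs = []
        for name in outputs:
            if writers[name] > 1:
                # Redirect this write to a fresh temporary variable.
                tmp = f"{name}@RENAME@{len(renamed[name])}"
                renamed[name].append(tmp)
                new_outputs.append(tmp)
            else:
                new_outputs.append(name)
        result.append((op_type, inputs, new_outputs))
        # Once every writer of a variable has been seen, sum the temporaries.
        for name in dict.fromkeys(outputs):
            if writers[name] > 1 and len(renamed[name]) == writers[name]:
                result.append(("sum", list(renamed[name]), [name]))
    return result


grad_ops = [
    ("mul_grad", ["out1@GRAD"], ["w@GRAD", "x@GRAD"]),
    ("add_grad", ["out2@GRAD"], ["w@GRAD", "b@GRAD"]),
]
fixed = addup_repetitive_outputs(grad_ops)
```

              Here `w@GRAD` is written twice, so the two writes become `w@GRAD@RENAME@0` and `w@GRAD@RENAME@1`, and a trailing `sum` op produces the final `w@GRAD`. The sketch ignores grad ops that read a gradient between two of its writes.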

              -
              -
              -

              No Gradient Variables

              -

              In our framework, variables can be marked as no_gradient, which means that the gradient of the variable is unnecessary and can be treated as zero in model training. Consequently, when all the outputs of a grad_op are marked as no_gradient, the grad_op itself can be skipped in the backward pass.

              -

              Another situation is when all the gradient inputs of a grad_op are marked as no_gradient, which means all of them can be treated as zeros. Because grad_ops in essence propagate gradients, all their outputs are zeros when all gradient inputs are zeros. Therefore the grad_op can also be skipped.

              -

              Note that these zero gradients still need to be created and initialized; otherwise, later grad_ops that take them as inputs risk reading uninitialized memory. In our code, we use fill_zeros_like_op to initialize them to all zeros.

              -

              These features are implemented in the function _remove_no_grad_branch_. It checks the newly created grad_ops one by one, removes those that can be skipped, and inserts fill_zeros_like_op where necessary. The no_grad_set can be taken from the _append_backward_ops_ argument no_grad_dict or generated on the fly by scanning the no_gradient attribute (True or False) of all variables.
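
              The two skipping rules and the zero filling can be sketched as follows. The tuple representation of grad ops, the function name, and the convention that `no_grad_set` holds gradient names ending in `@GRAD` are assumptions made for this illustration, not a copy of _remove_no_grad_branch_.

```python
from typing import List, Set, Tuple

GradOp = Tuple[str, List[str], List[str]]  # (type, inputs, outputs)
GRAD_SUFFIX = "@GRAD"


def remove_no_grad_branch(grad_ops: List[GradOp],
                          no_grad_set: Set[str]) -> List[GradOp]:
    no_grad = set(no_grad_set)  # work on a copy; the set grows below
    kept: List[GradOp] = []
    for op in grad_ops:
        _, inputs, outputs = op
        grad_inputs = [n for n in inputs if n.endswith(GRAD_SUFFIX)]
        # Rule 1: every output is unnecessary, so skip the op.
        if all(n in no_grad for n in outputs):
            continue
        # Rule 2: every gradient input is zero, so every output is zero too.
        if grad_inputs and all(n in no_grad for n in grad_inputs):
            no_grad.update(outputs)
            continue
        kept.append(op)

    result: List[GradOp] = []
    filled: Set[str] = set()
    for op in kept:
        for name in op[1]:
            if name in no_grad and name not in filled:
                # Initialize the zero gradient before its first reader.
                forward_name = name[: -len(GRAD_SUFFIX)]
                result.append(("fill_zeros_like", [forward_name], [name]))
                filled.add(name)
        result.append(op)
    return result


grad_ops = [
    ("sub_grad", ["loss@GRAD"], ["pred@GRAD", "label@GRAD"]),
    ("cast_grad", ["label@GRAD"], ["raw_label@GRAD"]),
    ("concat_grad", ["pred@GRAD", "raw_label@GRAD"], ["feat@GRAD"]),
    ("scale_grad", ["feat@GRAD"], ["const@GRAD"]),
]
pruned = remove_no_grad_branch(grad_ops, {"label@GRAD", "const@GRAD"})
```

              In this toy program `scale_grad` is dropped by the first rule and `cast_grad` by the second, while `concat_grad` survives and therefore gets a `fill_zeros_like` op for its now-unwritten input `raw_label@GRAD`.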

              -
              -
              -
              -

              Creating Backward Variables

              -

              At this point, all grad_ops have been created and adjusted. However, the backward variables have not been created yet; they exist only as names in the grad_ops' input and output arguments. The backward variables are created by:

              -
              def _append_backward_vars_(block, 
              -                           start_op_idx, 
              -                           grad_to_var, 
              -                           grad_info_map):
              -    """
              -    Create new variables required by backward pass.
              -
              -    Args:
              -        block(Block): the block where new variables will be created
              -        start_op_idx(int): Only variables required by ops in block.ops[start_op_idx : ] will be created
              -        grad_to_var(dict):
              -            key(str): grad variable name
              -            val(str): corresponding forward variable name
              -            In most cases, this dict is generated by _append_backward_ops_()
              -        grad_info_map(dict)(output argument):
              -            key(str): forward variable name
              -            val(tuple): a tuple of (str, int), str is the corresponding grad name, int is the block index
              -    """
              -
              -
              -

              Given a block, this function traverses all the grad_ops in it (the argument start_op_idx indicates where the grad_op sequence starts) and creates all the outputs that do not exist yet. The following pseudo-code shows this process:

              -
              for op in block.ops[start_op_idx : ]:
              -
              -    if op has an attribute named 'sub_block':
              -        Get the sub-block(`s_block`) from op's attribute.
              -        Invoke _append_backward_vars_(), with `block=s_block`
              -        
              -    for var_name in op.all_output_names():
              -        if block.has_var_recursive(var_name) or var_name is the name of empty variable:
              -            continue
              -        create a new variable named 'var_name' in block
              -        if grad_to_var.has_key(var_name):
              -            set grad_info_map[grad_to_var[var_name]] as a tuple of (var_name, block)
              -            
              -    do op's var type inference
              -    do op's shape inference
              -
              -
              -
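
              The pseudo-code above can be turned into a small runnable sketch. The `Block` class, the dict-based ops, and the `"@EMPTY@"` placeholder name are assumptions made for illustration; type and shape inference are left out, and the block index is stored in grad_info_map as the docstring describes.

```python
from typing import Dict, List, Optional, Set, Tuple

EMPTY_VAR = "@EMPTY@"  # assumed placeholder name meaning "no variable"


class Block:
    def __init__(self, idx: int, parent: Optional["Block"] = None):
        self.idx = idx
        self.parent = parent
        self.vars: Set[str] = set()
        self.ops: List[dict] = []

    def has_var_recursive(self, name: str) -> bool:
        # Look in this block first, then walk up through the ancestors.
        block: Optional[Block] = self
        while block is not None:
            if name in block.vars:
                return True
            block = block.parent
        return False


def append_backward_vars(block: Block, start_op_idx: int,
                         grad_to_var: Dict[str, str],
                         grad_info_map: Dict[str, Tuple[str, int]]) -> None:
    for op in block.ops[start_op_idx:]:
        sub_block = op.get("sub_block")
        if sub_block is not None:
            append_backward_vars(sub_block, 0, grad_to_var, grad_info_map)
        for name in op["outputs"]:
            if name == EMPTY_VAR or block.has_var_recursive(name):
                continue
            block.vars.add(name)  # create the missing variable
            if name in grad_to_var:
                grad_info_map[grad_to_var[name]] = (name, block.idx)
        # The real function would run var type and shape inference here.


root = Block(idx=0)
root.vars = {"w", "x", "loss"}
root.ops = [
    {"type": "mul", "outputs": ["loss"]},
    {"type": "mul_grad", "outputs": ["w@GRAD", EMPTY_VAR]},
]
grad_info_map: Dict[str, Tuple[str, int]] = {}
append_backward_vars(root, 1, {"w@GRAD": "w"}, grad_info_map)
```

              Only `w@GRAD` is created: the forward op lies before `start_op_idx`, and the empty-variable placeholder is skipped.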
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/block.html b/develop/doc/design/block.html deleted file mode 100644 index cb3f25c7ba4999fb843b6a744129216fabfce638..0000000000000000000000000000000000000000 --- a/develop/doc/design/block.html +++ /dev/null @@ -1,559 +0,0 @@ - - - - - - - - - - - - - Design Doc: Block and Scope — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Block and Scope

              -
              -

              The Representation of Computation

              -

              Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation:

              -
                -
              • Caffe, Torch, and Paddle: sequences of layers.
              • -
              • TensorFlow, Caffe2, Mxnet: graph of operators.
              • -
              • PaddlePaddle: nested blocks, like C++ and Java programs.
              • -
              -
              -
              -

              Block in Programming Languages and Deep Learning

              -

              In programming languages, a block is a pair of curly braces that includes local variables definitions and a sequence of instructions or operators.

              -

              Blocks work with control flow structures like if, else, and for, which have equivalents in deep learning:

              -

              | programming languages | PaddlePaddle         |
              |-----------------------|----------------------|
              | for, while loop       | RNN, WhileOp         |
              | if, if-else, switch   | IfElseOp, SwitchOp   |
              | sequential execution  | a sequence of layers |

              -

              A key difference is that a C++ program describes a one pass computation, whereas a deep learning program describes both the forward and backward passes.

              -
              -
              -

              Stack Frames and the Scope Hierarchy

              -

              The existence of the backward pass makes the execution of a block of PaddlePaddle different from traditional programs:

              -

              | programming languages  | PaddlePaddle                     |
              |------------------------|----------------------------------|
              | stack                  | scope hierarchy                  |
              | stack frame            | scope                            |
              | push at entering block | push at entering block           |
              | pop at leaving block   | destroy when minibatch completes |

              -
                -
              1. In traditional programs:
                  -
                • When the execution enters the left curly brace of a block, the runtime pushes a frame into the stack, where it realizes local variables.
                • -
                • After the execution leaves the right curly brace, the runtime pops the frame.
                • -
                • The maximum number of frames in the stack is the maximum depth of nested blocks.
                • -
                -
              2. -
              3. In PaddlePaddle
                  -
                • When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables.
                • -
                • PaddlePaddle doesn’t pop a scope after the execution of the block because variables therein are used by the backward pass. So it has a stack forest known as a scope hierarchy.
                • -
                • The height of the highest tree is the maximum depth of nested blocks.
                • -
                • After the processing of a minibatch, PaddlePaddle destroys the scope hierarchy.
                • -
                -
              4. -
              -
              -
              -
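
              The scope hierarchy described in the list above can be illustrated with a toy Python class. The `Scope` class and its method names are hypothetical and are not PaddlePaddle's actual interface; the point is only that child scopes outlive the block that created them and are destroyed together once the minibatch is done.

```python
from typing import Any, Dict, List, Optional


class Scope:
    def __init__(self, parent: Optional["Scope"] = None):
        self.parent = parent
        self.vars: Dict[str, Any] = {}
        self.kids: List["Scope"] = []

    def new_scope(self) -> "Scope":
        # Entering a block adds a child scope; nothing is popped on exit.
        kid = Scope(parent=self)
        self.kids.append(kid)
        return kid

    def find_var(self, name: str) -> Any:
        # Look up a name here first, then in the enclosing scopes.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)

    def drop_kids(self) -> None:
        # Called once the minibatch (forward and backward) is finished.
        self.kids.clear()


global_scope = Scope()
global_scope.vars["W"] = 0.5

# Forward pass: each execution of the step block gets its own scope.
for x in [10, 20, 30]:
    step_scope = global_scope.new_scope()
    step_scope.vars["a"] = step_scope.find_var("W") * x

# The step scopes are still alive, so a backward pass could read them.
kept_for_backward = len(global_scope.kids)
saved = [kid.vars["a"] for kid in global_scope.kids]

global_scope.drop_kids()  # the minibatch is done
```

              A traditional stack would have discarded each step's variables as soon as the step finished; here all three step scopes survive until `drop_kids()` is called.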

              Use Blocks in C++ and PaddlePaddle Programs

              -

              Let us consolidate the discussion by presenting some examples.

              -
              -

              Blocks with if-else and IfElseOp

              -

              The following C++ program shows how blocks are used with the if-else structure:

              -
              namespace pd = paddle;
              -
              -int x = 10;
              -int y = 1;
              -int z = 10;
              -bool cond = false;
              -int o1, o2;
              -if (cond) {
              -  int z = x + y;
              -  o1 = z;
              -  o2 = pd::layer::softmax(z);
              -} else {
              -  int d = pd::layer::fc(z);
              -  o1 = d;
              -  o2 = d+1;
              -}
              -
              -
              -

              An equivalent PaddlePaddle program from the design doc of the IfElseOp operator is as follows:

              -
              import paddle as pd
              -
              -x = minibatch([10, 20, 30]) # shape=[None, 1]
              -y = var(1) # shape=[1], value=1
              -z = minibatch([10, 20, 30]) # shape=[None, 1]
              -cond = larger_than(x, 15) # [false, true, true]
              -
              -ie = pd.ifelse()
              -with ie.true_block():
              -    d = pd.layer.add_scalar(x, y)
              -    ie.output(d, pd.layer.softmax(d))
              -with ie.false_block():
              -    d = pd.layer.fc(z)
              -    ie.output(d, d+1)
              -o1, o2 = ie(cond)
              -
              -
              -

              In both examples, the true branch computes x+y and softmax(x+y), and the false branch computes fc(z) and fc(z)+1.

              -

              The difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle programs are mini-batches of instances.
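
              To make the mini-batch semantics concrete, here is a plain-Python sketch of how a per-instance condition could route rows to the two branches and merge the results back in order. The `if_else` helper is hypothetical, and the doubling function in the false branch merely stands in for `fc`; this is not how IfElseOp is implemented.

```python
from typing import Callable, List


def if_else(cond: List[bool], rows: List[float],
            true_fn: Callable[[List[float]], List[float]],
            false_fn: Callable[[List[float]], List[float]]) -> List[float]:
    # Split the mini-batch by the per-instance condition.
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]
    out: List[float] = [0.0] * len(rows)
    # Run each branch on its own sub-batch, then scatter the results back.
    for i, value in zip(true_idx, true_fn([rows[i] for i in true_idx])):
        out[i] = value
    for i, value in zip(false_idx, false_fn([rows[i] for i in false_idx])):
        out[i] = value
    return out


x = [10, 20, 30]
y = 1
cond = [value > 15 for value in x]  # [False, True, True]

o1 = if_else(
    cond, x,
    true_fn=lambda batch: [value + y for value in batch],   # x + y
    false_fn=lambda batch: [2 * value for value in batch],  # stand-in for fc
)
```

              The first instance takes the false branch and the other two take the true branch, so a single call handles a whole mini-batch where the C++ version handles one scalar.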

              -
              -
              -

              Blocks with for and RNNOp

              -

              The following RNN model in PaddlePaddle from the RNN design doc :

              -
              x = sequence([10, 20, 30]) # shape=[None, 1]
              -m = var(0) # shape=[1]
              -W = var(0.314, param=true) # shape=[1]
              -U = var(0.375, param=true) # shape=[1]
              -
              -rnn = pd.rnn()
              -with rnn.step():
              -  h = rnn.memory(init = m)
              -  h_prev = rnn.previous_memory(h)
              -  a = layer.fc(W, x)
              -  b = layer.fc(U, h_prev)  
              -  s = pd.add(a, b)
              -  act = pd.sigmoid(s)
              -  rnn.update_memory(h, act)
              -  rnn.output(a, b)
              -o1, o2 = rnn()
              -
              -
              -

              has its equivalent C++ program as follows

              -
              float x[] = {10, 20, 30};
              -float m = 0;
              -float W = 0.314;
              -float U = 0.375;
              -
              -const int n = sizeof(x) / sizeof(x[0]);  // number of time steps
              -float mem[n + 1];
              -float o1[n + 1];
              -float o2[n + 1];
              -for (int i = 1; i <= n; ++i) {
              -  float x_i = x[i-1];
              -  if (i == 1) mem[0] = m;
              -  float a = W * x_i;        // fc_out
              -  float b = U * mem[i-1];   // hidden_out
              -  float s = a + b;          // sum
              -  float act = sigmoid(s);
              -  mem[i] = act;
              -  o1[i] = act;
              -  o2[i] = b;
              -}
              -
              -
              -
              -
              -
              -

              Compilation and Execution

              -

              Like TensorFlow, a PaddlePaddle program is written in Python. The first part describes a neural network as a protobuf message, and the rest executes the message for training or inference.

              -

              The generation of this protobuf message is similar to how a compiler generates a binary executable file. The execution of the message is similar to how the OS executes the binary file.

              -
              -
              -

              The “Binary Executable File Format”

              -

              The definition of the protobuf message is as follows:

              -
              message BlockDesc {
              -  repeated VarDesc vars = 1;
              -  repeated OpDesc ops = 2;
              -}
              -
              -
              -

              The step net in above RNN example would look like

              -
              BlockDesc {
              -  vars = {
              -    VarDesc {...} // x
              -    VarDesc {...} // h
              -    VarDesc {...} // fc_out
              -    VarDesc {...} // hidden_out
              -    VarDesc {...} // sum
              -    VarDesc {...} // act
              -  }
              -  ops = {
              -    OpDesc {...} // matmul
              -    OpDesc {...} // add_two
              -    OpDesc {...} // sigmoid
              -  }
              -};
              -
              -
              -

              Also, the RNN operator in above example is serialized into a protobuf message of type OpDesc and would look like:

              -
              OpDesc {
              -  inputs = {0} // the index of x in vars of BlockDesc above
              -  outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
              -  attrs {
              -    "states" : {1} // the index of h
              -    "step_net" : <above step net>
              -  }
              -};
              -
              -
              -

              This OpDesc value is in the ops field of the BlockDesc value representing the global block.

              -
              -
              -

              The Compilation of Blocks

              -

              During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message which describes Variable) and OpDesc (the Protobuf message which describes Operator).

              -

              Each VarDesc in a block should live in the block's own name scope, so that local variables do not affect the parent block's name scope. A child block's name scope should inherit the parent's, so that an OpDesc in the child block can reference a VarDesc stored in the parent block. For example:

              -
              a = pd.Variable(shape=[20, 20])
              -b = pd.fc(a, params=["fc.w", "fc.b"])
              -
              -rnn = pd.create_rnn()
              -with rnn.stepnet():
              -    x = a.as_step_input()
              -    # reuse fc's parameter
              -    fc_without_b = pd.get_variable("fc.w")
              -    rnn.output(fc_without_b)
              -
              -out = rnn()
              -
              -
              -

              The method pd.get_variable retrieves a Variable by name. The Variable may be stored in a parent block but retrieved in a child block, so a block should have a variable scope that supports inheritance.

              -

              In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc.

              -

              To store the definition of variables and operators, we define a C++ class SymbolTable, like the one used in compilers.

              -

              SymbolTable can do the following:

              -
                -
              • store the definitions (some names and attributes) of variables and operators,
              • -
              • verify if a variable was declared,
              • -
              • make it possible to implement type checking (offer Protobuf message pointers to InferShape handlers).
              • -
              -
              // Information in SymbolTable is enough to trace the dependency graph. So maybe
              -// the Eval() interface takes a SymbolTable is enough.
              -class SymbolTable {
              - public:
              -  SymbolTable(SymbolTable* parent) : parent_(parent) {}
              -
              -  OpDesc* NewOp(const string& name="");
              -
              -  // TODO determine whether name is generated by python or C++.
              -  // Currently assume that a unique name will be generated by C++ if the
              -  // argument name is left default.
              -  VarDesc* Var(const string& name="");
              -
              -  // find a VarDesc by name, if recursive is true, find parent's SymbolTable
              -  // recursively.
              -  // this interface is introduced to support InferShape, find protobuf messages
              -  // of variables and operators, pass pointers into InferShape.
              -  //
              -  // NOTE maybe some C++ classes such as VarDescBuilder and OpDescBuilder should
              -  // be proposed and embedded into pybind to enable python operation on C++ pointers.
              -  VarDesc* FindVar(const string& name, bool recursive=true);
              -
              -  OpDesc* FindOp(const string& name);
              -
              -  BlockDesc Compile() const;
              -
              - private:
              -  SymbolTable* parent_;
              -
              -  map<string, OpDesc> ops_;
              -  map<string, VarDesc> vars_;
              -};
              -
              -
              -

              After the descriptions of all variables and operators have been added to SymbolTable, the block has enough information to run.

              -

              The Block class takes a BlockDesc as input, and provides Run and InferShape functions.

              -
              namespace {
              -
              -class Block : OperatorBase {
              -public:
              -  Block(const BlockDesc& desc) : desc_(desc) {}
              -
              -  void InferShape(const framework::Scope& scope) const override {
              -    if (!symbols_ready_) {
              -      CreateVariables(scope);
              -      CreateOperators();
              -    }
              -    // should run InferShape first.
              -    for (auto& op : runtime_table_.ops()) {
              -      op->InferShape(scope);
              -    }
              -  }
              -
              -  void Run(const framework::Scope& scope,
              -           const platform::Place& place) const override {
              -    PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first.");
              -    for (auto& op : runtime_table_.ops()) {
              -      op->Run(scope, place);
              -    }
              -  }
              -
              -  void CreateVariables(const framework::Scope& scope);
              -  void CreateOperators();
              -
              -  // some other necessary interfaces of NetOp are listed below
              -  // ...
              -
              -private:
              -  BlockDesc desc_;
              -  bool symbols_ready_{false};
              -};
              -
              -
              -
              -
              -

              The Execution of Blocks

              -

              Block inherits from OperatorBase, which has a Run method. Block’s Run method runs its operators sequentially.

              -

              Another important interface is Eval. It takes a list of targets, prunes the block to the minimal graph that has those targets as its end points, and creates a new Block from it. After running this new Block, Eval gets the latest values and returns the targets.

              -

              The definition of Eval is as follows:

              -
              // clean a block description by targets using the corresponding dependency graph.
              -// return a new BlockDesc with minimal number of operators.
              -// NOTE: The return type is not a Block but the block's description so that this can be distributed
              -// to a cluster.
              -BlockDesc Prune(const BlockDesc& desc, vector<string> targets);
              -
              -void Block::Eval(const vector<string>& targets,
              -                 const framework::Scope& scope,
              -                 const platform::DeviceContext& dev_ctx) {
              -  BlockDesc min_desc = Prune(desc_, targets);
              -  Block min_block(min_desc);
              -  min_block.Run(scope, dev_ctx);
              -}
              -
              -
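
              The pruning step that Eval relies on can be sketched as a backward walk over the operator list. This assumes operators are `(type, inputs, outputs)` tuples stored in execution order; the `prune` function below is an illustration of the idea, not the proposed C++ Prune.

```python
from typing import List, Sequence, Tuple

OpDesc = Tuple[str, List[str], List[str]]  # (type, inputs, outputs)


def prune(ops: Sequence[OpDesc], targets: Sequence[str]) -> List[OpDesc]:
    needed = set(targets)
    kept: List[OpDesc] = []
    # Walk backwards: keep an op only if it produces something still needed,
    # and then mark its inputs as needed as well.
    for op in reversed(ops):
        _, inputs, outputs = op
        if needed.intersection(outputs):
            kept.append(op)
            needed.update(inputs)
    kept.reverse()  # restore execution order
    return kept


ops = [
    ("mul", ["x", "W"], ["a"]),
    ("mul", ["h_prev", "U"], ["b"]),
    ("add", ["a", "b"], ["s"]),
    ("sigmoid", ["s"], ["act"]),
    ("scale", ["a"], ["debug"]),
]
for_act = prune(ops, ["act"])
for_debug = prune(ops, ["debug"])
```

              Asking for `act` drops the unrelated `scale` op, while asking for `debug` keeps only the first `mul` and `scale`. Because the result is still a list of descriptions, it could be serialized and sent to another machine, which matches the note in the comment above Prune.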
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/build_system/README.html b/develop/doc/design/build_system/README.html deleted file mode 100644 index 2b46da65563b7f6c203334ff94fa4a2c94bea182..0000000000000000000000000000000000000000 --- a/develop/doc/design/build_system/README.html +++ /dev/null @@ -1,401 +0,0 @@ - - - - - - - - - - - - - Required CMake Function — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -

              A few months ago, when we were trying to replace CMake with Bazel, @emailweixu suggested that we rewrite those handy Bazel functions using CMake. Now seems to be the right time to get this done, as we are facing problems from the porting of Majel and the development of the new parameter server using Go and C++.

              -

              Here are some initial thoughts. Your comments are welcome!

              -
              -

              Required CMake Function

              -

              I think we need only the following few CMake functions to make a project description mean and clean:

              -

              | C++        | CUDA C++   | Go         |
              |------------|------------|------------|
              | cc_library | nv_library | go_library |
              | cc_binary  | nv_binary  | go_binary  |
              | cc_test    | nv_test    | go_test    |

              -
                -
              • The _library functions generate .a files from source code.
              • -
              • The _binary functions generate executable binary files.
              • -
              • The _test functions generate executable unit test files. They work like _binary but link -lgtest and -lgtest_main.
              • -
              -

              The difference between nv_ functions and cc_ functions is that the former use nvcc instead of the system-default C++ compiler.

              -

              Both nv_ and cc_ functions enable C++11 (-std=c++11).

              -

              Also,

              -
                -
              • to describe external dependencies, we need external_library.
              • -
              • to build shared libraries, we need shared_library.
              • -
              -
              -
              -

              An Example Project

              -

              Suppose that we have aforementioned functions defined in our /cmake directory. The following example CMakeLists.txt describes a project including the following source files:

              -
                -
              • tensor.h
              • -
              • tensor.cc
              • -
              • tensor_test.cc
              • -
              • ops.h
              • -
              • ops.cu
              • -
              • ops_test.cu
              • -
              • api.go
              • -
              • api_test.go
              • -
              -

              Suppose that ops.cu depends on CUDNN.

              -
              # cc_library parses tensor.cc and figures out that the target also
              -# depends on tensor.h.
              -cc_library(tensor
              -  SRCS
              -  tensor.cc)
              -
              -# The dependency on target tensor implies that if any of
              -# tensor{.h,.cc,_test.cc} is changed, tensor_test needs to be re-built.
              -cc_test(tensor_test
              -  SRCS
              -  tensor_test.cc
              -  DEPS
              -  tensor)
              -
              -# I don't have a clear idea what parameters external_library need to
              -# have.  @gangliao as a CMake expert would have better ideas.
              -external_library(cudnn
              -  ....)
              -
              -# Suppose that ops.cu depends on external target CUDNN.  Also, ops.cu
              -# includes global functions that take Tensor as their parameters, so
              -# ops depends on tensor.  This implies that if any of tensor.{h,cc},
              -# ops.{h,cu} is changed, ops needs to be re-built.
              -nv_library(ops
              -  SRCS
              -  ops.cu
              -  DEPS
              -  tensor
              -  cudnn)  # cudnn is defined above by external_library.
              -
              -nv_test(ops_test
              -  SRCS
              -  ops_test.cu
              -  DEPS
              -  ops)
              -
              -# Because api.go defines a Go wrapper to ops and tensor, it depends on
              -# both.  This implies that if any of tensor.{h,cc}, ops.{h,cu}, or
              -# api.go is changed, api needs to be re-built.
              -go_library(api
              -  SRCS
              -  api.go
              -  DEPS
              -  tensor # Because ops depend on tensor, this line is optional.
              -  ops)
              -
              -go_test(api_test
              -  SRCS
              -  api_test.go
              -  DEPS
              -  api)
              -
              -
              -# This builds libapi.so.  shared_library might use CMake target
              -# api_shared so to distinguish it from above target api.
              -shared_library(api
              -  DEPS
              -  api)
              -
              -
              -
              -
              -

              Implementation

              -

              As the above example CMakeLists.txt executes, each function invocation adds “nodes” to a dependency graph. The functions also use this graph to generate CMake commands, including add_executable, add_dependencies, target_link_libraries, and add_test.
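
              As a rough illustration of that dependency graph, the targets of the example project can be modeled in Python, using the standard-library `graphlib` module (Python 3.9+) to derive a valid build order. The `deps` dictionary and the `dependents` helper are written for this sketch only and are not part of the proposed CMake functions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each target maps to the targets listed in its DEPS.
deps: Dict[str, List[str]] = {
    "tensor": [],
    "tensor_test": ["tensor"],
    "cudnn": [],
    "ops": ["tensor", "cudnn"],
    "ops_test": ["ops"],
    "api": ["tensor", "ops"],
    "api_test": ["api"],
}

# A build order in which every target comes after its dependencies.
build_order = list(TopologicalSorter(deps).static_order())


def dependents(changed: str) -> Set[str]:
    """Return every target that must be re-built when `changed` changes."""
    result: Set[str] = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for target, target_deps in deps.items():
            if current in target_deps and target not in result:
                result.add(target)
                stack.append(target)
    return result


rebuild_after_tensor = dependents("tensor")
rebuild_after_cudnn = dependents("cudnn")
```

              A change to `tensor` forces every other target except `cudnn` to be re-built, which matches the comments in the example CMakeLists.txt.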

              -
              -
              -

              Using Package Manager For Go

              -

              Building Go binaries and libraries requires satisfying their dependencies. Generally, we can run go get ./... to download and compile all external dependencies. The problems are:

              -
                -
              1. go get always fetches the latest code from the default branch of the remote repo, so changes in dependencies might break the build. This is very different from what we already have in cmake/external, which downloads a specific version or commit id of each dependency.
              2. -
              3. Some locations cannot access external dependencies through the internet, as mentioned in https://github.com/PaddlePaddle/Paddle/issues/2605. A package management tool can package the dependencies as a “vendor” package, which can be mirrored on many cloud file hosting services, so users who want to compile paddle by themselves can download this “vendor” package from a mirror site.
              4. -
              -
              -

              Choose A Suitable Tool

              -

              As mentioned by @wangkuiyi, there are dozens of Go package managers to choose from. We choose the tool using the following principles:

              -
                -
              • Most “active” projects with more stars, more pull requests or commits
              • -
              • Widely used project
              • -
              -

              After comparing all these projects, we shall choose between the most popular tools: Godep and Glide.

              -

              Here’s a brief comparison between Godep and Glide: https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are also many complaints about using Godep. A new “official” package management tool has been started at https://github.com/golang/dep to resolve such problems, but it is currently at the Alpha stage. So the best choice for now is Glide.

              -
              -
              -

              Manage Go Packages

              -
                -
              • Dependencies: go/glide.yaml stores the dependencies, and their versions, that are directly imported by Paddle. go/glide.lock stores all dependencies recursively with their commit ids. Builds will “lock” to these packages unless we run glide up on them.
              • Vendor package: the go/vendor directory will be generated when running the cmake command. cmake will download the code corresponding to go/glide.lock. If we put a vendor folder under go/, cmake will just check the commit ids of the packages under the folder; if the commit ids match, there will be no download at all.
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/README.html b/develop/doc/design/cluster_train/README.html deleted file mode 100644 index 00a0bc6f73b85984e54b0d194cf2352caf0fa588..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/README.html +++ /dev/null @@ -1,430 +0,0 @@ - - - - - - - - - - - - - Design Doc: Distributed Training — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Distributed Training

              -
              -

              Objective

              -

              In these slides, we explained that we’d like PaddlePaddle to run on general-purpose clusters like those managed by Kubernetes, so as to address demands for AI from both Internet and non-Internet industries.

              -

              This poses technical challenges to PaddlePaddle:

              -
                -
              1. Support fault-recovery.
              2. -
              3. Support both offline and online training.
              4. -
              5. Serverless computing of distributed training.
              6. -
              -
              -
              -

              Training Job

              -

              A training job will be created once a user asks Paddle cloud to train a model. The training job is made up of different processes that collaboratively consume data and produce a trained model. There are three kinds of processes:

              -
                -
              1. the master server process, which dispatches tasks to
              2. one or more trainer processes, which run distributed training and synchronize gradients/models via
              3. one or more parameter server processes, each of which holds a shard of the global model and receives the gradients uploaded by every trainer process, so that it can run the optimization functions to update its parameters.
              -

              Their relation is illustrated in the following graph:

              -

              -

              By coordinating these processes, PaddlePaddle supports both synchronous stochastic gradient descent (sync SGD) and asynchronous stochastic gradient descent (async SGD) for training user-defined neural network topologies.

              -

              When training with sync SGD, the parameter servers wait for all trainers to finish their gradient updates and then send the updated parameters to the trainers. Training cannot proceed until a trainer has received the updated parameters, which creates a synchronization point between trainers. When training with async SGD, each trainer uploads gradients and downloads new parameters individually, without synchronizing with other trainers. Async SGD is faster in terms of time per pass, but has more noise in the gradients, since trainers are likely to have a stale model.

              -
              -

              Master Server Process

              -

              The master server process will:

              -
                -
              • Partition a dataset into tasks and dispatch tasks to trainers.
              • Keep track of training progress on the dataset with task queues. A training job iterates over the dataset for a full pass before it moves on to the next pass.
              -
              -

              Task

              -

              A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size.

              -
              -
              -

              Task Queue

              -

              The master server has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master server. Each master server process has three task queues.

              -

              -
                -
              • The todo queue holds tasks to be dispatched. When a job starts, the master server fills the todo queue with all tasks.
              • The pending queue holds tasks that are currently being trained by trainers.
              • The done queue holds tasks that have already been trained.
              -

              The life cycle of a single task is illustrated below:

              -

              -
                -
              1. When a new pass of training starts, all tasks will be placed in the todo queue.
              2. When a trainer requests a new task, the master server will dispatch a task from the todo queue to it, put the task in the pending queue, and wait for completion.
              3. The trainer will work on its task, tell the master server once the task is completed, and ask for a new task. The master server will dispatch a new task to that trainer.
              4. If a task fails for any reason in a trainer, or takes longer than a specific period of time, the master server will move the task back to the todo queue. The timeout count for that task will increase by one. If the timeout count is above a threshold, the task is likely to cause a trainer to crash, so it will be discarded.
              5. The master server will move a completed task to the done queue. When the todo queue is empty, the master server will start a new pass by moving all tasks in the done queue to the todo queue and resetting the timeout counter of all tasks to zero.
              -
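The task life cycle above can be sketched in a few lines of Python. This is only an in-memory illustration, not the master server's implementation: the class name `TaskQueues`, its method names, and the `max_timeouts` parameter are hypothetical, and the sketch assumes a new pass starts once both the todo and pending queues are empty.

```python
from collections import deque


class TaskQueues:
    """In-memory sketch of the todo / pending / done queues (illustrative names)."""

    def __init__(self, tasks, max_timeouts=3):
        self.todo = deque(tasks)
        self.pending = {}                      # task -> trainer id
        self.done = []
        self.timeouts = {t: 0 for t in tasks}  # per-task timeout counter
        self.max_timeouts = max_timeouts       # assumed threshold

    def dispatch(self, trainer):
        """Hand the next todo task to a trainer, or None if todo is empty."""
        if not self.todo:
            return None
        task = self.todo.popleft()
        self.pending[task] = trainer
        return task

    def finish(self, task):
        """Move a completed task from pending to done."""
        del self.pending[task]
        self.done.append(task)
        self._maybe_new_pass()

    def fail(self, task):
        """Re-queue a failed or timed-out task; discard it past the threshold."""
        del self.pending[task]
        self.timeouts[task] += 1
        if self.timeouts[task] > self.max_timeouts:
            self._maybe_new_pass()             # task is dropped for good
            return
        self.todo.append(task)

    def _maybe_new_pass(self):
        """Start a new pass: done -> todo, timeout counters reset to zero."""
        if not self.todo and not self.pending:
            self.todo.extend(self.done)
            self.done.clear()
            self.timeouts = {t: 0 for t in self.todo}
```

In the actual design the queues are persisted in etcd so that a restarted master can recover them; the sketch leaves that out.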
              -
              -
              -

              Trainer Process

              -

              The trainer process will:

              -
                -
              • Request tasks from the master.
              • Work on the tasks.
              • Upload gradients to the parameter servers, and update the local model by downloading new parameters from the parameter servers.
              -
              -
              -

              Parameter Server Process

              -

              Parameter server processes hold the parameters collaboratively. The parameters are partitioned on different parameter servers.

              -

              The parameter server will:

              -
                -
              • Receive gradients from the trainers, update its parameters, and give the trainers the latest parameters.
              • Periodically save its parameters to a distributed file system, overwriting the previous save.
              -
              -
              -

              Optimization Algorithms

              -

              The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm:

              -
                -
              • Synchronous Stochastic Gradient Descent (sync-SGD)

                -

                The parameter server will wait for all trainers to finish the n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. Every trainer will wait for the new parameters before starting the (n+1)-th mini-batch.

                -
              • -
              • Asynchronous Stochastic Gradient Descent (async-SGD)

                -

                There will be no synchronization between different trainers, and the parameter server updates its parameters as soon as it receives a new gradient:

                -
                  -
                • Each trainer uploads its accumulated gradient every n mini-batches.
                • -
                • Every m mini-batches, the trainer downloads new parameters from parameter server.
                • -
                • n and m do not have to be equal.
                • -
                -
              • -
              -
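The async-SGD communication pattern can be illustrated with a toy trainer loop. This is a minimal sketch under stated assumptions: `ToyParameterServer` and `train_async` are hypothetical names, the model is a single scalar, and the gradients are passed in rather than computed from the local model.

```python
class ToyParameterServer:
    """Holds one scalar parameter and applies gradients as they arrive."""

    def __init__(self, value=0.0, lr=0.1):
        self.value = value
        self.lr = lr

    def push_gradient(self, grad):
        # async-SGD: update immediately, without waiting for other trainers
        self.value -= self.lr * grad

    def pull(self):
        return self.value


def train_async(server, gradients, n, m):
    """Upload the accumulated gradient every n mini-batches; pull every m."""
    local = server.pull()
    accumulated = 0.0
    uploads = pulls = 0
    for step, grad in enumerate(gradients, start=1):
        accumulated += grad
        if step % n == 0:
            server.push_gradient(accumulated)
            accumulated = 0.0
            uploads += 1
        if step % m == 0:
            local = server.pull()  # the local copy may have been stale until now
            pulls += 1
    return local, uploads, pulls
```

With n=2 and m=3 over six mini-batches, the trainer uploads three times and downloads twice, which shows that the two intervals are independent.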
              -
              -
              -

              Fault Tolerant

              -

              The training job will pause if the master server process is dead, or if any of the parameter server processes is dead. They will be restarted by Kubernetes and recover in a few minutes. Please refer to fault recovery.

              -

              The training job will continue to make progress if there is at least one training process running. The strategy depends on the type of optimization algorithm:

              -
                -
              • sync-SGD

                -

                TODO

                -
              • -
              • async-SGD

                -

                Since async-SGD does not require synchronization between mini-batches, the system will by definition make progress if at least one trainer is running.

                -
              • -
              -
              -
              -

              Fault Recovery

              -

              PaddlePaddle uses etcd to keep track of the states of processes. Because etcd is a distributed, reliable key-value store, a restarted process can recover its state from etcd. The model parameters are periodically saved to a distributed file system, so a restarted parameter server can recover its parameters from the saved file.

              -

              Now we will introduce how each process recovers from a failure. The graph below shows how etcd is used:

              -

              -
              -

              Master Server Process

              -

              When the master is started by Kubernetes, it executes the following steps at startup:

              -
                -
              1. Grabs a unique master lock in etcd, which prevents concurrent master instantiations.
              2. Recovers the task queues from etcd if they already exist; otherwise, the master will create them.
              3. Writes its IP address to /master/addr so that trainers can discover it.
              4. Listens for trainers’ task requests, dispatches a task upon each request, and updates the task queues using an etcd transaction to ensure the lock is held during the update.
              -

              When the master server process dies for any reason, Kubernetes will restart it. It will be online again, with all state recovered from etcd, in a few minutes.

              -
              -
              -

              Trainer Process

              -

              When the trainer is started by Kubernetes, it executes the following steps at startup:

              -
                -
              1. Watches the available parameter server prefix keys /ps/ on etcd and waits until the count of parameter servers reaches the desired count /ps_desired.
              2. Finds and watches /master/addr to get the master’s address.
              3. Requests tasks from the master to start training.
              -

              When a trainer fails, Kubernetes will try to restart it. The recovered trainer will fetch tasks from the master and go on training.

              -
              -
              -

              Parameter Server Process

              -

              When the parameter server is started by Kubernetes, it executes the following steps at startup:

              -
                -
              1. Reads the desired total number of parameter servers from the etcd key /ps_desired.

                -
              2. -
              3. Searches through the etcd keys /ps/<index> (/ps/0, /ps/1, ...) to find the first nonexistent key whose index is smaller than the total number of parameter servers. Sets the key using a transaction to avoid concurrent writes. The parameter server’s index is inferred from the key name.

                -

                The desired number of parameter servers is 3:

                -

                -

                The third parameter server joined:

                -

                -
              4. -
              5. The parameter server can load parameters if there are already saved parameters in the save path (inferred from its index).

                -
              6. -
              7. Now the parameter server is ready for the trainers’ requests.

                -
              8. -
              -
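The index-claiming step of the parameter server startup can be sketched as follows. This is an illustration only: `FakeKV` is a hypothetical in-memory stand-in for etcd whose `put_if_absent` plays the role of the etcd transaction, and `register_pserver` is an assumed helper name.

```python
import threading


class FakeKV:
    """Stand-in for etcd: a dict with an atomic put-if-absent (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        # Atomic check-and-set, mimicking a transaction that only
        # succeeds when the key does not exist yet.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def register_pserver(kv, addr):
    """Claim the first free /ps/<index> slot below /ps_desired; return the index."""
    desired = int(kv.get("/ps_desired"))
    for index in range(desired):
        if kv.put_if_absent("/ps/%d" % index, addr):
            return index
    return None  # every slot is already taken
```

Because the claim is atomic, two parameter servers starting at the same time cannot end up with the same index; the loser of the race simply tries the next key.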

              If the parameter server’s etcd lease expires, the parameter server will kill itself.

              -
              -
              -
              -

              Parameter Server Checkpointing

              -

              See here

              -
              -
              -

              Storing and dispatching training data

              -

              See here

              -
              -
              -

              Dynamic Scaling

              -
              -

              Trainer Scaling

              -

              TODO

              -
              -
              -

              Parameter Server Scaling

              -

              Not planned for v1.

              -
              -
              -
              -

              Training Dataset Format

              -

              TODO

              -
              -
              -

              User Interface

              -

              TODO

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/checkpointing.html b/develop/doc/design/cluster_train/checkpointing.html deleted file mode 100644 index 78bf1a684e6f61750eb8939fd5ce421b8acca4a7..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/checkpointing.html +++ /dev/null @@ -1,305 +0,0 @@ - - - - - - - - - - - - - 模型参数检查点(Checkpointing) — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Model Parameter Checkpointing

              -

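              Checkpointing model data effectively guards against the failure of a single parameter server, or of several parameter servers at the same time. Model parameter checkpointing periodically saves to disk a complete image of the model data held in parameter server memory, so that training can restart from an intermediate state. For a training job that cannot be interrupted and has no backup, disaster recovery can be achieved by periodically saving a snapshot of each parameter server’s data to a distributed storage service, for example saving the latest snapshot every 10 minutes and deleting older snapshots. When a single point of failure occurs, the training job can be resumed simply by recovering that node, or by migrating it to another node and starting it there.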

              -

              -
              -

              The snapshot-saving design is as follows:

              -

              Notes:

              -
                -
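              • After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves its snapshots there.
              • Note: each parameter server saves its checkpoints independently. For now we do not consider having multiple parameter servers synchronously save a global checkpoint at a specific point in time, because doing so still cannot guarantee that randomness is eliminated.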
              -

              Checkpoint saving procedure:

              -
                -
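              1. When the condition “every 10 minutes” is met, the parameter server acquires the read_lock on the parameters memory and starts a new thread to save the checkpoint. If a checkpoint-saving thread is already running, the request is ignored. Because updating the parameters requires the write_lock on the parameters memory, the parameter server pauses parameter updates and waits while the snapshot is being written.
              2. The parameter server generates a UUID and writes the snapshot data to a new file, named after this UUID, in the specified directory. After the snapshot is written, it computes the MD5 sum of the file and then writes the following JSON to /checkpoints/[pserver_id] in etcd: {"uuid": [UUID], "md5": "MD5 sum", "timestamp": xxxx}
              3. Delete the snapshot files in the directory that do not belong to the current UUID.
              4. Release the lock on the parameters memory and stop the checkpoint-saving thread.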
              -
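Steps 2 and 3 of the saving procedure can be sketched in Python. This is a minimal illustration, not the parameter server's code: `save_checkpoint` is a hypothetical helper, a plain dict (`meta_store`) stands in for etcd, and locking and threading are left out.

```python
import hashlib
import json
import os
import time
import uuid


def save_checkpoint(snapshot_bytes, directory, meta_store, pserver_id):
    """Write a snapshot named by a fresh UUID, record its metadata, prune old files."""
    name = str(uuid.uuid4())
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(snapshot_bytes)

    # MD5 of the bytes just written, i.e. the checksum of the snapshot file.
    md5 = hashlib.md5(snapshot_bytes).hexdigest()

    # meta_store is a dict standing in for the etcd key space (assumption).
    meta_store["/checkpoints/%s" % pserver_id] = json.dumps(
        {"uuid": name, "md5": md5, "timestamp": int(time.time())})

    # Remove every snapshot file that is not the current one.
    for other in os.listdir(directory):
        if other != name:
            os.remove(os.path.join(directory, other))
    return name
```

Writing the metadata only after the file is complete means that a reader of /checkpoints/[pserver_id] never sees a UUID whose file is still being written.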

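              Users should take extra note here: in a real environment, a running training job may saturate the network bandwidth between trainers and parameter servers. If a parameter server also has to reach distributed storage over the network to save snapshots at that time, the network may become congested and training may stall intermittently.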

              -
              -
              -

              Recovering from a Snapshot

              -

              When a parameter server starts for the first time, or is restarted by Kubernetes after failing at any time, it needs to roll back to the previous checkpoint:

              -
                -
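              1. Read the node /checkpoints/[pserver_id] from etcd to get the UUID of the latest checkpoint file.
              2. Load from disk the checkpoint snapshot file whose name is that UUID, and load the parameters in it.
              3. If either of the two steps above fails, initialize the parameters with the initialization method defined by the startup arguments.
              4. Start serving.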
              -
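The recovery steps can be sketched as a single function with a fallback. This is a hedged illustration: `load_checkpoint` and `init_fn` are hypothetical names, a dict stands in for etcd, and the MD5 verification is an assumption added here (the steps above store the checksum but do not say when it is checked).

```python
import hashlib
import json
import os


def load_checkpoint(directory, meta_store, pserver_id, init_fn):
    """Return the saved snapshot bytes, or init_fn() if recovery fails."""
    try:
        # Step 1: look up the latest checkpoint's metadata.
        meta = json.loads(meta_store["/checkpoints/%s" % pserver_id])
        # Step 2: load the snapshot file named by the UUID.
        with open(os.path.join(directory, meta["uuid"]), "rb") as f:
            data = f.read()
        # Assumed extra check: reject a snapshot whose checksum does not match.
        if hashlib.md5(data).hexdigest() != meta["md5"]:
            raise ValueError("checksum mismatch")
        return data
    except (KeyError, OSError, ValueError):
        # Step 3: fall back to the configured initialization method.
        return init_fn()
```

On a first start there is no /checkpoints/[pserver_id] entry, so the lookup fails and the function falls through to `init_fn`, which matches the first-start case described above.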
              -
              -
              -

              TODO List

              -
              -

              Speculative Execution / Accelerated Execution (TODO)

              -

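              In a heterogeneous cluster, a trainer that runs too slowly (such as Trainer 1 in the figure) slows down the whole cluster. In that case the master will start a new trainer (Accelerate Trainer 2) on the same block of training data. Whichever trainer finishes training the block first wins, and the slower one is killed.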

              -
              -
              -

              Dynamic Scaling Up / Down

              -

              For now we only consider dynamically scaling up the number of trainers, which reduces system complexity.

              -
              -
              -
              -

              Terminology

              -
                -
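              • model: all the parameters obtained after deep learning training; with this neural network, predictions can be made on new data.
              • parameters: the parameters of a neural network, including weights w and biases b. A neural network model consists of a large number of parameters.
              • shard: a slice; usually one of the pieces obtained by splitting a whole into several parts.
              • model shard: the parameters of a neural network are split into several parts, and each shard is stored on one of the parameter servers.
              • parameter block: multiple parameter blocks make up a model shard.
              • single point of failure: at any moment at most one server is down. Because the probability of two machines in the cluster failing at the same time is extremely low ((mean failure rate * mean time to repair)^2), disaster recovery for two or more simultaneous failures is considered only for special online systems.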
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/data_dispatch.html b/develop/doc/design/cluster_train/data_dispatch.html deleted file mode 100644 index 506a37824ff3234f6b4256d93b3e0f6d71b3b42c..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/data_dispatch.html +++ /dev/null @@ -1,406 +0,0 @@ - - - - - - - - - - - - - 训练数据的存储和分发 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Storing and Dispatching Training Data

              -
              -

              Concepts

              -
              -
              -

              Workflow Overview

              -

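              Training datasets in production are usually very large and are stored on distributed storage such as Hadoop HDFS, Ceph, or AWS S3. These distributed storage services usually split the data into multiple shards stored across multiple nodes. This makes it possible to run many kinds of data-processing jobs in the cloud, including: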

              -
                -
              • Data preprocessing jobs
              • Paddle training jobs
              • Online model prediction services
              -
              - -

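              The figure above shows the data flow of an application (face recognition) in a real production environment. Log data from production is stored both as a real-time stream (Kafka) and as offline data (HDFS), and several distributed data-processing jobs run in the cluster, such as streaming data processing (online data process) and offline batch processing (offline data process), to preprocess the data and provide it to Paddle as training data. Users can also upload labeled data to distributed storage to supplement the training data. The model produced by deep learning training on Paddle is provided to the online face recognition application.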

              -
              -
              -

              Training Data Storage

              -

              We choose CephFS as the storage system.

              -
                -
              • Whether from the point of view of PFSClient or of a job running in a Pod, users access their own data uniformly through /pfs/$DATACENTER/home/$USER.
              • -
              • /pfs/$DATACENTER/common holds public datasets
                  -
                • mounted read-only
                • -
                -
              • -
              -
              - -
              -
              -

              File Preprocessing

              -

              Before training starts, the dataset must be converted into RecordIO, the storage format used by PaddlePaddle distributed training. We provide two ways to convert:

              -
                -
              1. Users convert the data locally and then upload it.
              2. Users upload the data and then run the conversion program on the cluster.
              -

              The generated files are named in the following format:

              -
              name_prefix-aaaaa-of-bbbbb
              -
              -
              -

              “aaaaa” and “bbbbb” are both five-digit numbers. Each file is one shard of the dataset; “aaaaa” is the index of the shard and “bbbbb” is the largest shard index.

              -

              For example, the ImageNet dataset might be split into 1000 shards, whose file names are:

              -
              imagenet-00000-of-00999
              -imagenet-00001-of-00999
              -...
              -imagenet-00999-of-00999
              -
              -
              -
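The naming scheme can be reproduced with a short helper. This is only a sketch of the format described above; the function name `shard_names` is hypothetical and is not part of the conversion library.

```python
def shard_names(name_prefix, num_shards):
    """Return file names of the form name_prefix-aaaaa-of-bbbbb."""
    last_index = num_shards - 1  # "bbbbb" is the largest shard index
    return ["%s-%05d-of-%05d" % (name_prefix, index, last_index)
            for index in range(num_shards)]
```

Calling `shard_names("imagenet", 1000)` yields the 1000 names from `imagenet-00000-of-00999` through `imagenet-00999-of-00999`.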
              -

              Conversion Library

              -

              Whether converting locally or in the cloud, we provide a Python conversion library with the following interface:

              -
              def convert(output_path, reader, num_shards, name_prefix)
              -
              -
              -
                -
              • output_path: directory in which output files will be saved.
              • -
              • reader: a data reader, from which the convert program will read data instances.
              • -
              • num_shards: the number of shards that the dataset will be partitioned into.
              • -
              • name_prefix: the name prefix of generated files.
              • -
              -

              The reader yields one data instance at a time. An instance can be a single value, or multiple values represented as a tuple:

              -
              yield 1  # a single value
              yield numpy.random.uniform(-1, 1, size=28*28)  # a single value
              yield numpy.random.uniform(-1, 1, size=28*28), 0  # multiple values
              -
              -
              -

              Each value can be an integer, a floating-point number, a string, a list of these, or a numpy.ndarray. Values of any other type are serialized to a string with Pickle.

              -
              -
              -
              -

              Example Program

              -
              -

              Using the Conversion Library

              -

              The reader produced by the following reader_creator yields one data instance at a time. Each data instance contains two values: a numpy.ndarray and an integer:

              -
              import numpy

              def reader_creator():
                  def reader():
                      for i in range(1000):
                          yield numpy.random.uniform(-1, 1, size=28*28), 0  # multiple values
                  return reader
              -
              -
              -

              Pass the reader produced by reader_creator to the convert function to perform the conversion:

              -
              convert("./", reader_creator(), 100, "random_images")
              -
              -
              -

              The command above generates 100 files in the current directory:

              -
              random_images-00000-of-00099
              -random_images-00001-of-00099
              -...
              -random_images-00099-of-00099
              -
              -
              -
              -
              -

              Training

              -

              PaddlePaddle provides a dedicated data reader creator that produces the data reader for the given RecordIO files. The reader is used the same way locally and in the cloud:

              -
              # ...
              reader = paddle.reader.creator.RecordIO("/pfs/datacenter_name/home/user_name/random_images-*-of-*")
              batch_reader = paddle.batch(reader, 128)
              trainer.train(batch_reader, ...)
              -
              -
              -

              The data instances yielded by the reader in the code above are exactly the same as those yielded by the reader when the dataset was generated.

              -
              -
              -
              -

              Uploading Training Files

              -

              The following command uploads local data to the storage cluster.

              -
              paddle pfs cp filename /pfs/$DATACENTER/home/$USER/folder/
              -
              -
              -

              For example, the random_images dataset converted in the earlier example can be uploaded to /home/ in the cloud with the following command:

              -
              paddle pfs cp random_images-*-of-* /pfs/$DATACENTER/home/$USER/folder/
              -
              -
              -

              The configuration for $DATACENTER needs to be written to a configuration file, for example:

              -
              # config file
              -[datacenter_1]
              -username=user
              -usercert=user.pem
              -userkey=user-key.pem
              -endpoint=datacenter1.paddlepaddle.org
              -
              -[datacenter_2]
              -username=user
              -usercert=user.pem
              -userkey=user-key.pem
              -endpoint=datacenter2.paddlepaddle.org
              -
              -
              -
              -
              -
              -

              TODO

              -
              -

              File Access Permissions

              -

              Control user permissions

              -
                -
              • Users can share their own data with others
              • -
              -
              -
              -

              File Access Method

              -

              Access data remotely through the API directly, rather than by mounting.

              -

              For example:

              -
              f = open('/pfs/datacenter_name/home/user_name/test1.dat')
              -
              -
              -
              -
              -

              Support for user-defined data preprocessing jobs

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/large_model_dist_train.html b/develop/doc/design/cluster_train/large_model_dist_train.html deleted file mode 100644 index 9d46d073ee2a0a2af665d4fcfc532152bc052993..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/large_model_dist_train.html +++ /dev/null @@ -1,343 +0,0 @@ - - - - - - - - - - - - - Alalysis of large model distributed training in Paddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Analysis of large model distributed training in Paddle

              -

              NOTE: These are only notes on how we implemented this scheme in V1, not a new design.

              -
              -

              What is it

              -

              We often encounter cases where the (sparse) embedding layer parameters are so large that they cannot be stored in the trainer’s memory during training. So we need to distribute them across several servers and fetch them row by row instead of fetching all of the parameters.

              -
              -
              -

              How to use

              -

              Specify command-line arguments like --loadsave_parameters_in_pserver=true --ports_num_for_sparse=1 --use_old_updater=1 when starting the paddle trainer. Also add arguments like --ports_num_for_sparse=1 --pserver_num_threads=5 when starting the pserver processes.

              -

              Accordingly, configure your embedding layers like:

              -
              SPARSE_REMOTE=True
              -
              -w1 = data_layer(name="w1", size=dict_size)
              -emb1 = embedding_layer(input=w1, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE))
              -w2 = data_layer(name="w2", size=dict_size)
              -emb2 = embedding_layer(input=w2, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE))
              -...
              -
              -
              -
              -
              -

              Implementation details

              -
              enum MatType {
              -  MAT_NORMAL,
              -  MAT_NORMAL_SHARED,
              -  MAT_VALUE_SHARED,
              -  MAT_SPARSE_ROW_IDS,
              -  MAT_SPARSE_ROW_AUTO_GROW,
              -  MAT_CACHE_ROW,
              -  MAT_SPARSE_ROW,
              -  MAT_SPARSE_ROW_PREFETCH,
              -  MAT_SPARSE_ROW_PREFETCH_FULL_SIZE,
              -};
              -
              -
              -

              MAT_SPARSE_ROW_PREFETCH is what we use when configured to fetch only rows of the matrix during training.

              -

              In trainer_internal.cpp:L93 trainOneBatch:

              -
                if (config_->getOptConfig().use_sparse_remote_updater()) {
              -    REGISTER_TIMER("prefetch");
              -    gradientMachine_->prefetch(inArgs);
              -    parameterUpdater_->getParametersRemote();
              -  }
              -
              -
              -

              During the actual forward and backward passes of the network, at the beginning of each batch, the trainer will try to download one row of data from the pserver.

              -

              In trainer/RemoteParameterUpdater.cpp: parameterUpdater_->getParametersRemote();:

              -
              if (fullSize) {
              -    ...
              -} else {
              -getParams = [&] {
              -    parameterClient_->getParameterSparse(
              -        /* recvParameterType= */ PARAMETER_VALUE, sendBackParameterType);
              -};
              -applyL1 = [](Parameter& para, real decayRate) {
              -    para.getMat(PARAMETER_VALUE)->applyL1(/*lr=*/1.0f, decayRate);
              -};
              -}
              -
              -
              -

              Calling parameterClient_->getParameterSparse makes a remote call to the pserver’s getParameterSparse:

              -
              void ParameterServer2::getParameterSparse(const SendParameterRequest& request,
              -                                          std::vector<Buffer>& inputBuffers,
              -                                          SendParameterResponse* response,
              -                                          std::vector<Buffer>* outputBuffers) {
              -  (void)inputBuffers;
              -  auto& buffer = *readWriteBuffer_;
              -  size_t numReals = 0;
              -  for (const auto& block : request.blocks()) {
              -    numReals += getParameterConfig(block).dims(1);
              -  }
              -  buffer.resize(numReals);
              -
              -  VLOG(3) << "pserver: getParameterSparse, numReals=" << numReals;
              -
              -  ReadLockGuard guard(parameterMutex_);
              -  size_t offset = 0;
              -  for (const auto& block : request.blocks()) {
              -    size_t width = getParameterConfig(block).dims(1);
              -    Buffer buf = {buffer.data() + offset, width};
              -    int type = request.send_back_parameter_type();
              -    sendBackParameterSparse(block, type, response, &buf, width, outputBuffers);
              -    offset += width;
              -  }
              -}
              -
              -
              -

              getParameterConfig(block).dims(1) returns the width of the current “parameter block” (a shard of the parameter object), so the getParameterSparse remote call returns only one row of data per requested block to the client.
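The offset bookkeeping in the loop above can be illustrated with a short Python sketch. This is not PaddlePaddle code: `slice_rows` is a hypothetical helper, and the plain list of widths stands in for what `getParameterConfig(block).dims(1)` returns for each requested block.

```python
def slice_rows(widths):
    """Return (offset, width) pairs, one per requested block.

    Each block contributes exactly one row of `width` values, laid out
    back to back in a single buffer, mirroring the C++ loop above.
    """
    slices = []
    offset = 0
    for width in widths:
        slices.append((offset, width))
        offset += width
    # After the loop, offset equals the total buffer size (numReals).
    return slices, offset


slices, num_reals = slice_rows([4, 4, 8])
```

Each pair describes the window of the shared buffer that holds one block's row, which is why the response carries one row per block rather than the whole parameter.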

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/master_server.html b/develop/doc/design/cluster_train/master_server.html deleted file mode 100644 index bdf41d457bb1df9a748d1b2a7afe37c4783f4b42..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/master_server.html +++ /dev/null @@ -1,341 +0,0 @@ - - - - - - - - - - - - - Design Doc: Master Server — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Master Server

              -

              For an overview of the master server’s role, please refer to the distributed training design doc. In this design doc we will discuss the master server in more detail. The master will be implemented in Go.

              -
              -

              Dataset

              -

              -

              A dataset is a list of files in RecordIO format. A RecordIO file consists of chunks, and each chunk consists of some records.

              -
              -
              -

              Task Queue

              -

              As mentioned in distributed training design doc, a task is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple chunks from one or multiple files. The master server maintains task queues to track the training progress.

              -
              -

              Task Queue Creation

              -
                -
              1. Each trainer will make an RPC call (using Go’s rpc package) to the master server, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master server the same dataset, only the first RPC call will be honored.

                -

                The RPC interface is:

                -
                func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
                -}
                -
                -
                -
              2. -
              3. The master server will scan through each RecordIO file to generate the chunk index and learn how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk, and the index of the chunk within the file is an integer starting from 0, representing the n-th chunk within the file.

                -

                The definition of the chunk is:

                -
                type Chunk struct {
                -    Idx   int // index of the chunk within the file
                -    Path  string
                -    Index recordio.Index // chunk index
                -}
                -
                -
                -
              4. -
              5. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.

                -

                The definition of the task is:

                -
                type Task struct {
                -    Index  int
                -    Chunks []Chunk
                -}
                -
                -
                -

                The elements in the task queues are of type TaskEntry, which contains a timeout counter (described in task retry logic) and a task:

                -
                type TaskEntry struct {
                -    NumTimeout int
                -    Task       Task
                -}
                -
                -
                -

                The definition of task queues is:

                -
                type TaskQueues struct {
                -    Todo    []TaskEntry
                -    Pending map[int]TaskEntry // map from task index to task entry
                -    Done    []TaskEntry
                -}
                -
                -
                -
              6. -
              -
              -
              -
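The queue creation steps above can be sketched in Python. This is an illustration only: the design calls for Go, the classes merely mirror the Go structs, the RecordIO index is left out, and `build_queues` with its `chunks_per_file` and `chunks_per_task` arguments is a hypothetical helper that assumes the number of chunks per file is already known.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    path: str
    idx: int  # index of the chunk within the file, starting from 0


@dataclass
class Task:
    index: int
    chunks: List[Chunk]


@dataclass
class TaskEntry:
    task: Task
    num_timeout: int = 0


@dataclass
class TaskQueues:
    todo: List[TaskEntry] = field(default_factory=list)
    pending: Dict[int, TaskEntry] = field(default_factory=dict)
    done: List[TaskEntry] = field(default_factory=list)


def build_queues(chunks_per_file, chunks_per_task):
    """Group every chunk into tasks and put all tasks in the todo queue."""
    chunks = [Chunk(path, i)
              for path, n in chunks_per_file.items()
              for i in range(n)]
    queues = TaskQueues()
    for start in range(0, len(chunks), chunks_per_task):
        task = Task(index=len(queues.todo),
                    chunks=chunks[start:start + chunks_per_task])
        queues.todo.append(TaskEntry(task))
    return queues


queues = build_queues({"a.recordio": 3, "b.recordio": 2}, chunks_per_task=2)
```

With five chunks and two chunks per task, the todo queue holds three tasks, and the second task spans both files, matching the statement that a task may contain chunks from multiple files.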

              Task Queue Persistence

              -

              The task queues need to be persisted on etcd for fault recovery. Since the task queues only change once a task is completed or timed out, which is not very frequent, we can afford to synchronize with etcd every time the task queues change.

              -

              We will serialize the task queues data structure with gob encoding, compress with gzip, and save into etcd synchronously under key /task_queues.
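A rough Python sketch of the serialize-then-compress step follows. gob is a Go-specific encoding, so JSON is swapped in here purely for illustration, and the write to etcd is omitted; `encode_queues` and `decode_queues` are hypothetical names.

```python
import gzip
import json


def encode_queues(queues):
    """Serialize to bytes, then compress; the result would be the etcd value."""
    return gzip.compress(json.dumps(queues, sort_keys=True).encode("utf-8"))


def decode_queues(blob):
    """Inverse of encode_queues, as a recovering master would use it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Task indices only; JSON object keys must be strings.
queues = {"todo": [2, 3], "pending": {"1": 0}, "done": [0]}
blob = encode_queues(queues)
restored = decode_queues(blob)
```

The round trip is what matters for fault recovery: whatever a new master reads back under /task_queues must decode to the same queues the old master saved.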

              -
              -
              -

              Task Dispatch

              -

              The trainer will make an RPC call to master to get a new task when:

              -
                -
              • the trainer first starts, or
              • -
              • the trainer finishes a task.
              • -
              -

              The RPC interface is:

              -
              func (m *RPCServer) GetTask(finished *Task, result *Task) error {
              -}
              -
              -
              -

              The argument finished will be nil when the trainer has just started.

              -

              During the RPC call the master will do the following:

              -
                -
              • Make a copy of the task queues, and update the copy reflecting the finished tasks and the new pending tasks.
              • -
              • Synchronize the copy of task queues with etcd using a transaction conditioned on holding the master lock.
              • -
              • Replace the task queues with the copy and report to the trainer with the new tasks if succeeded, or discard the copy and report the error to the trainer if failed.
              • -
              -
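The copy, commit, and swap sequence above might look like the following Python sketch. Everything here is assumed for illustration: `FakeStore` stands in for the etcd transaction, tasks are reduced to integer indices, and `Master.get_task` is not the planned Go RPC.

```python
import copy


class FakeStore:
    """Hypothetical stand-in for etcd; `fail` simulates a lost master lock."""

    def __init__(self):
        self.saved = None
        self.fail = False

    def save(self, queues):
        if self.fail:
            raise RuntimeError("transaction failed")
        self.saved = copy.deepcopy(queues)


class Master:
    def __init__(self, todo, store):
        self.queues = {"todo": list(todo), "pending": {}, "done": []}
        self.store = store

    def get_task(self, finished=None):
        # 1. Work on a copy so a failed commit leaves the state untouched.
        draft = copy.deepcopy(self.queues)
        if finished is not None and finished in draft["pending"]:
            del draft["pending"][finished]
            draft["done"].append(finished)
        task = None
        if draft["todo"]:
            task = draft["todo"].pop(0)
            draft["pending"][task] = 0  # timeout counter starts at zero
        # 2. Persist the copy; an exception here discards the draft.
        self.store.save(draft)
        # 3. Only after a successful commit does the copy become current.
        self.queues = draft
        return task
```

Because the live queues are replaced only after the store accepts the copy, a failed transaction surfaces as an error to the trainer and the master's in-memory state never runs ahead of what is persisted.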
              -
              -

              Task Retry Logic

              -

              When a task is dispatched to the trainer, the master will schedule a function for execution after the timeout duration (based on the moving average of task completion time). If the task entry is still in the pending queue, its timeout counter will increase by one, and the task will be moved back to the todo queue. If the timeout counter is above the threshold, the master will log the error and discard the task.

              -

              Please note that a timed-out task could be completed after it has been dispatched for retry, so it is possible for a task to be processed multiple times. We do not try to prevent this from happening, since it’s fine to train on the same task multiple times due to the stochastic nature of the stochastic gradient descent algorithm.
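A minimal Python sketch of the timeout handler described above, under simplifying assumptions: the pending queue is a dict from task index to timeout count, the todo queue is a list of (index, count) pairs, and `on_timeout` and `max_timeouts` are hypothetical names. Scheduling the call after the timeout duration is left out.

```python
def on_timeout(queues, task_index, max_timeouts):
    """Handle the expiry of the timeout for a dispatched task.

    Returns "finished", "retried", or "discarded".
    """
    if task_index not in queues["pending"]:
        return "finished"  # the trainer reported completion in time
    count = queues["pending"].pop(task_index) + 1
    if count > max_timeouts:
        return "discarded"  # the master would log an error here
    queues["todo"].append((task_index, count))
    return "retried"


queues = {"todo": [], "pending": {7: 0}}
first = on_timeout(queues, 7, max_timeouts=1)
```

The handler only acts on tasks still in the pending queue, so a task that finished before its timer fired is left alone.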

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/pserver_client.html b/develop/doc/design/cluster_train/pserver_client.html deleted file mode 100644 index 4cd70aed90fb4ef1273837fb77e7763583e5ac17..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/pserver_client.html +++ /dev/null @@ -1,418 +0,0 @@ - - - - - - - - - - - - - Design Doc: The Client Library of Parameter Server — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: The Client Library of Parameter Server

              -

              For an overview of trainer’s role, please refer to distributed training design doc. In this design doc, we will discuss the parameter server’s client library, which will manage communication with parameter servers. The library will be implemented in Go and made available as a static or dynamic library with a C header file.

              -
              -

              Parameter Partition

              -

              Each parameter will be partitioned into parameter blocks to make the parameters evenly distributed across parameter servers. The partition is done automatically by the client library. Sparse parameters require slightly different treatment:

              -
              -

              Sparse Parameter

              -

              A sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation, but the same representation as a dense vector.

              -

              Because a sparse parameter is updated sparsely, the trainer will have to partition the sparse parameter. Because the parameter server will merge all sparse parameter shards into the same file when saving the parameter, the shards need a special naming convention:

              -

              If a sparse parameter is partitioned into n shards, they should be named as:

              -
              name:sparse-0
              -name:sparse-1
              -...
              -name:sparse-n-1
              -
              -
              -

              The library is unaware of the partition and treats each parameter independently. Only when saving parameters will the parameter servers merge the sparse parameters according to the naming convention.
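The naming convention can be made concrete with a small Python sketch. `shard_names` and `merge_groups` are hypothetical helpers, not part of the client library; they only show how names of the form name:sparse-i could be generated and later grouped for merging.

```python
def shard_names(name, n):
    """Names for the n shards of a sparse parameter."""
    return ["%s:sparse-%d" % (name, i) for i in range(n)]


def merge_groups(stored_names):
    """Group stored parameter names by base name, shards in index order."""
    groups = {}
    for full in stored_names:
        base, sep, suffix = full.rpartition(":sparse-")
        if sep and suffix.isdigit():
            groups.setdefault(base, []).append((int(suffix), full))
        else:
            # Not a shard: the parameter forms a group of its own.
            groups.setdefault(full, []).append((0, full))
    return {base: [full for _, full in sorted(items)]
            for base, items in groups.items()}


names = shard_names("emb", 3)
groups = merge_groups(["emb:sparse-1", "fc.w", "emb:sparse-0"])
```

Sorting by the numeric suffix rather than by string keeps shard 10 after shard 9 when a parameter has more than ten shards.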

              -
              -
              -
              -

              Model Optimization Using Gradients

              -

              There are two ways to perform model optimization using gradients:

              -
                -
              • On Client

                -

                The client does multiple steps of forward and backward update. In each step, the gradients are calculated and a new model is generated. After some steps, the client will calculate the difference between the newest model and the old model at step 0. The difference will be sent to the parameter servers. Parameter servers will just update parameters using the difference, without any optimization using gradients (such as Adam and L1 regularization).

                -
              • -
              • On Parameter Server

                -

                The client will send accumulated gradients to the parameter servers, and the parameter servers will do the optimization using gradients.

                -
              • -
              -
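The two modes can be contrasted with a toy Python sketch. Plain SGD stands in for the optimizer, models are flat lists of floats, and all function names are hypothetical; none of this is the client library's API.

```python
def client_side_delta(weights, grads_per_step, lr):
    """Run several local SGD steps and return (new model - model at step 0)."""
    local = list(weights)
    for grads in grads_per_step:
        local = [w - lr * g for w, g in zip(local, grads)]
    return [new - old for new, old in zip(local, weights)]


def server_apply_delta(weights, delta):
    """On-client mode: the server only adds the received difference."""
    return [w + d for w, d in zip(weights, delta)]


def server_apply_grads(weights, grads_per_step, lr):
    """On-server mode: the server receives accumulated gradients and
    runs the optimizer itself (plain SGD in this sketch)."""
    total = [sum(col) for col in zip(*grads_per_step)]
    return [w - lr * g for w, g in zip(weights, total)]


weights = [1.0, 2.0]
grads = [[2.0, 4.0], [2.0, 0.0]]
delta = client_side_delta(weights, grads, lr=0.5)
via_client = server_apply_delta(weights, delta)
via_server = server_apply_grads(weights, grads, lr=0.5)
```

In this toy case both modes reach the same model because the gradients are fixed inputs. In real training the client recomputes gradients on its updated local model at every step, and a server-side optimizer such as Adam keeps its own state, so the two modes generally produce different results.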
              -
              -

              L1 and L2 Regularization

              -

              PaddlePaddle allows L1 or L2 regularization to be specified per parameter, so when the trainer initializes a parameter it needs to include a parameter configuration whenever L1 or L2 regularization is necessary.

              -
              -
              -

              Parameter Initialization

              -

              The parameters on parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization; the other trainers will wait for the completion of initialization and get the parameters from the parameter servers.

              -
              -

              Trainer Selection

              -

              To select the trainer for initialization, every trainer will try to get a distributed lock; whoever owns the lock will do the initialization, as illustrated below:

              -

              -
              -
              -

              Trainer Selection Process

              -

              The trainer selection process is encapsulated in the C API function:

              -
              int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
              -
              -
              -

              The selected trainer’s call to paddle_begin_init_params will return with 1, and the other trainers’ call to paddle_begin_init_params will return 0. paddle_get_params will be blocked until initialization is completed. As illustrated below:

              -

              -
              -
              -
              -
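The selection behaviour described above can be imitated inside one process with Python threads. This is only a sketch under strong assumptions: `InitCoordinator` is a hypothetical stand-in that replaces both the distributed lock and the parameter servers, and its method names merely echo the C API.

```python
import threading


class InitCoordinator:
    """In-process stand-in for the distributed lock and the pservers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._initialized = threading.Event()
        self.params = None

    def begin_init_params(self):
        # Non-blocking attempt: exactly one caller gets the lock,
        # and it is held for the lifetime of the coordinator.
        return 1 if self._lock.acquire(blocking=False) else 0

    def finish_init_params(self, params):
        self.params = params
        self._initialized.set()

    def get_params(self):
        self._initialized.wait()  # blocks until initialization completes
        return self.params


def run_trainers(n):
    coord = InitCoordinator()
    results = [None] * n

    def trainer(i):
        selected = coord.begin_init_params()
        if selected:
            coord.finish_init_params({"w": [0.0, 0.0]})
        results[i] = (selected, coord.get_params())

    threads = [threading.Thread(target=trainer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_trainers(4)
```

Exactly one trainer sees a return value of 1 and initializes, while every trainer, selected or not, ends up with the same parameters once the blocking get returns.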

              C Interface

              -
              typedef enum {
              -  PADDLE_ELEMENT_TYPE_INT32   = 0,
              -  PADDLE_ELEMENT_TYPE_UINT32  = 1,
              -  PADDLE_ELEMENT_TYPE_INT64   = 2,
              -  PADDLE_ELEMENT_TYPE_UINT64  = 3,
              -  PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
              -  PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
              -} paddle_element_type;
              -
              -typedef struct {
              -  char*               name;
              -  paddle_element_type element_type;
              -  unsigned char*      content;
              -  int                 content_len;
              -} paddle_parameter, paddle_gradient;
              -
              -typedef int paddle_pserver_client;
              -
              -/**
              - * @brief creates a pserver client that talks to etcd for coordination.
              - */
              -paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr);
              -
              -/**
              - * @brief creates a pserver client given pserver addresses.
              - *
              - * @param pserver_addrs comma-separated pserver addresses.
              - * @param selected if current pserver client is selected to initialize all parameter servers.
              - */
              -paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected);
              -void paddle_pserver_client_release(paddle_pserver_client c);
              -
              -/**
              - * @brief paddle_begin_init_params begins to initialize parameters on
              - * parameter servers.
              - *
              - * paddle_begin_init_params will be called from multiple trainers,
              - * only one trainer will be selected to initialize the parameters on
              - * parameter servers. Other trainers need to get the initialized
              - * parameters from parameter servers using @paddle_get_params.
              - *
              - * @return 1 if the trainer is selected to initialize parameter
              - * servers, otherwise 0.
              - */
              -int paddle_begin_init_params(paddle_pserver_client client);
              -
              -/**
              - * @brief paddle_init_param initializes the parameter on parameter
              - * servers.
              - *
              - * @param param the parameter to initialize.
              - * @param param_config_proto the configuration for the parameter.
              - * @param config_len the length of param_config_proto
              - * @return 0 if successful, otherwise -1. On failure, the trainer
              - * needs to restart the entire initialization process (starting from
              - * @paddle_begin_init_params). Or simply exit the program and wait for
              - * the cluster management system to restart the trainer.
              - */
              -int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
              -
              -/**
              - * @brief paddle_finish_init_params tells parameter servers client has
              - * sent all parameters to parameter servers as initialization.
              - *
              - * @return 0 if successful, otherwise -1. On failure, the trainer
              - * needs to restart the entire initialization process (starting from
              - * @paddle_begin_init_params). Or simply exit the program and wait for
              - * the cluster management system to restart the trainer.
              - */
              -int paddle_finish_init_params(paddle_pserver_client client);
              -
              -/**
              - * @brief paddle_send_grads sends gradients to parameter servers for
              - * updating parameters.
              - *
              - * @param grads the array of gradients to send.
              - * @param len the length of the gradient array.
              - * @return 0 if successful, otherwise -1.
              - */
              -int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len);
              -
              -/**
              - * @brief paddle_get_params gets parameters from parameter servers.
              - *
              - * paddle_get_params will block until parameters are initialized on
              - * the parameter servers.
              - *
              - * @param dst the destination array of parameter pointers to save to.
              - * The parameter pointer must be pre-populated with required parameter name,
              - * and the content of parameter must be pre-allocated of the size of required
              - * parameter on pserver.
              - * @param len the length of the dst array of paddle_parameter
              - * pointers.
              - * @return 0 if successful, otherwise -1.
              - */
              -int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len);
              -
              -/**
              - * @brief paddle_save_model tells parameter servers to save the
              - * parameters to the given path
              - *
              - * @param path the path to save parameters.
              - * @return 0 if successful, otherwise -1.
              - */
              -int paddle_save_model(paddle_pserver_client client, const char* path);
              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/remote_parameter_updater.html b/develop/doc/design/cluster_train/remote_parameter_updater.html deleted file mode 100644 index b993733790a6ceebac60f2b45570decaf2a83c90..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/remote_parameter_updater.html +++ /dev/null @@ -1,273 +0,0 @@ - - - - - - - - - - - - - Design Doc: Remote Parameter Updater for Cluster Train — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Remote Parameter Updater for Cluster Train

              -

              For an overview of distributed training, please refer to the distributed training design doc. In this design doc, we will discuss the parameter updater, which will use the parameter server client (see The Client Library of Parameter Server Design Doc) to manage and update parameters.

              -
              -

              Parameter Updater

              -

              Parameter Updater is used by the trainer to manage and update parameters. There are mainly two kinds of parameter updater: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.

              -
              -

              Remote Parameter Updater

              -

              Remote Parameter Updater manages parameters through remote parameter servers, using the client that communicates with the pservers (see The Client Library of Parameter Server Design Doc).

              -

              In the PaddlePaddle Python V2 API, the trainer is implemented in Python, and the trainer will hold an instance of the parameter updater and call its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG.

              -
              -

              Sparse Remote Parameter Updater

              -

              Since we will only implement dense parameter management now, the mechanism for sparse parameters will be discussed in the next stage.

              -
              -
              -
              -

              Interface Design

              -

              TBD

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/save_model.html b/develop/doc/design/cluster_train/save_model.html deleted file mode 100644 index 92e58966fe7eabcb19ba7c9aa4ee510022d23eb1..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/save_model.html +++ /dev/null @@ -1,360 +0,0 @@ - - - - - - - - - - - - - Design Doc: Save Model — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Save Model

              -
              -

              Overview

              -

              The model is the output of the training process. There are two ways in which a user can obtain a model:

              -
                -
              • Save model triggered by user code: user code asks PaddlePaddle to save a model.
              • -
              • Convert model from the checkpoint: the model is converted from the pservers’ periodic checkpoint. In this way, the user can cancel a job at any time and still have a relatively fresh model (we checkpoint around every 5 minutes).
              • -
              -
              -

              Trainer Saving Model vs. Pservers Saving Model

              -

              Both trainers and pservers have access to the model, so the model can be saved from a trainer or from the pservers. We need to decide where the model is saved from.

              -
              -

              Dense Update vs. Sparse Update

              -

              There are two types of model update methods: dense update and sparse update (when the model parameter is configured to be sparse).

              -
                -
              • Dense update

                -

                Every trainer has its own full copy of the model. Every model update will update the entire model.

                -
              • -
              • Sparse update

                -

                The training input is sparse, and the trainer does not have the entire model. It will only download the sub-model related to the input. When updating the model, only the sub-model related to the training input is updated.

                -
              • -
              -
              -
              -

              Pservers Saving Model

              -

              The benefit of letting pservers save the model is that they have the entire model all the time. However, since pservers are on different nodes, a merging process is required to merge the model shards into a single model. This requires the pservers to write models to a distributed filesystem, making the checkpoint shards visible to the merge program.

              -
              -
              -

              Trainer Saving Model

              -

              The benefit of letting one trainer save the model is that it does not require a distributed filesystem, and it reuses the same save-model logic as local training. The exception is sparse update, where the trainer needs to download the entire model during the saving process.

              -
              -
              -

              Conclusion

              -

              Given that the trainer saving the model does not require a distributed filesystem and is an intuitive extension of the trainer saving the model when training locally, we decide to let the trainer save the model when doing distributed training.

              -
              -
              -
              -

              Convert Model from Checkpoint

              -

              TODO

              -
              -
              -
              -

              Timeline

              -

              We will first implement the trainer saving the model. Converting the latest snapshot to a model is a TODO for the future.

              -
              -
              -

              Trainer Save Model

              -
              -

              Trainer Election

              -

              One trainer will be elected as the one to save the model. When using etcd, the trainer ID is a randomly generated UUID; the trainer will contact the master server requesting to save the model and find out whether it is elected. When the master server is not used, unique trainer IDs will be given by the administrator, and the trainer whose ID is “0” is elected to save the model.

              -
              -
              -

              Model Save Path

              -

              Each trainer will be given the directory to save the model. The elected trainer will save the model to given-directory/trainerID. Since the trainer ID is unique, this prevents concurrent saves to the same file if multiple trainers are elected to save the model when a split-brain problem happens.
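The election rule and the save path can be sketched in Python as follows. `is_elected` and `model_save_path` are hypothetical helpers, and the `ask_master` callback is an assumption standing in for the request to the master server, whose interface this document does not define.

```python
import os
import uuid


def is_elected(trainer_id, use_master, ask_master=None):
    """Without a master server, trainer "0" saves; otherwise ask the master."""
    if use_master:
        return ask_master(trainer_id)
    return trainer_id == "0"


def model_save_path(save_dir, trainer_id):
    """Each trainer writes under its own ID, so paths never collide."""
    return os.path.join(save_dir, trainer_id)


# With etcd, the trainer ID is a randomly generated UUID.
trainer_id = str(uuid.uuid4())
path = model_save_path("/models", trainer_id)
```

Even if two trainers both believe they are elected during a split-brain episode, their differing IDs send them to different paths, so neither overwrites the other's file.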

              -
              -
              -

              What Happens When Model Is Saving

              -

              It takes some time to save the model, so we need to define what happens while the save is taking place.

              -

              When doing dense update, the trainer uses the local model. Pservers do not need to pause model update.

              -

              When doing sparse update, the trainer needs to download the entire model while saving. To get the most accurate model, the model update needs to be paused before the download starts and resumed after the download finishes. Otherwise, the trainer gets a model that is “polluted”: some parts of the model are old and some parts are new.

              -

              It’s unclear whether the “polluted” model will be inferior, due to the stochastic nature of deep learning, and pausing the model update will add more complexity to the system. Since supporting sparse update is a TODO item, we defer to the future the evaluation of whether to pause the model update while saving the model.

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cluster_train/submit-job.html b/develop/doc/design/cluster_train/submit-job.html deleted file mode 100644 index 1a85ff027df83b3f361fb8aa45fe6a22a957d73e..0000000000000000000000000000000000000000 --- a/develop/doc/design/cluster_train/submit-job.html +++ /dev/null @@ -1,389 +0,0 @@ - - - - - - - - - - - - - Submit a Distributed Training Job — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Submit a Distributed Training Job

              -

              The user can submit a distributed training job with Python code, rather than with a command-line interface.

              -
              -

              Runtime Environment On Kubernetes

              -

              For a distributed training job, there are two Docker images, called the runtime Docker image and the base Docker image. The runtime Docker image is the Docker image that gets scheduled by Kubernetes to run during training. The base Docker image is for building the runtime Docker image.

              -
              -

              Base Docker Image

              -

              Usually, the base Docker image is the PaddlePaddle product Docker image, which includes the paddle binary files and the Python package. Users can also specify any image name hosted on any Docker registry to which they have access.

              -
              -
              -

              Runtime Docker Image

              -

              The trainer package that the user uploads and some Python dependencies are packaged into a runtime Docker image based on the base Docker image.

              -
                -
              • Handle Python Dependencies

                -

                You need to provide a requirements.txt file in your trainer-package folder. Example:

                -
                pillow
                -protobuf==3.1.0
                -
                -
                -

                For reference, an example project with a requirements.txt file looks like:

                -
                  paddle_example
                -    |-quick_start
                -      |-trainer.py
                -      |-dataset.py
                -      |-requirements.txt
                -
                -
                -
              • -
              -
              -
              -
              -

              Submit Distributed Training Job With Python Code

              -

              -
                -
              • paddle.job.dist_train() will call the Job Server API /v1/packages to upload the trainer package and save it on CephFS, and then call /v1/trainer/job to submit the PaddlePaddle distributed job.
              • -
              • /v1/trainer/job will start a building job for preparing the runtime Docker image. When the building job is finished, Job Server will submit the PaddlePaddle distributed job to Kubernetes.
              • -
              • NOTE: For the first version, we will not prepare the runtime Docker image. Instead, the package is uploaded to Paddle Cloud, and Paddle Cloud mounts the package in a temporary folder into the base Docker image. The first version will not support custom Python dependencies either.
              • -
              -

              You can call paddle.job.dist_train and provide distributed training configuration as the parameters:

              -
              paddle.job.dist_train(
              -  trainer=dist_trainer(),
              -  paddle_job=PaddleJob(
              -    job_name = "paddle-cloud",
              -    entry_point = "python %s"%__file__,
              -    trainer_package = "/example/word2vec",
              -    image = "yancey1989/paddle-job",
              -    trainers = 10,
              -    pservers = 3,
              -    trainer_cpu = 1,
              -    trainer_gpu = 1,
              -    trainer_mem = "10G",
              -    pserver_cpu = 1,
              -    pserver_mem = "2G"
              -  ))
              -
              -
              -

              The parameter trainer of paddle.job.dist_train is a function and you can implement it as follows:

              -
              def dist_trainer():
              -  def trainer_creator():
              -    trainer = paddle.v2.trainer.SGD(...)
              -    trainer.train(...)
              -  return trainer_creator
              -
              -
              -

              The pseudo code of paddle.job.dist_train is as follows:

              -
              def dist_train(trainer, paddle_job):
              -  # if the code is running on cloud, RUNNING_ON_CLOUD is set to YES
              -  if os.getenv("RUNNING_ON_CLOUD", "NO") == "NO":
              -    #submit the paddle job
              -    paddle_job.submit()
              -  else:
              -    #start the training
              -    trainer()
              -
              -
              -
              -

              PaddleJob Parameters

              -

              | parameter | type | explanation |
              |---|---|---|
              | job_name | str | the unique name for the training job |
              | entry_point | str | entry point for startup trainer process |
              | trainer_package | str | trainer package file path which user have the access right |
              | image | str | the base image for building the runtime image |
              | pservers | int | Parameter Server process count |
              | trainers | int | Trainer process count |
              | pserver_cpu | int | CPU count for each Parameter Server process |
              | pserver_mem | str | memory allocated for each Parameter Server process, a plain integer using one of these suffixes: E, P, T, G, M, K |
              | trainer_cpu | int | CPU count for each Trainer process |
              | trainer_mem | str | memory allocated for each Trainer process, a plain integer using one of these suffixes: E, P, T, G, M, K |
              | trainer_gpu | int | GPU count for each Trainer process, if you only want CPU, do not set this parameter |
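The memory parameters above are strings made of a plain integer and a one-letter suffix. As a rough illustration of how such a value could be validated before a job is submitted, here is a minimal sketch. The helper name `parse_mem` is hypothetical, and the choice of decimal multipliers (K = 10**3, M = 10**6, and so on) is an assumption; the design does not say whether the suffixes are decimal or binary.

```python
import re

# Assumption: each suffix is a decimal multiplier (K = 10**3, ..., E = 10**18).
_SUFFIXES = {
    "K": 10**3, "M": 10**6, "G": 10**9,
    "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_mem(value):
    """Convert a string such as "10G" into a number of bytes.

    Hypothetical helper: accepts only a plain integer followed by one of
    the suffixes E, P, T, G, M, K, as the parameter table describes.
    """
    match = re.fullmatch(r"(\d+)([EPTGMK])", value)
    if match is None:
        raise ValueError(
            "expected a plain integer followed by one of E, P, T, G, M, K: %r"
            % (value,))
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]
```

With this sketch, `parse_mem("10G")` and `parse_mem("2G")` accept the values used in the example job above, while a value such as `"1.5G"` is rejected because it is not a plain integer.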

              -
              -
              -

              Deploy Parameter Server, Trainer and Master Process

              -
                -
              • Deploy PaddlePaddle Parameter Server processes, it’s a Kubernetes ReplicaSet.
              • -
              • Deploy PaddlePaddle Trainer processes, it’s a Kubernetes Job.
              • -
              • Deploy PaddlePaddle Master processes, it’s a Kubernetes ReplicaSet.
              • -
              -
              -
              -
              -

              Job Server

              -
                -
              • RESTful API

                -

                Job Server provides a RESTful HTTP API for receiving the trainer package and displaying PaddlePaddle job-related information.

                -
                  -
                • POST /v1/package receive the trainer package and save them on CephFS
                • -
                • POST /v1/trainer/job submit a trainer job
                • -
                • GET /v1/jobs/ list all jobs
                • -
                • GET /v1/jobs/<job-name> the status of a job
                • -
                • DELETE /v1/jobs/<job-name> delete a job
                • -
                • GET /v1/version job server version
                • -
                -
              • -
              • Build Runtime Docker Image on Kubernetes

                -

                paddle.job.dist_train uploads the trainer package to the Job Server, which saves it on the distributed filesystem and then starts a job that builds the runtime Docker image that Kubernetes schedules to run during training.

                -

                There are some benefits to building the runtime Docker image on the Job Server:

                -
                  -
                • On Paddle Cloud, users run the trainer code in a Jupyter Notebook, which is a Kubernetes Pod. To execute docker build inside the Pod, we would have to mount the host’s docker.sock into the Pod, so the user’s code would connect to the host’s Docker Engine directly, which is not safe.
                • -
                • Users only need to upload the training package files; they do not need to install Docker Engine or a Docker registry as dependencies.
                • -
                • If we want to switch to another image type, such as RKT, users do not need to care about it.
                • -
                -
              • -
              • Deploy Parameter Server, Trainer and Master Processes

                -

                POST /v1/trainer/job receives the distributed training parameters and deploys the job as follows:

                -
                  -
                • Deploy PaddlePaddle Parameter Server processes, it’s a Kubernetes ReplicaSet.
                • -
                • Deploy PaddlePaddle Trainer processes, it’s a Kubernetes Job.
                • -
                • Deploy PaddlePaddle Master processes, it’s a Kubernetes ReplicaSet.
                • -
                -
              • -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/concurrent_programming.html b/develop/doc/design/concurrent_programming.html deleted file mode 100644 index f17ac541ae7ab9179a871413b8795167246a8f5e..0000000000000000000000000000000000000000 --- a/develop/doc/design/concurrent_programming.html +++ /dev/null @@ -1,413 +0,0 @@ - - - - - - - - - - - - - Design Doc: Concurrent Programming with Fluid — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Concurrent Programming with Fluid

              -

              With PaddlePaddle Fluid, users describe a program rather than a model. The program is a ProgramDesc protobuf message. TensorFlow/MXNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, not the program that trains or uses the model.

              -

              Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create a concurrent/parallel AI application. An interesting question is: how does a ProgramDesc represent a concurrent program?

              -

              The answer relies on the fact that a ProgramDesc is similar to an abstract syntax tree (AST) that describes a program. So users write a concurrent program just as they would in any concurrent programming language, e.g., Go.

              -
              -

              An Analogy

              -

              The following table compares concepts in Fluid and Go

              -

              | Go | Fluid |
              |---|---|
              | user-defined functions | layers |
              | control-flow and built-in functions | intrinsics/operators |
              | goroutines, channels | class ThreadPool |
              | runtime | class Executor |

              -
              -
              -

              An Example Concurrent Program

              -

              To review all of the above concepts in an example, let us take a simple program and write its distributed version.

              -

              Suppose that we want to parallelize a naive Fluid program (written in Go and calling Fluid’s Go binding) that multiplies two tensors.

              -
              import "fluid"
              -
              -func paddlepaddle() {
              -  X = fluid.read(...)
              -  W = fluid.Tensor(...)
              -  Y = fluid.mult(X, W)
              -}
              -
              -
              -

              Please be aware that Fluid’s Go binding provides the default main function, which calls the paddlepaddle function. In this case, paddlepaddle is defined in the above program and creates the following ProgramDesc message.

              -
              message ProgramDesc {
              -  block[0] = Block {
              -    vars = [X, W, Y],
              -    ops = [
              -      read(output = X)
              -      assign(input = ..., output = W)
              -      mult(input = {X, W}, output = Y)
              -    ],
              -  }
              -}
              -
              -
              -

              Then, the default main function calls fluid.run(), which creates an instance of the class Executor and calls Executor.Run(block[0]), where block[0] is the first and only block defined in the above ProgramDesc message.

              -

              The default main function is defined as follows:

              -
              func main() {
              -  paddlepaddle()
              -  fluid.run()
              -}
              -
              -
              -
              -
              -

              The Concurrent Version

              -

              By parallelizing the above program, we could support a very big tensor X by splitting it into small pieces {x_1, x_2, ...} and sending each piece to a worker process/node for parallel multiplication.

              -

              In this case, we can write a transpiler that takes a ProgramDesc message that represents the above example program and outputs two ProgramDesc messages, one for running on the master process/node, and the other one for worker processes/nodes.

              -
              -

              The Master Program

              -

              The master program could look like the following:

              -
              message ProgramDesc {
              -  block[0] = Block {
              -    vars = [X, L, Y],
              -    ops = [
              -      read(output = X)
              -      kube_get_workers_addrs(output = L)
              -      Y = tensor_array(len(L))
              -      parallel_for(input = X, output = Y, 
              -                   attrs = {L, block_id(1)}) # referring to block 1
              -    ]
              -  }
              -  
              -  block[1] = Block {
              -    parent = 0,
              -    vars = [x, y, index],
              -    ops = [
              -      slice(input = [X, index], output = x) # index is initialized by parallel_for
              -      send(input = x, attrs = L[index])
              -      recv(outputs = y, attrs = L[index])
              -      assign(input = y, output = Y[index])
              -    ]
              -  }
              -}
              -
              -
              -

              The equivalent Fluid program (calling the Go binding) is:

              -
              func main() {  //// block 0
              -  X = fluid.read(...)
              -  L = fluid.k8s.get_worker_addrs()
              -  Y = fluid.tensor_array(len(L))
              -  fluid.parallel_for(X, L, 
              -                     func(index int) {  //// block 1
              -                       x = X[index]
              -                       fluid.send(L[index], x)
              -                       y = fluid.recv(L[index])
              -                       Y[index] = y
              -                     })
              -}
              -
              -
              -

              An explanation of the above program:

              -
                -
              • fluid.k8s is a package that provides access to Kubernetes API.
              • -
              • fluid.k8s.get_worker_addrs returns the list of IP addresses and ports of all pods of the current job except for the current one (the master pod).
              • -
              • fluid.tensor_array creates a tensor array. fluid.parallel_for creates a ParallelFor intrinsic, which, when executed,
                  -
                1. creates len(L) scopes, each for the concurrent running of the sub-block (block 1 in this case), and initializes a variable named “index” in the scope to an integer value in the range [0, len(L)-1], and
                2. -
                3. creates len(L) threads by calling into the ThreadPool singleton, each thread
                    -
                  1. creates an Executor instance, and
                  2. -
                  3. calls Executor.Run(block), where block is block 1 as explained above.
                  4. -
                  -
                4. -
                -
              • -
              -
                -
              1. Please be aware that block 1 is a sub-block of block 0, so ops in block 1 could refer to variables defined in block 0.
              2. -
              -
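The behaviour of the ParallelFor intrinsic described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function `parallel_for`, the dictionary-based scopes, and `block1` are hypothetical stand-ins, Python's `ThreadPoolExecutor` plays the role of the ThreadPool singleton, and a local multiplication replaces the send/recv round trip to a worker.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(n, block, parent_scope):
    """Run `block` n times concurrently, once per index in [0, n-1]."""
    # One child scope per iteration. Each child sees the parent's variables
    # (block 1 is a sub-block of block 0) plus its own "index" variable.
    scopes = [dict(parent_scope, index=i) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        # list() waits for every thread and re-raises any worker exception.
        list(pool.map(block, scopes))


def demo():
    X = [[1, 2], [3, 4], [5, 6]]  # three pieces of the big tensor
    W = 10
    Y = [None] * len(X)           # plays the role of the tensor array

    def block1(scope):
        i = scope["index"]
        x = scope["X"][i]                    # the slice op
        scope["Y"][i] = [v * W for v in x]   # stands in for send + recv

    parallel_for(len(X), block1, {"X": X, "Y": Y})
    return Y
```

Because each child scope is a shallow copy of the parent scope, the sub-block can write its result into the shared `Y`, which mirrors how ops in block 1 refer to variables defined in block 0.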
              -
              -

              The Worker Program

              -

              The worker program looks like

              -
              func main() {
              -  W = Tensor(...)
              -  x = fluid.listen_and_do(
              -        fluid.k8s.self_addr(),
              -        func(input Tensor) {
              -          output = fluid.mult(input, W)
              -        })
              -}
              -
              -
              -

              where

              -
                -
              • fluid.listen_and_do creates a ListenAndDo intrinsic, which, when executed,
                  -
                1. listens on the current pod’s IP address, as returned by fluid.k8s.self_addr(),
                2. -
                3. once a connection is established,
                    -
                  1. creates a scope of two parameters, “input” and “output”,
                  2. -
                  3. reads a Fluid variable and saves it into “input”,
                  4. -
                  5. creates an Executor instance and calls Executor.Run(block), where the block is generated by running the lambda specified as the second parameter of fluid.listen_and_do.
                  6. -
                  -
                4. -
                -
              • -
              -
              -
              -
              -

              Summary

              -

              From the above example, we see that:

              -
                -
              1. Fluid enables the imperative programming paradigm by:
                  -
                1. letting users describe a program, but not a model (a sequence of layers, or a graph of operators), and
                2. -
                3. calling the fluid.run function, which runs the program implicitly.
                4. -
                -
              2. -
              3. The program is described as a ProgramDesc protobuf message.
              4. -
              5. Function Executor.Run takes a block, instead of a ProgramDesc, as its parameter.
              6. -
              7. fluid.run calls Executor.Run to run the first block in the ProgramDesc message.
              8. -
              9. The implementation of Executor.Run is extremely simple: it neither plans the execution nor creates threads; instead, it runs on the current thread and executes the Run method of each intrinsic/operator sequentially, in the order they appear in the Block.ops array.
              10. -
              11. Intrinsics/operators’ Run method might create threads. For example, the ListenAndDo operator creates a thread to handle each incoming request.
              12. -
              13. Threads are not necessarily OS threads; instead, they could be green threads managed by ThreadPool. Multiple green threads might run on the same OS thread. Go’s goroutines are an example of green threads.
              14. -
              -
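The point that Executor.Run simply walks the ops of one block in order, on the calling thread, can be illustrated with a small Python sketch. Everything here is a hypothetical stand-in: blocks are dictionaries, the scope is a dictionary, and `read`, `assign` and `mult` are toy closures over Python lists rather than Fluid operators.

```python
class Executor:
    """Runs the ops of one block in order, on the calling thread."""

    def run(self, block, scope):
        for op in block["ops"]:
            op(scope)  # each op reads and writes variables in the scope


def read(output, data):
    def op(scope):
        scope[output] = list(data)
    return op


def assign(value, output):
    def op(scope):
        scope[output] = value
    return op


def mult(inputs, output):
    def op(scope):
        x, w = (scope[name] for name in inputs)
        scope[output] = [v * w for v in x]
    return op


# Mirrors block[0] of the first example: read X, assign W, then Y = X * W.
block0 = {"ops": [read("X", [1, 2, 3]), assign(2, "W"), mult(("X", "W"), "Y")]}
scope = {}
Executor().run(block0, scope)
```

Any concurrency would come from an individual op's own behaviour (as with ParallelFor or ListenAndDo), not from the executor loop itself.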
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/cpp_data_feeding.html b/develop/doc/design/cpp_data_feeding.html deleted file mode 100644 index cdc3b0ecf465aa49397098187e5148c74264b35c..0000000000000000000000000000000000000000 --- a/develop/doc/design/cpp_data_feeding.html +++ /dev/null @@ -1,322 +0,0 @@ - - - - - - - - - - - - - C++ Data Feeding — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              C++ Data Feeding

              -

              When training with the Paddle V2 API, data feeding depends wholly on Python code. To get rid of the Python environment and achieve the goal of “wrapping the whole training by a while loop op” in Paddle Fluid, a C++ data feeding mechanism is required.

              -

              In this document, we show the fundamental design of the C++ data feeding process, which includes data reading, shuffling and batching.

              -
              -

              Reader

              -

              A new concept named ‘Reader’ is introduced. Reader is a family of derived classes that can be held by our Variable and are used to read or process file data.

              -
              -

              ReaderBase

              -

              ReaderBase is the abstract base class of all readers. It defines the interface shared by all readers.

              -
              class ReaderBase {
              - public:
              -  explicit ReaderBase(const std::vector<DDim>& shapes) : shapes_(shapes) {
              -    PADDLE_ENFORCE(!shapes_.empty());
              -  }
              -  // Read the next batch of data. (A 'batch' can be only one instance)
              -  virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
              -  // Show whether the next batch exists.
              -  virtual bool HasNext() const = 0;
              -  
              -  // Reinitialize the reader and read the file from the beginning.
              -  virtual void ReInit() = 0;
              -  
              -  // Get a certain read in data's shape.
              -  DDim shape(size_t idx) const;
              -  // Get shapes of all read in data.
              -  std::vector<DDim> shapes() const { return shapes_; }
              -  // Set shapes of read in data.
              -  void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; }
              -
              -  virtual ~ReaderBase() {}
              -
              - protected:
              -  std::vector<DDim> shapes_;
              -};
              -
              -
              -
              -
              -

              FileReader and DecoratedReader

              -

              These two classes are derived from ReaderBase and are in turn the base classes of the specific readers. That is to say, in our design there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format and yields only one instance of data at a time, e.g., a RecordIO reader or a jpg reader. A decorated reader takes another reader (either a file reader or a decorated reader) as its ‘underlying reader’. It gets data from its underlying reader, processes it (e.g., by shuffling or batching), and then yields the processed data. The output of a decorated reader can be a single instance or a batch. ShuffleReader and BatchReader are both decorated readers.

              -

              All readers share exactly the same interface defined in ReaderBase, so they can be decorated more than once: we can shuffle a reader’s outputs and then batch the shuffled outputs. The consistent interface also allows related ops to use readers without knowing exactly what they are.
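The way decorated readers stack on top of each other can be sketched in Python. This is an illustration, not the C++ design itself: the class `ListReader` is a hypothetical stand-in for a file reader, the method names are Python-style renderings of ReadNext/HasNext/ReInit, and shuffling the whole underlying reader at once is an assumption made for brevity.

```python
import random


class ListReader:
    """Stands in for a file reader: yields one instance at a time."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._instances)

    def read_next(self):
        item = self._instances[self._pos]
        self._pos += 1
        return item

    def re_init(self):
        self._pos = 0


class ShuffleReader:
    """Decorated reader: drains the underlying reader and shuffles it."""

    def __init__(self, reader, seed=None):
        self._reader = reader
        self._rng = random.Random(seed)
        self._fill()

    def _fill(self):
        self._buffer = []
        while self._reader.has_next():
            self._buffer.append(self._reader.read_next())
        self._rng.shuffle(self._buffer)

    def has_next(self):
        return bool(self._buffer)

    def read_next(self):
        return self._buffer.pop()

    def re_init(self):
        self._reader.re_init()
        self._fill()


class BatchReader:
    """Decorated reader: groups instances from the underlying reader."""

    def __init__(self, reader, batch_size):
        self._reader = reader
        self._batch_size = batch_size

    def has_next(self):
        return self._reader.has_next()

    def read_next(self):
        batch = []
        while len(batch) < self._batch_size and self._reader.has_next():
            batch.append(self._reader.read_next())
        return batch

    def re_init(self):
        self._reader.re_init()
```

Since all three classes expose the same three methods, `BatchReader(ShuffleReader(ListReader(range(10))), batch_size=4)` works without either decorator knowing what kind of reader it wraps; the last batch is simply shorter when the data runs out.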

              -
              -
              -

              ReaderHolder

              -

              Different readers belong to different class types. This leads to a problem: how can we put them into Variables and fetch them through a unified method? For example, if a Variable holds a BatchReader, we cannot get it with the following code:

              -
              var->Get<ReaderBase>("batch_reader");
              -
              -
              -

              we have to write:

              -
              var->Get<BatchReader>("batch_reader");
              -
              -
              -

              This means that every time we get a reader from a variable, we must know the reader’s exact type, which is nearly impossible.

              -

              To solve this problem, we introduce ReaderHolder as a wrapper. It acts as an empty decorator of ReaderBase, which erases reader’s type. With ReaderHolder we are able to fetch all types of readers by var->Get<ReaderHolder>("...") and regard the obtained object as a reader.

              -
              -
              - -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/csp.html b/develop/doc/design/csp.html deleted file mode 100644 index 38bc562da48d6998a986f8c1549057be0e529394..0000000000000000000000000000000000000000 --- a/develop/doc/design/csp.html +++ /dev/null @@ -1,451 +0,0 @@ - - - - - - - - - - - - - Design Doc: CSP in PaddlePaddle Fluid — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: CSP in PaddlePaddle Fluid

              -
              -

              Motivation

              -

              Concurrent programming is important for deep learning. A few example applications are:

              -
                -
              1. The main thread keeps reading the next mini-batch while another thread uses the GPU for computing.
              2. -
              3. The main thread performs the computation while another thread uploads the local gradients from each trainer to the parameter server.
              4. -
              -

              Most DL systems, including TensorFlow, Caffe2, and MXNet, can asynchronously execute operators in a graph. However, Fluid doesn’t have the concept of a graph at all, as the design goal of Fluid is that of a programming language.

              -
              -
              -

              Concurrent Programming Models

              -

              There are many concurrent programming models, implemented in various forms:

              -

              | concurrent programming model | implementation |
              |---|---|
              | mutex | types and functions in standard libraries |
              | semaphore | types and functions in standard libraries |
              | communicating sequential processes (CSP) | Go programming language |
              | actor model | Erlang programming language |
              | message passing | MPI |
              | bulk synchronous parallel (BSP) | Pregel distributed programming framework |

              -

              Since Fluid was designed to be a programming language, we would like to implement CSP in Fluid.

              -
              -

              CSP vs. Actor Model

              -

              A well-known implementation of Actor Model is the Erlang programming language. In Actor Model, processes could send messages to another process and receive messages from another process given the process IDs. We can find the three ingredients, process with ID, send, and recv, in MPI too. Indeed, we can rewrite Erlang programs in Python + MPI with possibly fewer lines of code. Our concern with Actor Model is that it doesn’t seem reasonable to implement process management in a programming language’s runtime library; instead, it should be the operating systems’ responsibility to manage processes and libraries like MPI for send/recv.

              -
              -
              -
              -

              CSP in Fluid

              -

              Fluid has two fundamental control-flows: if-else and while. If we are to implement CSP, we need the following:

              -
                -
              1. a new data type: channel and operators send and recv,
              2. -
              3. goroutine or thread, and
              4. -
              5. a new control-flow: select.
              6. -
              -

              We also need Python wrappers for the above components.

              -

              The type channel is conceptually a blocking queue. In Go, it is implemented as a blocking circular queue that supports send and recv.

              -

              The select operation was available in OS kernels long before the Go language. All Unix kernels implement the system calls poll and select. They monitor multiple file descriptors to see if I/O is possible on any of them. This takes O(N) time. Since Linux 2.6, a new system call, epoll, can do the same in O(1) time. In BSD systems, there is a similar system call, kqueue. Go’s Linux implementation uses epoll.

              -

              It might be a good idea to implement Fluid’s select using epoll too. In this design doc, we start from the O(N) way so that we could focus on Python binding and the syntax.
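The O(N) approach can be pictured as a single scan over all cases. The sketch below is a plain-Python illustration under several assumptions: Python's `queue.Queue` stands in for buffered channels only, the function `select` and its tuple-based case format are hypothetical, and, unlike Go's select, this version never blocks (it falls through to the default, or returns None, when no case is ready).

```python
import queue
import random


def select(cases, default=None):
    """Try each case once, in random order, and run the first ready one.

    Each case is ("r", q, handler) or ("w", q, value, handler).
    Returns the handler's result, or default() if no case is ready.
    """
    order = list(range(len(cases)))
    random.shuffle(order)      # random choice among the ready cases
    for i in order:            # O(N) scan over all cases
        case = cases[i]
        try:
            if case[0] == "r":
                value = case[1].get_nowait()
            else:
                value = case[2]
                case[1].put_nowait(value)
        except (queue.Empty, queue.Full):
            continue           # this case would block; try the next one
        return case[-1](value)
    return default() if default is not None else None
```

Visiting the cases in a uniformly shuffled order and taking the first one that succeeds gives each ready case the same chance of being chosen, which matches the random choice Go makes when several cases are ready.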

              -
              -

              Type Channel

              -

              Fluid supports many data types:

              -
                -
              1. Tensor,
              2. -
              3. Row-sparse Tensor
              4. -
              5. LoD Tensor,
              6. -
              7. Tensor array, etc
              8. -
              -

              Each data type is registered in the framework.proto as an enum value. To add a new type channel, we need to add a new type enum.

              -

              To expose a C++ type to Python, we need to edit the pybind.cc file. Here is an example of how we expose the C++ class LoDTensor.

              -
              -
              -
              -

              Syntax Design

              -
              -

              Create Channel

              -

              In Go, we create a channel by specifying the element type and buffer size:

              -
              ch  := make(chan int)       // a channel without buffer
              -ch1 := make(chan int, 100)  // a channel that can buffer 100 ints.
              -
              -
              -

              In Fluid, we should be able to do the same:

              -
              ch  = fluid.make_channel(dtype=INT)
              -ch1 = fluid.make_channel(dtype=INT, buffer_size=100)
              -
              -
              -

              In addition to that, we want channels that can hold more complex element types, e.g., Tensors of float16:

              -
              ch = fluid.make_channel(dtype=Tensor, etype=float16)
              -
              -
              -

              or Tensors of Tensors of float16 etc.

              -

              The point here is that we need a consistent way to compose types, like in C++ we can have Tensor<Tensor<...<float16>...> >.

              -
              -
              -

              Send and Recv

              -

              Go’s CSP implementation depends on data type channel. There are two types of channels:

              -
                -
              1. A buffered channel (also called an unblocked channel) is a blocking queue with a non-zero-sized buffer. Sending to a buffered channel blocks if the buffer is full, and receiving blocks if the buffer is empty.
              2. -
              3. An unbuffered channel (also called a blocked channel) is a blocking queue with no buffer. A send blocks until a receiver is ready, and a receive blocks until a sender is ready.
              4. -
              -

              There are four types of actions with a channel:

              -
                -
              1. Create a channel

                -
                ch := make(chan int) // this is an unbuffered channel
                -ch := make(chan int, 100) // this is a buffered channel of 100 ints.
                -
                -
                -
              2. -
              3. Send

                -
                ch <- 111
                -
                -
                -
              4. -
              5. Recv

                -
                y, ok := <-ch
                -
                -
                -
              6. -
              7. Close

                -
                close(ch)
                -
                -
                -

                Please be aware that a closed channel is not a nil channel, which is var ch chan int.

                -
              8. -
              -

              There are some axioms with channels:

              -
                -
              1. A send to a nil channel blocks forever
              2. -
              3. A receive from a nil channel blocks forever
              4. -
              5. A send to a closed channel panics
              6. -
              7. A receive from a closed channel returns the residual values and then zeros.
              8. -
              -
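The axioms for closed channels can be made concrete with a small buffered-channel sketch in Python. It is an illustration only: the class `Channel`, the exception `ChannelClosed` (standing in for a Go panic), and the `zero` argument are hypothetical names, the capacity must be at least 1, and nil channels are not modeled.

```python
import collections
import threading


class ChannelClosed(Exception):
    """Raised on a send to a closed channel (Go would panic)."""


class Channel:
    """Buffered channel sketch; capacity must be at least 1."""

    def __init__(self, capacity, zero=0):
        if capacity < 1:
            raise ValueError("this sketch only models buffered channels")
        self._buf = collections.deque()
        self._capacity = capacity
        self._zero = zero          # value returned once closed and drained
        self._closed = False
        self._cond = threading.Condition()

    def send(self, value):
        with self._cond:
            # Block while the buffer is full and the channel is still open.
            while len(self._buf) >= self._capacity and not self._closed:
                self._cond.wait()
            if self._closed:
                raise ChannelClosed("send on closed channel")
            self._buf.append(value)
            self._cond.notify_all()

    def recv(self):
        """Return (value, ok); ok is False once closed and drained."""
        with self._cond:
            # Block while the buffer is empty and the channel is still open.
            while not self._buf and not self._closed:
                self._cond.wait()
            if self._buf:
                value = self._buf.popleft()
                self._cond.notify_all()
                return value, True
            return self._zero, False

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
```

After `close()`, receivers first drain the residual values with `ok` set to True and then see the zero value with `ok` set to False, while any further `send` raises `ChannelClosed`.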

              In Fluid, we have buffered channels and unbuffered channels.

              -

              The following program illustrates the Python syntax for accessing a buffered Fluid channel.

              -
              import fluid
              -
              -buffer_size = 10
              -ch = fluid.make_channel(dtype=INT, buffer_size=buffer_size)
              -
              -# Now write buffer_size elements to the channel
              -with fluid.while(steps=buffer_size):
              -  fluid.send(ch, step)
              -
              -fluid.close_channel(ch)
              -
              -with fluid.while(steps=buffer_size):
              -  fluid.print(fluid.recv(ch))
              -
              -
              -

              The following example shows that to avoid the always-blocking behavior of unbuffered channels, we need to use Fluid’s goroutines.

              -
              import fluid
              -
              -ch = fluid.make_channel(dtype=INT)
              -
              -with fluid.go():
              -  fluid.send(ch)
              -
              -y = fluid.recv(ch)
              -
              -fluid.close_channel(ch)
              -
              -
              -
              -
              -

              Select

              -

              In Go, the select statement lets a goroutine wait on multiple communication operations. A select blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready.

              -
              ch1  := make(chan int)       
              -ch2  := make(chan int, 100)
              -
              -x := 0
              -
              -for {
              -    select {
              -    case ch1 <- x:
              -      x = x + 1
              -    case y := <-ch2:
              -      fmt.Println("Received on channel", y)
              -    default:
              -      fmt.Println("Default")
              -    }
              -  }
              -
              -
              -

              In Fluid, we should be able to do the same:

              -
              ch1 = fluid.make_channel(dtype=INT)
              -ch2 = fluid.make_channel(dtype=INT, buffer_size=100)
              -
              -sel = fluid.select()
              -
              -with sel.case(ch1, 'w', X):
              -    fluid.layers.increment(X)
              -
              -with sel.case(ch2, 'r', Y):
              -    fluid.print("Received on Channel")
              -
              -with sel.default():
              -    fluid.print("Default")
              -
              -
              -
              -

              In the above code snippet, X and Y are variables. Now let us look at each of these statements one by one.

              -
                -
              • sel.case(ch1, 'w', X) : This specifies that we are writing to ch1 and that the integer in variable X is the value to write. The character w mirrors the write mode used in Python file I/O.
              • -
              • sel.case(ch2, 'r', Y) : This specifies that we would like to read the result from ch2 into variable Y. The character r mirrors the read mode used in Python file I/O.
              • -
              • sel.default() : This is equivalent to the default case in a Go select. If none of the channels is ready for reading or writing, the Fluid code in the default block is executed.
              • -
              -
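The select behaviour described above (run one ready case, chosen at random, otherwise fall through to the default) can be sketched in plain Python. This is only an illustration under stated assumptions: `select` and the case tuples are hypothetical, `queue.Queue` stands in for a buffered channel, and unlike Go this version returns instead of blocking when no case is ready and no default is given.

```python
import queue
import random


def select(cases, default=None):
    """Run one ready case, chosen at random; otherwise run `default`.

    Each case is a tuple (channel, mode, value, handler), where mode is
    'w' (write `value`) or 'r' (read a value and pass it to `handler`).
    """
    order = list(cases)
    random.shuffle(order)  # a random order gives a random pick among ready cases
    for chan, mode, value, handler in order:
        try:
            if mode == 'w':
                chan.put_nowait(value)         # queue.Full means "not ready"
                received = value
            else:
                received = chan.get_nowait()   # queue.Empty means "not ready"
        except (queue.Full, queue.Empty):
            continue
        return handler(received)
    return default() if default is not None else None


ch1 = queue.Queue(maxsize=1)
ch2 = queue.Queue(maxsize=100)
cases = [
    (ch1, 'w', 0, lambda v: "sent %d" % v),
    (ch2, 'r', None, lambda v: "received %d" % v),
]

log = [select(cases, default=lambda: "default")]      # only ch1 is ready
log.append(select(cases, default=lambda: "default"))  # ch1 full, ch2 empty
ch2.put(7)
log.append(select(cases, default=lambda: "default"))  # only ch2 is ready
print(log)
```

In each of the three calls at most one case is ready, so the result does not depend on the random order.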
              -
              -
              -

              Example Programs

              -
              -

              1. RPC between Trainers and Parameter Servers

              -
              -
              -

              2. Concurrent Minibatch Loading

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/dist_refactor/distributed_architecture.html b/develop/doc/design/dist_refactor/distributed_architecture.html deleted file mode 100644 index 0e2b9e754d3854e05b991db24c401fe3d122dbba..0000000000000000000000000000000000000000 --- a/develop/doc/design/dist_refactor/distributed_architecture.html +++ /dev/null @@ -1,424 +0,0 @@ - - - - - - - - - - - - - Design Doc: Distributed Training Architecture — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Distributed Training Architecture

              -
              -

              Abstract

              -

              PaddlePaddle version 0.10.0 uses the “trainer-parameter server” architecture. We run multiple instances of trainers (where each trainer runs the same model) and parameter servers for distributed training. This architecture serves well, but has a few limitations:

              -
                -
              1. Special code is needed to handle tasks that should run on only a single trainer, e.g., initializing the model or saving the model.
              2. -
              3. Model parallelism is hard: it requires if-else branches conditioned on the trainer ID to partition the model onto the trainers, and eventually hand-written inter-model-shard communication code to communicate between different trainers.
              4. -
              5. The user cannot directly specify the parameter update rule: doing so requires modifying the parameter server code and compiling a new binary. This complicates things for researchers, since a lot of extra effort is required to make it work. Besides, the training job submission program may not allow running arbitrary binaries.
              6. -
              -

              This design doc discusses PaddlePaddle’s new distributed training architecture, which addresses the above-mentioned limitations.

              -
              -
              -

              Analysis

              -

              The assumption is that the user writes the trainer program in either Python or C++.

              -
              -

              Limitation 1

              -

              There are two basic functionalities in the trainer program:

              -
                -
              1. The training logic such as loading / saving the model and printing out the logs.
              2. -
              3. The neural network definition such as the definition of the data layer, the fully connected layer, the cost function and the optimizer.
              4. -
              -

              When we train using PaddlePaddle v0.10.0 in a distributed fashion, multiple instances of the same Python code are run on different nodes; hence both the training logic and the neural network computation logic are replicated.

              -

              The tasks that only need to be run once belong to the training logic. Hence, if we replicate only the neural network computation part and do not replicate the training logic, the limitation mentioned above can be avoided.

              -
              -
              -

              Limitation 2

              -

              Model parallelism means that a single model is partitioned into different components and each node runs one of the components separately. This comes at the extra cost of managing the inter-model-shard communication between nodes.

              -

              PaddlePaddle should ideally be able to modify the neural network computation and figure out the support for model parallelism automatically. However, the computation is only specified in Python code which sits outside of PaddlePaddle, hence PaddlePaddle cannot support the feature in this setup.

              -

              Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. The compiler optimizes the IR as follows:

              -

              -

              PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component:

              -

              -

              The IR for PaddlePaddle after refactoring is called a Block; it specifies the computation dependency graph and the variables used in the computation.

              -
              -
              -

              Limitation 3

              -

              The user cannot directly specify the parameter update rule for the parameter server in the Python module, since the parameter server does not use the same computation definition as the trainer. Instead, the update rule is baked into the parameter server.

              -

              This could be fixed by making the parameter server also run an IR, which can differ from the IR on the trainer side. For a detailed explanation, refer to Design Doc: Parameter Server.

              -
              -
              -
              -

              Distributed Training Architecture

              -

              The revamped distributed training architecture addresses the limitations discussed above. The illustration below shows how it does so:

              -

              -

              The major components are: Python API, Distributed Transpiler and Remote Executor.

              -
              -

              Python API

              -

              The Python API is the Python library that the user’s Python code invokes to read the data, build the neural network topology, start training, etc.

              -
              images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32')
              -label = fluid.layers.data(name='label', shape=[1], dtype='int64')
              -...
              -predict = fluid.layers.fc(input=conv_pool_2, size=10, act="softmax")
              -cost = fluid.layers.cross_entropy(input=predict, label=label)
              -avg_cost = fluid.layers.mean(x=cost)
              -optimizer = fluid.optimizer.Adam(learning_rate=0.01)
              -optimizer.minimize(avg_cost)
              -
              -train_reader = paddle.batch(
              -    paddle.reader.shuffle(
              -        paddle.dataset.mnist.train(), buf_size=500),
              -    batch_size=BATCH_SIZE)
              -
              -place = fluid.CPUPlace()
              -exe = fluid.Executor(place)
              -
              -for pass_id in range(10):
              -    for data in train_reader():
              -        loss, = exe.run(trainer_prog,
              -                        feed=feeder.feed(data),
              -                        fetch_list=[avg_cost])
              -
              -
              -

              The code above is a typical local training program. The “Training Program” is built using helper functions such as fluid.layers.fc. The training is done by calling Executor.run iteratively.

              -

              In more detail, the IR is implemented as Program, and ProgramDesc is its protobuf type.

              -

              Executor simply runs the ProgramDesc. For local training you generally use Executor to run the program locally. For any kind of distributed training, you can use RemoteExecutor to specify the desired distributed training method with some optional arguments.

              -
              -
              -

              Distributed Transpiler

              -

              The Distributed Transpiler automatically converts the IR (in protobuf format) to partitioned IRs. The Remote Executor then dispatches the new IRs to Remote Executors across the cluster. The steps are as follows:

              -
                -
              1. The user only needs to replace Executor with RemoteExecutor to turn a local program into a distributed program.
              2. -
              3. RemoteExecutor calls the Distributed Transpiler to “transpile” the user’s program into several IRs representing a distributed training program:
                  -
                1. Parse configurations from RemoteExecutor.
                2. -
                3. Determine the type of the distributed program, which can be DataParallelism, ModelParallelism or Streaming.
                4. -
                5. Partition the ProgramDesc according to the type and add send / recv OP pairs on the boundaries. Taking the DataParallelism type as an example, it removes the optimization operators from the “trainer” role and adds a send OP there, then adds the optimization operators to the parameter server role within the recv OP.
                6. -
                -
              4. -
              5. Dispatch the partitioned graph to the different RemoteExecutors in the cluster.
              6. -
              7. The RemoteExecutor on each node runs the received ProgramDesc until the end.
              8. -
              -
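The DataParallelism partitioning step above can be sketched as follows. This is a toy model, not the Distributed Transpiler itself: the program is assumed to be a flat list of `(op_type, role)` pairs, and `transpile_data_parallel` and the role labels are hypothetical names used only for illustration.

```python
def transpile_data_parallel(ops):
    """Split a flat op list into a trainer program and a pserver program.

    `ops` is a list of (op_type, role) pairs. Ops with role "optimize"
    are removed from the trainer and nested inside the pserver's recv op.
    """
    trainer, optimize_ops = [], []
    for op_type, role in ops:
        if role == "optimize":
            optimize_ops.append(op_type)
        else:
            trainer.append(op_type)
    # The trainer sends gradients across the partition boundary ...
    trainer.append("send")
    # ... and the parameter server runs the optimizer inside its recv op.
    pserver = [{"type": "recv", "sub_ops": optimize_ops}]
    return {"trainer": trainer, "pserver": pserver}


program = [
    ("mul", "forward"),
    ("cross_entropy", "forward"),
    ("mul_grad", "backward"),
    ("adam", "optimize"),
]
parts = transpile_data_parallel(program)
print(parts["trainer"])
print(parts["pserver"])
```

The trainer keeps the forward and backward ops plus a trailing `send`, while the parameter server program consists of a single `recv` op that wraps the optimizer.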
              -
              -

              RemoteExecutor

              -

              As shown in the graph, RemoteExecutor.run sends the IR to the cluster for execution. You can also use the fetch_list parameter to interactively fetch variables back to the local machine for log printing.

              -

              The Python RemoteExecutor is derived from the Executor class.

              -
              exe = RemoteExecutor(
              -    feed=feeder.feed(data),
              -    fetch_list=[avg_cost],
              -    job_desc=JobDesc(
              -      jobname,
              -      num_trainer,
              -      num_pserver,
              -      cpu_per_trainer,
              -      gpu_per_trainer,
              -      mem_per_trainer,
              -      cpu_per_pserver,
              -      mem_per_pserver
              -    ))
              -for data in train_reader():
              -    loss, = exe.run(trainer_prog,
              -                    feed=feeder.feed(data),
              -                    fetch_list=[avg_cost])
              -
              -
              -

              The JobDesc object describes the resource specification of the distributed job to run in the cluster environment.

              -

              -

              RemoteExecutor.run sends the ProgramDesc and TrainingJob to a server in the cluster that executes RemoteExecutor.listen. This server is responsible for starting the final Kubernetes Jobs, which run the different roles of the ProgramDesc from a ConfigMap.

              -
              -
              -

              Placement Algorithm

              -

              Our first implementation will only support “trainer-parameter server” placement: the parameters, initializers, and optimizers are all placed on the PaddlePaddle runtimes with the parameter server role. Everything else will be placed on the PaddlePaddle runtimes with the trainer role. This has the same functionality as the “trainer-parameter server” architecture of PaddlePaddle v0.10.0, but is more generic and flexible.

              -

              In the future, a more general placement algorithm should be implemented, which makes placements according to the input IR, and a model of device computation time and device communication time. Model parallelism requires the generic placement algorithm.

              -
              -
              -

              Local Training Architecture

              -

              The local training architecture will be the same as the distributed training architecture; the difference is that everything runs locally, and there is just one PaddlePaddle runtime:

              -

              -
              -
              -

              Training Data

              -

              In PaddlePaddle v0.10.0, training data is typically read with a data reader from Python. This approach is no longer efficient for distributed training: the Python process no longer runs on the same node as the trainer processes, so the Python reader would need to read from the distributed filesystem (assuming it has access) and send the data to the trainers, doubling the network traffic.

              -

              When doing distributed training, the user can still use a Python data reader: the training data are sent with Executor.run. However, this should be used for debugging purposes only. Users are encouraged to use the read data OPs.

              -
              -
              - -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/dist_refactor/multi_cpu.html b/develop/doc/design/dist_refactor/multi_cpu.html deleted file mode 100644 index 63c35fc79f032b271e8b85dbfa27f5c748eb76f1..0000000000000000000000000000000000000000 --- a/develop/doc/design/dist_refactor/multi_cpu.html +++ /dev/null @@ -1,306 +0,0 @@ - - - - - - - - - - - - - Design Doc: Execute the Program with Multi CPU — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Execute the Program with Multi CPU

              -
              -

              Abstract

              -

              This design doc proposes an approach to run the user-defined Op graph on multiple CPUs: we will use an auto transpiler to convert the user-defined Op graph to a multi-CPU Op graph, and use the ParallelDo Op to run the graph.

              -
              -
              -

              Transpiler

              -

              -

              After conversion:

              -

              -
              -
              -

              Implementation

              -
                -
              • Multi-CPU Transpiler will convert the graph to a multi-CPU graph which will be executed with multiple threads.

                -
              • -
              • BlockingCounter will initialize and decrement an atomic counter, and block until the atomic counter becomes 0:

                -
                BlockingCounter bc(thread_count);
                -for (int i = 0; i < thread_count; ++i) {
                -  thread_pool->Start([&bc] { bc.DecrementCount(); });
                -}
                -bc.Wait();
                -
                -
                -
              • -
              • ParallelDo Operator

                -
                  -
                • Initialize a thread pool which is a Singleton.
                • -
                • Use a block id as the input, and run the specified Block in an independent scope with multiple threads.
                • -
                • Initialize a BlockingCounter instance and wait until all threads are done.
                • -
                -
              • -
              • Split Operator will split the Input Tensor into a TensorArray.

                -
              • -
              • Merge merges all the gradients calculated in different threads with a mean/sum/max/min... method, and then runs the Optimizer Op to optimize W.

                -
              • -
              -
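The pieces listed above (Split, ParallelDo, BlockingCounter, Merge) can be put together in a short Python sketch. It is a loose analogue of the C++ design, not its implementation: the function names, the use of `ThreadPoolExecutor` as the thread pool, and the toy "gradient" block are all assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockingCounter:
    """Counter that lets one thread wait until it has dropped to zero."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement_count(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()


def split(batch, parts):
    """Split a list into `parts` nearly equal chunks (stand-in for Split)."""
    size, extra = divmod(len(batch), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


def parallel_do(block, chunks, pool):
    """Run `block` on every chunk in the pool and wait for all of them."""
    results = [None] * len(chunks)
    counter = BlockingCounter(len(chunks))

    def run(i):
        try:
            results[i] = block(chunks[i])
        finally:
            counter.decrement_count()  # always signal, even on failure

    for i in range(len(chunks)):
        pool.submit(run, i)
    counter.wait()  # block until every worker has decremented the counter
    return results


def merge_mean(grads):
    """Merge per-thread gradients with the mean method."""
    return sum(grads) / len(grads)


def toy_gradient(chunk):
    # Hypothetical per-chunk "gradient": the mean of the chunk.
    return sum(chunk) / len(chunk)


batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
with ThreadPoolExecutor(max_workers=4) as pool:
    grads = parallel_do(toy_gradient, split(batch, 4), pool)
merged = merge_mean(grads)
print(grads, merged)
```

Each worker writes to its own slot in `results`, so no lock is needed around the results list; the counter is the only synchronization point.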
              -
              -

              TODO

              -
                -
              • Improve the optimizer stage with multiple threads: we could assign the parameters to different threads and run the optimizer in parallel.
              • -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/dist_refactor/parameter_server.html b/develop/doc/design/dist_refactor/parameter_server.html deleted file mode 100644 index 3b09173d80d531339827869f24e7b3052c91ab2a..0000000000000000000000000000000000000000 --- a/develop/doc/design/dist_refactor/parameter_server.html +++ /dev/null @@ -1,354 +0,0 @@ - - - - - - - - - - - - - Design Doc: Parameter Server — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Parameter Server

              -
              -

              Abstract

              -

              We propose an approach to implement the parameter server. In this approach, there is no fundamental difference between the trainer and the parameter server: they both run subgraphs, but subgraphs of different purposes.

              -
              -
              -

              Background

              -

              The previous implementations of the parameter server do not run a fluid sub-program. Parameter initialization, optimizer computation, network communication and checkpointing are implemented twice: once on the trainer and once on the parameter server.

              -

              It would be great if we could write the code once and use it on both the trainer and the parameter server, since this reduces code duplication and improves extensibility. Given that after the current refactoring we represent everything as a computation graph on the trainer, representing everything as a computation graph on the parameter server becomes a natural extension.

              -
              -
              -

              Design

              -
              -

              Distributed Transpiler

              -

              The Distributed Transpiler converts the user-defined fluid program into sub-programs to be scheduled on different nodes with the following steps:

              -
                -
              1. OP placement: the OPs will be placed on different nodes according to a heuristic that minimizes the estimated total computation time. Currently we will use a simple heuristic that puts parameter variables on parameter server workers and everything else on trainer workers.
              2. -
              3. Add communication OPs to enable the communication between nodes.
              4. -
              -

              We will need these OPs: Send, Recv, Enqueue, Dequeue.

              -

              Below is an example of converting the user-defined graph to the subgraphs for the trainer and the parameter server:

              -

              -

              After converting:

              -

              -
                -
              1. The parameter variable W and its optimizer program are placed on the parameter server.
              2. -
              3. Operators are added to the program.
                  -
                • Send sends data to the connected Recv operator. The scheduler on the receiving node will only schedule the Recv operator to run once the Send operator has run (the Send OP will mark the Recv OP runnable automatically).
                • -
                • Enqueue enqueues the input variable; it can block until space becomes available in the queue.
                • -
                • Dequeue outputs a configurable number of tensors from the queue. It will block until the queue has the required number of tensors.
                • -
                -
              4. -
              -
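The blocking behaviour of Enqueue and Dequeue described above can be sketched with a condition variable. This is only an illustration: `TensorQueue` is a hypothetical class, strings stand in for tensors, and the `min_count` argument follows the attribute name mentioned in the Discussion section.

```python
import collections
import threading


class TensorQueue:
    """Bounded queue whose dequeue waits for a minimum number of items."""

    def __init__(self, capacity):
        self._items = collections.deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        with self._cond:
            # Block until space becomes available in the queue.
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        with self._cond:
            # Block until the queue holds the required number of tensors.
            while len(self._items) < min_count:
                self._cond.wait()
            out = [self._items.popleft() for _ in range(min_count)]
            self._cond.notify_all()
            return out


q = TensorQueue(capacity=4)
producer = threading.Thread(
    target=lambda: [q.enqueue("grad%d" % i) for i in range(3)])
producer.start()
batch = q.dequeue(min_count=3)  # waits until all three gradients arrived
producer.join()
print(batch)
```

Note that in this sketch `min_count` must not exceed the capacity: otherwise the producer blocks on a full queue while the consumer is still waiting for more items, and neither makes progress.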
              -
              -

              Benefits

              -
                -
              • Model parallelism becomes easier to implement: it is an extension to the trainer - parameter server approach. We can have several “Transpilers” to achieve different goals.
              • -
              • A user-defined optimizer is easier to add: the user can now express it as a sub-program.
              • -
              • The logic duplicated between the trainer and the parameter server, mentioned in the background section, is no longer needed.
              • -
              -
              -
              -

              Challenges

              -
                -
              • It is important to balance the parameter shards on multiple parameter servers. If a single parameter is very big (for example: some word-embedding, fully connected, or softmax layers), we need to automatically partition the single parameter onto different parameter servers when possible (that is, when only element-wise optimizers depend on the parameter variable).
              • -
              • In the “Async SGD” figure, the “W” variable on the parameter server could be read and written concurrently. See here for more details about concurrent programs in Fluid.
              • -
              -
              -
              -

              Discussion

              -
                -
              • Can the Enqueue OP be implemented under our current tensor design (put the input tensor into the queue tensor)?
              • -
              • The Dequeue OP will have a variable number of outputs (depending on the min_count attribute); does our current design support it? (A similar question applies to the Add OP.)
              • -
              -
              - -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/error_clip.html b/develop/doc/design/error_clip.html deleted file mode 100644 index 382711db6366b1c7379f594cf23320f065f54896..0000000000000000000000000000000000000000 --- a/develop/doc/design/error_clip.html +++ /dev/null @@ -1,332 +0,0 @@ - - - - - - - - - - - - - Error Clip — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Error Clip

              -
              -

              Overview

              -

              Error clip is widely used in model training to prevent exploding gradients. It applies specific rules to adjust variables’ gradients and keep them from becoming too large. With it, the values of a gradient are checked before they are consumed by the next grad_op and are shrunk if necessary.

              -
              -
              -

              Usage

              -

              Users can assign different error clip methods or attributes to different Variables by passing them as a parameter of Variable's constructor:

              -
              var = framework.Variable(..., error_clip=myErrorClip, ...)
              -
              -
              -

              The default value of error_clip is None, which means no error clip is employed. When it is not None, it should be an instance of a class derived from BaseErrorClipAttr. So far, BaseErrorClipAttr has only one derived class, ErrorClipByValue, whose constructor is:

              -
              ErrorClipByValue(max, min=None)
              -
              -
              -

              max and min are the upper and lower clip thresholds respectively. In the backward pass, every value of var's gradient greater than max is clipped to max, and every value less than min is clipped to min. When min is None, the lower threshold is set to -max automatically.

              -

              So we can enable the error clip with threshold [-5.0, 5.0] for variable var by:

              -
              var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
              -
              -
              -
              -
              -
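The clipping rule described in this section can be checked numerically with a few lines of plain Python. This is only a sketch of the arithmetic, not the clip operator itself; `clip_by_value` and its parameter names are hypothetical (chosen to avoid shadowing the built-in `max` and `min`).

```python
def clip_by_value(grad, clip_max, clip_min=None):
    """Clip every element of `grad` into [clip_min, clip_max].

    When clip_min is None, it defaults to -clip_max, mirroring the
    behaviour described for ErrorClipByValue.
    """
    clip_max = float(clip_max)
    clip_min = -clip_max if clip_min is None else float(clip_min)
    return [min(max(g, clip_min), clip_max) for g in grad]


clipped = clip_by_value([-7.5, -1.0, 0.0, 3.2, 12.0], clip_max=5.0)
print(clipped)
```

Values already inside [-5.0, 5.0] pass through unchanged; only the two outliers are moved to the nearest threshold.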

              Implementation

              -

              The BaseErrorClipAttr and its derived class ErrorClipByValue are defined in clip.py.

              -
              class BaseErrorClipAttr(object):
              -    def append_clip_op(self, block, grad_name):
              -        raise NotImplementedError()
              -
              -
              -class ErrorClipByValue(BaseErrorClipAttr):
              -    def __init__(self, max, min=None):
              -        max = float(max)
              -        if min is None:
              -            min = -max
              -        else:
              -            min = float(min)
              -        self.max = max
              -        self.min = min
              -
              -    def append_clip_op(self, block, grad_name):
              -        clip_op_desc = block.desc.append_op()
              -        clip_op_desc.set_type("clip")
              -        clip_op_desc.set_input("X", [grad_name])
              -        clip_op_desc.set_output("Out", [grad_name])
              -        clip_op_desc.set_attr("min", self.min)
              -        clip_op_desc.set_attr("max", self.max)
              -
              -
              -

              BaseErrorClipAttr has one main member function: append_clip_op(self, block, grad_name).

              -

              This function creates a clip_op and appends it to the end of the given block. Because different error clip algorithms require different clip_ops, the function is left unimplemented in the base class (it raises NotImplementedError), and every derived class must provide its own version.

              -

              These clip_ops should be inserted after grad_ops whose output gradients need to be clipped. It is equivalent to appending some clip_ops to the end of the target block every time a new grad_op is added.

              -
              for op_desc in grad_op_descs:
              -        new_op_desc = target_block.desc.append_op()
              -        new_op_desc.copy_from(op_desc)
              -        callback(block=target_block, context=grad_to_var)
              -
              -
              -

              Here we employ a callback function to do this kind of job. In the _append_backward_ops_ function, each time a grad_op is added to the target_block, a callback function is invoked. The logic for appending clip_ops can be implemented inside the callback function.

              -

              The callback function for clip_op appending is defined in clip.py:

              -
              def error_clip_callback(block, context):
              -    # the context is a grad_to_var map
              -    grad_to_var = context
              -    op_desc = block.desc.op(block.desc.op_size() - 1)
              -    for grad_n in filter(lambda n: n in grad_to_var,
              -                         op_desc.output_arg_names()):
              -        fwd_var = block.var_recursive(grad_to_var[grad_n])
              -        error_clip = getattr(fwd_var, "error_clip", None)
              -        if not (error_clip is None or isinstance(error_clip,
              -                                                 BaseErrorClipAttr)):
              -            raise TypeError(
              -                "Variable's error_clip should be an instance of BaseErrorClipAttr or None."
              -            )
              -        if error_clip is not None:
              -            error_clip.append_clip_op(block, grad_n)
              -
              -
              -

              This function takes a block and a context (which is actually a grad_to_var map) as inputs. It checks each output of the last OpDesc in the block. Note that the last OpDesc of the block must be a grad_op, and its outputs must be gradients of forward variables. If the forward variable corresponding to an output gradient has an error_clip attribute, error_clip_callback calls that error_clip's append_clip_op function to append the required clip_op to the block.
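The callback mechanism can be mimicked without the framework to see where the clip ops end up. The sketch below uses plain dictionaries in place of OpDesc objects; `ToyBlock`, `clip_callback`, `append_backward_ops`, and the `@GRAD` naming are assumptions made for illustration only.

```python
class ToyBlock:
    """Stand-in for a block: a list of ops plus per-variable clip bounds."""

    def __init__(self, error_clips):
        self.ops = []
        # forward variable name -> (min, max) bounds, or None for no clipping
        self.error_clips = error_clips


def clip_callback(block, grad_to_var):
    """Inspect the last op and append a clip op for each clipped gradient."""
    last_op = block.ops[-1]
    for grad_name in last_op["outputs"]:
        if grad_name not in grad_to_var:
            continue
        bounds = block.error_clips.get(grad_to_var[grad_name])
        if bounds is not None:
            block.ops.append({
                "type": "clip",
                "inputs": [grad_name],
                "outputs": [grad_name],  # the gradient is clipped in place
                "attrs": {"min": bounds[0], "max": bounds[1]},
            })


def append_backward_ops(block, grad_ops, grad_to_var, callback):
    """Append each grad op, invoking the callback right after it."""
    for op in grad_ops:
        block.ops.append(dict(op))
        callback(block, grad_to_var)


block = ToyBlock({"h": (-5.0, 5.0), "w": None})
grad_ops = [
    {"type": "mul_grad", "outputs": ["h@GRAD"]},
    {"type": "fc_grad", "outputs": ["w@GRAD"]},
]
grad_to_var = {"h@GRAD": "h", "w@GRAD": "w"}
append_backward_ops(block, grad_ops, grad_to_var, clip_callback)
op_types = [op["type"] for op in block.ops]
print(op_types)
```

Only `h` has clip bounds, so a single clip op is inserted directly after the grad op that produces `h@GRAD`, before the next grad op consumes it.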

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/evaluator.html b/develop/doc/design/evaluator.html deleted file mode 100644 index 11935402b8795df22cf770eda2be99db977674d4..0000000000000000000000000000000000000000 --- a/develop/doc/design/evaluator.html +++ /dev/null @@ -1,312 +0,0 @@ - - - - - - - - - - - - - Evaluator Design — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Evaluator Design

              -
              -

              Problem Statement

              -

              During training or inference, we provide an evaluation function to measure the model performance, for example, accuracy, precision, etc. In the operator-based framework design, the data passes through the network pipeline batch by batch. As a result, inside the operator, we only calculate the metrics for one mini-batch. Thus, we need to provide a mechanism to calculate the metrics over every N passes or batches, as the user requires.

              -
              -
              -

              Evaluator Design

              -

              Currently, every operation is expressed in the graph. We divide the evaluator process into three steps.

              -
                -
              1. Initialize the metric state and add it into the block.
              2. Calculate the concerned metrics for every mini-batch. A single evaluator operator is only responsible for calculating the necessary statistics for one mini-batch. For example, the accuracy operator, if run once, only calculates the accuracy for one mini-batch of data.
              3. Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. For distributed or multi-GPU training, aggregate the values from different devices.
              -
              -
              -

              Implementation

              -

              This design is shown in the Python API. Each metric operator needs to calculate the metric statistics and return the batch-aware states. The Python side is responsible for accumulating the states for each pass.

              -
              class Evaluator(object):
              -    """
              -    Evaluator Base class.
              -    """
              -    def __init__(self, name, **kwargs):
              -       """
              -       Different evaluators may have different metric states. E.g., Accuracy needs two variables: total and correct sample counts.
              -       AUC needs four variables, `true_positives`,
              -         `true_negatives`, `false_positives` and `false_negatives`. So every evaluator should create its needed variables and append to main_program
              -
              -       The initialization of Evaluator should be responsible for:
              -       create metric states and append to the main_program
              -       """ 
              -       pass
              -
              -    def _update_ops(self, input, label, **kwargs):
              -       """
              -       Add the mini-batch evaluator calculation operators to the main_program.
              -       Add increment operator to accumulate the metric states.
              -       """
              -    
              -
              -    def reset(self, executor, reset_program=None):
              -      """
              -      Reset metric states at the beginning of each pass/user specified batch number.
              -      Execute the reset_program to reset the states.
              -      """
              -      
              -
              -    def eval(self, executor, eval_program=None):
              -      """
              -      Merge the mini-batch statistics to form the evaluation result for multiple mini-batches.
              -      Execute the eval_program and return the result.
              -      """
              -      return eval_result
              -
              -
              -
              -
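The three steps above (initialize states, compute per-mini-batch statistics, merge) can be illustrated with a pure-Python sketch. `ToyAccuracy` is a hypothetical class written for this example; it keeps its states in ordinary attributes rather than in program variables, so it only mirrors the shape of the `Evaluator` interface, not its implementation.

```python
class ToyAccuracy:
    """Accumulates accuracy statistics across mini-batches (illustrative)."""

    def __init__(self):
        # Step 1: initialize the metric states.
        self.total = 0
        self.correct = 0

    def reset(self):
        # Called at the beginning of each pass.
        self.total = 0
        self.correct = 0

    def update(self, predictions, labels):
        # Step 2: compute the statistics of a single mini-batch ...
        batch_correct = sum(int(p == l) for p, l in zip(predictions, labels))
        batch_total = len(labels)
        # ... and accumulate them into the states.
        self.correct += batch_correct
        self.total += batch_total
        return batch_correct / batch_total

    def eval(self):
        # Step 3: merge the accumulated states into one result.
        return self.correct / self.total if self.total else 0.0


acc = ToyAccuracy()
first = acc.update([1, 0, 1, 1], [1, 1, 1, 1])  # 3 of 4 correct
second = acc.update([0, 0], [0, 1])             # 1 of 2 correct
overall = acc.eval()                            # 4 of 6 correct
```

Accumulating counts rather than averaging per-batch accuracies matters when batch sizes differ: here the mean of the two batch accuracies is 0.625, while the accuracy over all six samples is 4/6.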
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/executor.html b/develop/doc/design/executor.html deleted file mode 100644 index 92243f88022eafcb96ed585d31ce2e5ad2345f85..0000000000000000000000000000000000000000 --- a/develop/doc/design/executor.html +++ /dev/null @@ -1,284 +0,0 @@ - - - - - - - - - - - - - Executor Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Executor Design Doc

              -
              -

              Motivation

              -

              In Fluid, we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it first creates a protobuf message, ProgramDesc, that describes the process and is conceptually like an abstract syntax tree.

              -

              The executor runs the ProgramDesc like an interpreter. The ProgramDesc contains the intrinsics (operators in this case) and the variables that will be used; the executor explicitly executes the stored precompiled code.

              -
              -
              -

              Overview

              -

              An executor takes a ProgramDesc, a block_id and a Scope. The ProgramDesc is a list of blocks and each block contains the protobuf definition of all the parameters and operators in the block. The block_id specifies the entrance block. And the Scope is the container of all the variable instances, which is persistent throughout different runs.

              -
              -
              -

              Executor

              -

              The Executor explicitly executes all the intrinsics (operators here) in the block_id-th block of a ProgramDesc. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence, one by one. This is very similar to pushing a stack frame when entering a block, after which it cleans up all the temporary variables when a mini-batch is finished. It does not, however, have a stack frame pop process.

              -
              -

              The interface

              -
                Executor(places);
              -
              -
              -

              An executor does not own any computing resources; a user can only construct an executor using the specified places.

              -
              -
              -

              Running an Executor

              -
                void Run(ProgramDesc, Scope, block_id, create_local_scope);
              -
              -
              -

              An Executor only provides a unified way to execute a ProgramDesc. The ProgramDesc is the target to be executed, the Scope specifies the variable container, the block_id indicates the entrance block, and create_local_scope is a boolean that states whether the temporary variables are destroyed after the execution is finished.
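The interpreter-like behaviour described here can be sketched with a toy model. Everything below is an assumption made for illustration: `Scope`, `ToyExecutor`, and the two ops are hypothetical, a "program" is just a list of blocks, and each block is a list of Python callables. In particular, modelling `create_local_scope` as a child scope that is dropped after the run is one plausible reading of the text, not a statement about the real implementation.

```python
class Scope:
    """Maps variable names to values, with an optional parent scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def find(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)

    def set(self, name, value):
        # Update the variable where it already lives; otherwise create it here.
        scope = self
        while scope is not None:
            if name in scope.vars:
                scope.vars[name] = value
                return
            scope = scope.parent
        self.vars[name] = value


class ToyExecutor:
    def __init__(self, places):
        # The executor owns no devices; it only remembers where to run.
        self.places = places

    def run(self, program, scope, block_id, create_local_scope=True):
        # Temporaries go into a child scope so they vanish after the run.
        local = Scope(parent=scope) if create_local_scope else scope
        for op in program[block_id]:  # run the operators one by one
            op(local)


def mul_op(scope):
    scope.set("tmp", scope.find("x") * scope.find("w"))


def accumulate_op(scope):
    scope.set("total", scope.find("total") + scope.find("tmp"))


program = [[mul_op, accumulate_op]]  # a "ProgramDesc" with a single block
scope = Scope()
scope.set("x", 3.0)
scope.set("w", 2.0)
scope.set("total", 0.0)

exe = ToyExecutor(places=["cpu"])
exe.run(program, scope, block_id=0)
exe.run(program, scope, block_id=0)
```

After the two runs, the persistent variable `total` in the outer scope has been updated twice, while the temporary `tmp` never appears in it because it lived only in the discarded local scope.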

              -
              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/file_manager/README.html b/develop/doc/design/file_manager/README.html deleted file mode 100644 index 8160b3c67dbcb3b251329608f4a7ae6f927dc9c1..0000000000000000000000000000000000000000 --- a/develop/doc/design/file_manager/README.html +++ /dev/null @@ -1,365 +0,0 @@ - - - - - - - - - - - - - FileManager设计文档 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              FileManager Design Doc

              -
              -

              Goals

              -

              This document describes the design of a system named FileManager, which lets users upload their own training data for distributed training.

              -

              Its main features are:

              -
                -
              • Provide common command-line commands for managing files and directories.
              • Support resumable upload and download of large files.
              -
              -
              -

              Terminology

              -
                -
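              • PFS: short for PaddlePaddle cloud File System, an abstraction of the user's file storage space, as opposed to the local filesystem. It is currently built on CephFS.
              • CephFS: a POSIX-compliant file system.
              • Chunk: the logical unit into which a file is split.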
              -
              -
              -

              Modules

              -
              -

              Architecture Diagram

              -

              -
              -
              -

              PFSClient

              -
                -
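              • Features (see the detailed design link):
                • Provide commands for users to manage files.
                • Must be able to run across platforms.
              • Mutual authentication: PFSClient and Ingress authenticate each other with mutual TLS. Users must therefore first register at cloud.paddlepaddle.org, apply for user space, and download the system-generated CA (certificate authority), Key, and CRT (CA signed certificate) to their local machine before they can use PFSClient.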
              -
              -
              -

              Ingress

              -
                -
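              • Features: provide a layer-7 reverse proxy and load balancing based on sticky sessions.
              • Passing through the user identity: Ingress must forward the PFSClient's identity information to PFSServer; see link for how to configure this.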
              -
              -
              -

              PFSServer

              -

              PFSServer exposes a RESTful API. It receives and handles file management requests from PFSClient and returns the results to PFSClient.

              -

              RESTful API

              -
                -
              • /api/v1/files
                • GET /api/v1/files: Get metadata of files or directories.
                • POST /api/v1/files: Create files or directories.
                • PATCH /api/v1/files: Update files or directories.
                • DELETE /api/v1/files: Delete files or directories.
              • /api/v1/file/chunks
                • GET /api/v1/file/chunks: Get the metadata of a file's chunks.
              • /api/v1/storage/files
                • GET /api/v1/storage/files: Download files or directories.
                • POST /api/v1/storage/files: Upload files or directories.
              • /api/v1/storage/file/chunks
                • GET /api/v1/storage/file/chunks: Download chunk data.
                • POST /api/v1/storage/file/chunks: Upload chunk data.
              -
              -
              -
              -

              File Transfer Optimization

              -
              -

              Chunked File Transfer

              -

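              User files can be large, so uploading them to the cloud or downloading them to a local machine can take a long time, and the network may become unstable during the transfer. To address this, we introduce the concept of a Chunk, which consists of its offset in the file, its data, the data length, and a checksum. Files are uploaded and downloaded through operations on Chunks. Because a Chunk is small (256K by default), each transfer completes quickly and is less likely to fail. After transferring the last Chunk, PFSClient must check that the MD5 of the destination file matches that of the source file.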

              -

              A typical Chunk looks like this:

              -
              type Chunk struct {
              -    fileOffset int64
              -    checksum uint32
              -    len     uint32
              -    data    []byte
              -}
              -
              -
              -
              -
              -

              Creating a Sparse File

              -

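              When the destination file does not exist, or its size differs from that of the source file, Fallocate can be used to create a sparse file, after which multiple Chunks can be written concurrently.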
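A minimal sketch of the idea, under stated assumptions: it swaps in Python's `truncate()` for Fallocate (on most POSIX file systems the extended range is stored as a hole), it writes the chunks sequentially in reverse order rather than concurrently, and `preallocate`, `write_chunk`, and the tiny `CHUNK_SIZE` are hypothetical names and values chosen for the example.

```python
import os
import tempfile

CHUNK_SIZE = 4  # tiny on purpose; the design's default is 256K


def preallocate(path, size):
    # truncate() extends the file to `size` bytes without writing data.
    with open(path, "wb") as f:
        f.truncate(size)


def write_chunk(path, offset, data):
    # Each chunk lands at its own offset, so chunks can arrive in any order.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


source = b"PaddlePaddle cloud file"
chunks = [(off, source[off:off + CHUNK_SIZE])
          for off in range(0, len(source), CHUNK_SIZE)]

fd, dest = tempfile.mkstemp()
os.close(fd)
preallocate(dest, len(source))
for offset, data in reversed(chunks):  # deliberately out of order
    write_chunk(dest, offset, data)

with open(dest, "rb") as f:
    result = f.read()
os.remove(dest)
```

Because the destination already has its final size, no writer depends on another writer having finished first, which is what makes concurrent chunk writes possible.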

              -
              -
              -

              Overwriting Inconsistent Parts

              -

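              The key to file transfer is that PFSClient compares the checksums of the source and destination files' Chunks. PFSClient downloads or uploads only the Chunks whose checksums differ, so parts that were already transferred successfully need not be transferred again.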
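The comparison step can be sketched as follows. The design only states that a Chunk carries a 32-bit checksum; using CRC32 via `zlib.crc32` is an assumption made here, and `chunk_checksums` and `chunks_to_transfer` are hypothetical helper names.

```python
import zlib

CHUNK_SIZE = 4  # tiny on purpose; the design's default is 256K


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return a list of (offset, length, crc32) tuples, one per chunk."""
    metas = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        metas.append((offset, len(piece), zlib.crc32(piece)))
    return metas


def chunks_to_transfer(source, destination, chunk_size=CHUNK_SIZE):
    """Offsets of source chunks that are missing or differ at the destination."""
    dst = {offset: (length, crc)
           for offset, length, crc in chunk_checksums(destination, chunk_size)}
    return [offset
            for offset, length, crc in chunk_checksums(source, chunk_size)
            if dst.get(offset) != (length, crc)]


source = b"aaaabbbbccccdd"
destination = b"aaaaXXXXcccc"  # second chunk corrupted, last chunk missing
stale = chunks_to_transfer(source, destination)
```

In this sketch only the chunks at offsets 4 and 12 would be sent again; the chunks at offsets 0 and 8 already match and are skipped.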

              -
              -
              -
              -

              User Workflow

              -

              See link.

              -
              -
              -

              Framework Generation

              -

              Swagger is used to generate the skeleton code of PFSClient and PFSServer, so that we can focus more on the logic itself.

              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/file_manager/pfs/pfsclient.html b/develop/doc/design/file_manager/pfs/pfsclient.html deleted file mode 100644 index 8e2d7f67417d11f36607ee3bb1dff4932ece70e2..0000000000000000000000000000000000000000 --- a/develop/doc/design/file_manager/pfs/pfsclient.html +++ /dev/null @@ -1,391 +0,0 @@ - - - - - - - - - - - - - PFSClient — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              PFSClient

              -
              -

              Description

              -

              The pfs command is a command-line interface for managing your files on PaddlePaddle Cloud.

              -
              -
              -

              Synopsis

              -
              paddle [options] pfs <subcommand> [parameters]
              -
              -
              -
              -
              -

              Options

              -
              --profile (string)
              -    Use a specific profile from your credential file.
              -
              ---help (string)
              -    Display more information about command
              -
              ---version
              -    Output version information and exit
              -
              ---debug
              -    Show detailed debugging log 
              -    
              ---only-show-errors (boolean) 
              -    Only errors and warnings are displayed. All other output is suppressed.
              -
              -
              -
              -
              -

              Path Arguments

              -

              When using a command, we need to specify path arguments. There are two types of path argument: localpath and pfspath.

              -

              A pfspath begins with /pfs, e.g. /pfs/$DATACENTER/home/$USER/folder.

              -

              Here is how to configure datacenters.

              -
              -
              -

              Order of Path Arguments

              -

              Commonly, if there are two path arguments, the first is the source, and the second is the destination.
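As a rough illustration of the two path types and their ordering, the sketch below classifies path arguments by the `/pfs` prefix. The helper names and the sample datacenter `dc1` are hypothetical; the real client may parse paths differently.

```python
PFS_PREFIX = "/pfs/"


def is_pfs_path(path):
    # A pfspath begins with /pfs; anything else is treated as a localpath.
    return path.startswith(PFS_PREFIX)


def split_pfs_path(path):
    """Split '/pfs/<datacenter>/<rest>' into (datacenter, '/<rest>')."""
    if not is_pfs_path(path):
        raise ValueError("not a pfspath: %r" % path)
    datacenter, _, rest = path[len(PFS_PREFIX):].partition("/")
    return datacenter, "/" + rest


def classify(src, dst):
    """Describe a two-argument command: source first, destination second."""
    def kind(path):
        return "pfs" if is_pfs_path(path) else "local"
    return kind(src), kind(dst)


print(split_pfs_path("/pfs/dc1/home/alice/folder"))
print(classify("./file", "/pfs/dc1/home/alice/file"))
```

A `("local", "pfs")` pair corresponds to an upload and `("pfs", "local")` to a download, following the source-then-destination convention.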

              -
              -
              -

              Subcommands

              -
                -
              • rm - remove files or directories
              • -
              -
              Synopsis:
              -    rm [-r] [-v] <PFSPath> ...
              -
              -Options:
              -    -r 
              -        Remove directories and their contents recursively 
              -    -v      
              -        Cause rm to be verbose, showing files after they are removed.
              -    
              -Examples:
              -    paddle pfs rm /pfs/$DATACENTER/home/$USER/file
              -    paddle pfs rm -r /pfs/$DATACENTER/home/$USER/folder
              -
              -
              -
                -
              • mv - move (rename) files
              • -
              -
              Synopsis:
              -    mv [-f | -n] [-v] <LocalPath> <PFSPath>
              -    mv [-f | -n] [-v] <LocalPath> ... <PFSPath>
              -    mv [-f | -n] [-v] <PFSPath> <LocalPath> 
              -    mv [-f | -n] [-v] <PFSPath> ... <LocalPath> 
              -    mv [-f | -n] [-v] <PFSPath> <PFSPath> 
              -    mv [-f | -n] [-v] <PFSPath> ... <PFSPath> 
              -    
              -Options:
              -    -f      
              -        Do not prompt for confirmation before overwriting the destination path.  (The -f option overrides previous -n options.)
              -    -n      
              -        Do not overwrite an existing file.  (The -n option overrides previous -f options.)
              -    -v      
              -        Cause mv to be verbose, showing files after they are moved.
              -        
              -Examples:
              -    paddle pfs mv ./text1.txt /pfs/$DATACENTER/home/$USER/text1.txt
              -
              -
              -
                -
              • cp - copy files or directories
              • -
              -
              Synopsis:
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <LocalPath> <PFSPath>
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <LocalPath> ... <PFSPath>
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> <LocalPath> 
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> ... <LocalPath>
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> <PFSPath> 
              -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> ... <PFSPath>
              -
              -Options:
              -    -r
              -        Copy directories recursively
              -    -f      
              -        Do not prompt for confirmation before overwriting the destination path.  (The -f option overrides previous -n options.)
              -    -n      
              -        Do not overwrite an existing file.  (The -n option overrides previous -f options.)
              -    -v      
              -        Cause cp to be verbose, showing files after they are copied.
              -    --preserve--links
              -       Preserve links when copying links
              -       
              -Examples:
              -    paddle pfs cp ./file /pfs/$DATACENTER/home/$USER/file
              -    paddle pfs cp /pfs/$DATACENTER/home/$USER/file ./file
              -
              -
              -
                -
              • ls - list files
              • -
              -
              Synopsis:
              -    ls [-r] <PFSPath> ...
              -    
              -Options:
              -    -R
              -        List directory(ies) recursively
              -
              -Examples:
              -    paddle pfs ls  /pfs/$DATACENTER/home/$USER/file
              -    paddle pfs ls  /pfs/$DATACENTER/home/$USER/folder
              -
              -
              -
                -
              • mkdir - make directory(ies). Creates intermediate directory(ies) as required.
              • -
              -
              Synopsis:
              -    mkdir <PFSPath> ...
              -
              -Examples:
              -    paddle pfs mkdir  /pfs/$DATACENTER/home/$USER/folder
              -
              -
              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/float16.html b/develop/doc/design/float16.html deleted file mode 100644 index 355a5a79d45a01d91670dd4a5c5ca4b60be4c79c..0000000000000000000000000000000000000000 --- a/develop/doc/design/float16.html +++ /dev/null @@ -1,370 +0,0 @@ - - - - - - - - - - - - - Design Doc: float16 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: float16

              -
              -

              Why float16

              -

              Half precision (float16) is a binary floating-point format that occupies 16 bits in memory. float16 is half the size of traditional 32-bit single precision format (float) and has lower precision and smaller range.

              -

              When high precision computation is not required, using float16 data type could potentially

              -
                -
              • reduce storage space, memory bandwidth, and power usage;
              • -
              • increase the chance of data fitting into a smaller cache of lower latency;
              • -
              • provide arithmetic speed up if supported by hardware.
              • -
              -
              -
              -

              Survey of current float16 support

              -

              A brief survey of float16 support on different compilers, hardware, and libraries can be found below. Interested readers can refer to link1 and link2 for more info.

              -

              The goal of float16 is to serve as a key for the executor to find and run the correct version of compute method specialized for float16 in operator kernel. It should be compatible with various natively supported float16 implementations including __half for cuda, float16_t for ARM, and Eigen::half for Eigen to make writing customized float16 kernels easier.

              -
              -

              Compiler

              -
                -
              • nvcc supports __half data type after CUDA 7.5.
              • -
              • __fp16 or float16_t is supported as storage type for gcc >= 6.1 and clang >= 3.4.
              • -
              • __fp16 or float16_t is supported as arithmetic type for gcc >= 7.1 and clang >= 3.9.
              • -
              -
              -
              -

              Hardware

              -
                -
              • __half is supported on GPU with compute capability >= 5.3.
              • -
              • __fp16 is supported as storage type for ARMv7-A, ARMv8-A, and above.
              • -
              • __fp16 is supported as arithmetic type after ARMv8.2-A. (Currently, the only microarchitecture implementing ARMv8.2-A is ARM Cortex-A75, which was announced in May 2017. There seem to be no application processors currently available on the market that adopt this architecture. It is reported that Qualcomm Snapdragon 845 uses the Cortex-A75 design and will be available in mobile devices in early 2018.)
              • -
              -
              -
              -

              Libraries

              -
                -
              • Eigen >= 3.3 supports float16 calculation on both GPU and CPU using the Eigen::half class. It is mostly useful for Nvidia GPUs because of the overloaded arithmetic operators using cuda intrinsics. It falls back to using software emulation on CPU for calculation and there is no special treatment to ARM processors.
              • -
              • ARM compute library >= 17.02.01 supports NEON FP16 kernels (requires ARMv8.2-A CPU).
              • -
              -
              -
              -

              CUDA version issue

              -

              There are currently three versions of CUDA that support the __half data type, namely CUDA 7.5, 8.0, and 9.0. CUDA 7.5 and 8.0 define __half as a simple struct that has a uint16_t data member (see cuda_fp16.h) as follows:

              -
              typedef struct __align__(2) {
              -   unsigned short x;
              -} __half;
              -
              -typedef __half half;
              -
              -
              -

              This struct does not define any overloaded arithmetic operators. So you have to directly use __hadd instead of + to correctly add two half types:

              -
              __global__ void Add() {
              -  half a, b, c;
              -  c = __hadd(a, b); // correct
              -  c = a + b; // compiler error: no operator "+" matches these operands
              -}
              -
              -
              -

              CUDA 9.0 provides a major update to the half data type. The related code can be found in the updated cuda_fp16.h and the newly added cuda_fp16.hpp.

              -

              Essentially, CUDA 9.0 renames the original __half type in 7.5 and 8.0 as __half_raw, and defines a new __half class type that has constructors and conversion operators, and also provides overloaded arithmetic operators, as follows:

              -
              typedef struct __CUDA_ALIGN__(2) {
              -    unsigned short x;
              -} __half_raw;
              -
              -
              -struct __CUDA_ALIGN__(2) __half {
              -protected:
              -    unsigned short __x;
              -public:
              -    // constructors and conversion operators from/to 
              -    // __half_raw and other built-in data types
              -};
              -
              -typedef __half half;
              -
              -__device__ __forceinline__ 
              -__half operator+(const __half &lh, const __half &rh) { 
              -    return __hadd(lh, rh); 
              -}
              -
              -// Other overloaded operators
              -
              -
              -

              This new design makes c = a + b work correctly for CUDA half data type.

              -
              -
              -
              -

              Implementation

              -

              The float16 class stores its value internally as a 16-bit uint16_t.

              -
              struct float16 {
              -  uint16_t x;
              -};
              -
              -
              -

              float16 supports the following features:

              -
                -
              • constructors / assignment operators that take input from primitive data types including bool, integers of various length, float, and double.
              • -
              • constructors / assignment operators that take input from __half on cuda, float16_t on ARM, and Eigen::half on Eigen.
              • -
              • conversion operators to primitive data types and half precision data types on cuda, ARM and Eigen.
              • -
              • overloaded arithmetic operators for CUDA, ARM, and non-ARM CPUs, respectively. These operators will take advantage of the CUDA and ARM intrinsics on the corresponding hardware.
              • -
              -

              To support the above features, two fundamental conversion functions are provided:

              -
              float16 float_to_half_rn(float f);  // convert to half precision in round-to-nearest-even mode
              -float half_to_float(float16 h);
              -
              -
              -

              which provide conversion between float32 and float16. These two functions use different conversion routines depending on the current hardware. CUDA/ARM intrinsics are used when the corresponding hardware is available. If the hardware or compiler does not support float32 to float16 conversion, the conversion is performed by software emulation.
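The round-to-nearest-even behaviour of the conversion can be explored from Python without any special hardware. The sketch below is not the proposed C++ implementation: it relies on the `struct` module's `e` format (IEEE 754 binary16), and `float_to_half_bits` and `half_bits_to_float` are hypothetical helper names. Note also that a Python `float` is a double, so the input is rounded from 64 bits rather than from float32.

```python
import struct


def float_to_half_bits(f):
    """Return the 16-bit pattern of `f` rounded to half precision."""
    # The 'e' format packs an IEEE 754 binary16 value (round-half-to-even).
    return struct.unpack("<H", struct.pack("<e", f))[0]


def half_bits_to_float(bits):
    """Widen a 16-bit half-precision pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


for value in (1.0, -2.0, 0.1, 65504.0):
    bits = float_to_half_bits(value)
    print("%r -> 0x%04X -> %r" % (value, bits, half_bits_to_float(bits)))
```

Values exactly halfway between two representable halves show the tie-breaking rule: `1.0 + 2**-11` rounds down to the even mantissa of `1.0`, while `1.0 + 3 * 2**-11` rounds up to the next even mantissa.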

              -
              -
              -

              To do

              -

              After float16 class is available, some of the future items are below:

              -
                -
              • Update pybind/tensor_py.h to bind c++ float16 with numpy float16.
              • -
              • Modify GetKernelType() method in framework/operator.h to make it compatible with float16.
              • -
              • Create a type-casting operator that can convert the data type in tensor between float16 and other types.
              • -
              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/fluid.html b/develop/doc/design/fluid.html deleted file mode 100644 index 9084d6b4f98d1cfcb1c558b3dff6458da8c25352..0000000000000000000000000000000000000000 --- a/develop/doc/design/fluid.html +++ /dev/null @@ -1,350 +0,0 @@ - - - - - - - - - - - - - Design Doc: PaddlePaddle Fluid — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: PaddlePaddle Fluid

              -
              -

              Why Fluid

              -

              When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system at the time was Caffe. However, when PaddlePaddle was open-sourced in 2016, many other choices were available. There was a challenge – what is the need for open sourcing yet another deep learning framework?

              -

              Fluid is the answer. Fluid is similar to PyTorch and TensorFlow Eager Execution in that it describes the “process” of training or inference rather than using the concept of a model. In fact, in PyTorch, TensorFlow Eager Execution, and Fluid, there is no concept of a model at all. The details are covered in the sections below. Fluid currently takes this idea further than PyTorch and Eager Execution, and we are trying to push Fluid towards the directions of a compiler and a new programming language for deep learning.

              -
              -
              -

              The Evolution of Deep Learning Systems

              -

              Deep learning infrastructure is one of the fastest evolving technologies. Within four years, there have already been three generations of technologies invented.

              -

              | Existed since | Model as sequence of layers | Model as graph of operators | No model |
              -|---|---|---|---|
              -| 2013 | Caffe, Theano, Torch, PaddlePaddle | | |
              -| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | |
              -| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid |

              -

              The above table shows that deep learning technology is evolving away from the concept of a model. To understand why, it helps to compare the programming paradigms, that is, the ways deep learning applications are programmed with these systems. The following section covers this comparison.

              -
              -
              -

              Deep Learning Programming Paradigms

              -

              With the systems listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:

              -
              x = layer.data("image")
              -l = layer.data("label")
              -f = layer.fc(x, W)
              -s = layer.softmax(f)
              -c = layer.mse(l, s)
              -
              -for i in xrange(1000): # train for 1000 iterations
              -    m = read_minibatch()
              -    forward({input=x, data=m}, minimize=c)
              -    backward(...)
              -
              -print W # print the trained model parameters.
              -
              -
              -

              The above program includes two parts:

              -
                -
              1. The first part describes the model, and
              -
              2. The second part describes the training process (or inference process) for the model.
              -
              -

              This paradigm has a well-known problem that limits the productivity of programmers. If the programmer makes a mistake in configuring the model, the error messages do not show up until the second part is executed and forward and backward propagations are performed. This makes it difficult for the programmer to debug and locate a mistake that sits several blocks away from where the error is reported.

              -

              This difficulty in debugging and iterating quickly on a program is the primary reason that programmers, in general, prefer PyTorch over the older systems. Using PyTorch, we would write the above program as follows:

              -
              W = tensor(...)
              -
              -for i in xrange(1000): # train for 1000 iterations
              -    m = read_minibatch()
              -    x = m["image"]
              -    l = m["label"]
              -    f = layer.fc(x, W)
              -    s = layer.softmax(f)
              -    c = layer.mse(l, s)
              -    backward()
              -
              -print W # print the trained model parameters.
              -
              -
              -

              We can see that the main difference is that the model configuration part (the first step) moves into the training loop. This change allows mistakes in the model configuration to be reported where they actually appear in the program. It also represents the model, or rather its forward pass, more faithfully by keeping the configuration process in the training loop.
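To make the debugging argument concrete, here is a minimal sketch in plain Python. It is not PyTorch or PaddlePaddle code: `DeferredGraph`, `check_matmul`, and `where_error_surfaces` are hypothetical names invented for this illustration. The deferred style only records a misconfigured step, so the failure appears when the graph runs, while the eager style fails on the faulty line itself.

```python
class DeferredGraph:
    """Toy 'model first, run later' system: steps are recorded, not executed."""

    def __init__(self):
        self.steps = []

    def matmul(self, a_shape, b_shape):
        # Configuration time: nothing is checked, the step is only stored.
        self.steps.append((a_shape, b_shape))

    def run(self):
        # Execution time: a misconfigured step is detected only here.
        for a_shape, b_shape in self.steps:
            check_matmul(a_shape, b_shape)


def check_matmul(a_shape, b_shape):
    """Return the output shape of a matrix product, or raise on a mismatch."""
    if a_shape[1] != b_shape[0]:
        raise ValueError("shape mismatch: %s x %s" % (a_shape, b_shape))
    return (a_shape[0], b_shape[1])


def where_error_surfaces():
    """Report where the same mistake is detected in each style."""
    graph = DeferredGraph()
    graph.matmul((4, 3), (5, 2))  # wrong shapes, but no error is raised here
    try:
        graph.run()
        deferred = "never"
    except ValueError:
        deferred = "at run()"
    try:
        check_matmul((4, 3), (5, 2))  # eager style: fails on this very line
        eager = "never"
    except ValueError:
        eager = "at the faulty line"
    return deferred, eager
```

In the deferred case the traceback points at `run()`, far from the line that introduced the wrong shape, which is the productivity problem described above.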

              -
              -
              -

              Describe Arbitrary Models for the Future

              -

              Describing the process instead of the model also gives Fluid the flexibility to define non-standard models that haven’t been invented yet.

              -

              As we write out the program for the process, we can write an RNN as a loop, instead of an RNN as a layer or as an operator. A PyTorch example would look like the following:

              -
              for i in xrange(1000):
              -    m = read_minibatch()
              -    x = m["sentence"]
              -    for t in xrange(x.len()):
              -        h[t] = the_step(x[t])
              -
              -
              -

              With Fluid, the training loop and the RNN in the above program are not really Python loops, but just a “loop structure” provided by Fluid and implemented in C++ as the following:

              -
              train_loop = layers.While(cond)
              -with train_loop.block():
              -  m = read_minibatch()
              -  x = m["sentence"]
              -  rnn = layers.While(...)
              -  with rnn.block():
              -    h[t] = the_step(input[t])
              -
              -
              -

              An actual Fluid example is described here.

              -

              From the example, the Fluid programs look very similar to their PyTorch equivalent programs, except that Fluid’s loop structure, wrapped with Python’s with statement, could run much faster than just a Python loop.

              -

              We have more examples of the if-then-else structure of Fluid.

              -
              -
              -

              Turing Completeness

              -

              In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine. A programming language that provides if-then-else and loops is generally considered Turing complete. From the above examples, Fluid seems to be Turing complete; however, it is worth noting that there is a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches and splits the input mini-batch into two, one for the True condition and another for the False condition. Whether this is equivalent to the if-then-else that makes programming languages Turing complete has not been researched in depth. Based on a conversation with Yuang Yu, it seems to be the case, but this needs to be looked into further.
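The batch-splitting behaviour described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Fluid's implementation; `batched_if_else` and its arguments are hypothetical names. Each example in the mini-batch is routed to one branch, both branches run on their own sub-batch, and the results are merged back in the original order.

```python
def batched_if_else(conds, batch, true_fn, false_fn):
    """Split a mini-batch by a per-example condition, run both branches on
    their sub-batches, and merge the outputs back in the original order."""
    true_idx = [i for i, c in enumerate(conds) if c]
    false_idx = [i for i, c in enumerate(conds) if not c]
    # Both branches are executed, each on its own part of the batch.
    true_out = true_fn([batch[i] for i in true_idx])
    false_out = false_fn([batch[i] for i in false_idx])
    merged = [None] * len(batch)
    for i, value in zip(true_idx, true_out):
        merged[i] = value
    for i, value in zip(false_idx, false_out):
        merged[i] = value
    return merged


batch = [1, -2, 3, -4]
result = batched_if_else(
    [x > 0 for x in batch],
    batch,
    lambda xs: [x * 10 for x in xs],  # branch for examples where cond is True
    lambda xs: [-x for x in xs],      # branch for examples where cond is False
)
```

Unlike an ordinary `if`, neither branch is skipped; a branch may simply receive an empty sub-batch.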

              -
              -
              -

              The Execution of a Fluid Program

              -

              There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message ProgramDesc that describes the process and is conceptually like an abstract syntax tree.

              -

              There is a C++ class Executor, which runs a ProgramDesc, similar to how an interpreter runs a Python program.
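As a rough illustration of the interpreter analogy, the sketch below walks a list of operator descriptions and dispatches each one to a kernel. It is written in Python for brevity and is not the C++ Executor; the dictionary layout, the `KERNELS` table, and the `fill` operator are assumptions made up for this example.

```python
# Hypothetical kernels, keyed by operator type.
KERNELS = {
    "fill": lambda inputs, attrs: attrs["value"],
    "mult": lambda inputs, attrs: inputs[0] * inputs[1],
    "add": lambda inputs, attrs: inputs[0] + inputs[1],
}


def run_block(block, scope=None):
    """Interpret one block: run its operators in order, storing every
    output variable in `scope` under its name."""
    scope = {} if scope is None else scope
    for op in block["ops"]:
        args = [scope[name] for name in op.get("inputs", [])]
        scope[op["output"]] = KERNELS[op["type"]](args, op.get("attrs", {}))
    return scope


# A stand-in for a ProgramDesc with a single block.
program = {"blocks": [{"ops": [
    {"type": "fill", "output": "X", "attrs": {"value": 3.0}},
    {"type": "fill", "output": "W", "attrs": {"value": 2.0}},
    {"type": "mult", "inputs": ["X", "W"], "output": "Y"},
]}]}

scope = run_block(program["blocks"][0])
```

The program description is pure data, so the same description could be handed to an interpreter like this one or to a code generator.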

              -

              Fluid is moving towards the direction of a compiler, which is explained in fluid.

              -
              -
              -

              Backward Compatibility of Fluid

              -

              Despite the advantages of removing the concept of a model, hardware manufacturers might still prefer that the concept exist, because it makes it easier for them to support multiple frameworks at once and to run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as n-graph. Similarly, Movidius is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known ONNX is also a file format for graphs of operators.

              -

              For Fluid, we can write a converter that extracts the parts in the ProgramDesc protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/fluid_compiler.html b/develop/doc/design/fluid_compiler.html deleted file mode 100644 index 443508b649de596315340e32c02bd0a92db20c0a..0000000000000000000000000000000000000000 --- a/develop/doc/design/fluid_compiler.html +++ /dev/null @@ -1,352 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle Fluid: Towards a Compiled Programming Language — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              -
                -
              -
              -
              -
              -
              - -
              -

              PaddlePaddle Fluid: Towards a Compiled Programming Language

              -

              As described in fluid.md, when a Fluid application program runs, it generates a ProgramDesc protobuf message as an intermediate representation of itself. The C++ class Executor can run this protobuf message as an interpreter. This article describes the Fluid compiler.

              -

              -
              -

              ProgramDesc

              -

              Before we go deeper into the idea of a compiled language, let us take a look at a simple example Fluid application.

              -
              import "fluid"
              -
              -func paddlepaddle() {
              -  X = fluid.read(...)
              -  W = fluid.Tensor(...)
              -  Y = fluid.mult(X, W)
              -}
              -
              -
              -

              This program consists of a block of three operators: read, assign, and mult. Its ProgramDesc message looks like the following:

              -
              message ProgramDesc {
              -  block[0] = Block {
              -    vars = [X, W, Y],
              -    ops = [
              -      read(output = X)
              -      assign(input = ..., output = W)
              -      mult(input = {X, W}, output = Y)
              -    ],
              -  }
              -}
              -
              -
              -
              -
              -

              Transpilers

              -

              We can write a transpiler program that takes a ProgramDesc, e.g., the above one, and outputs another ProgramDesc. Let us take some examples:

              -
                -
              1. Memory optimization transpiler: We can write a transpiler that inserts some FreeMemoryOps in the above example ProgramDesc so as to free memory early, before the end of an iteration, and thus keep a small memory footprint.
              -
              2. Distributed training transpiler: We can write a transpiler that converts a ProgramDesc into a distributed version consisting of two ProgramDescs, one to be run by the trainer processes and the other by the parameter server.
              -
              -
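The memory optimization transpiler listed above can be sketched as a function from one program description to another. This is only an illustration in Python: the dictionary layout, the `keep` list (variables that must survive the iteration, such as parameters and fetched outputs), the extra `relu` operator, and the `free_memory` operator type are all assumptions made for the example, not the real ProgramDesc schema.

```python
def insert_free_ops(block):
    """Return a new block with a 'free_memory' op inserted right after the
    last operator that uses each temporary variable."""
    ops = block["ops"]
    keep = set(block.get("keep", []))
    # Record, for every variable, the index of the last op that touches it.
    last_use = {}
    for index, op in enumerate(ops):
        for name in op["inputs"] + op["outputs"]:
            last_use[name] = index
    new_ops = []
    for index, op in enumerate(ops):
        new_ops.append(op)
        for name in sorted(last_use):
            if last_use[name] == index and name not in keep:
                new_ops.append({"type": "free_memory",
                                "inputs": [name], "outputs": []})
    return dict(block, ops=new_ops)  # the input block is left unchanged


block = {
    "vars": ["X", "W", "Y", "Z"],
    "keep": ["W", "Z"],  # parameter and final output must not be freed
    "ops": [
        {"type": "read", "inputs": [], "outputs": ["X"]},
        {"type": "assign", "inputs": [], "outputs": ["W"]},
        {"type": "mult", "inputs": ["X", "W"], "outputs": ["Y"]},
        {"type": "relu", "inputs": ["Y"], "outputs": ["Z"]},
    ],
}
optimized = insert_free_ops(block)
op_types = [op["type"] for op in optimized["ops"]]
```

Here `X` is released immediately after `mult`, before `relu` runs, which is the "free memory early" effect the transpiler aims for.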

              In the rest of this article, we talk about a special kind of transpiler, the native code generator, which takes a ProgramDesc and generates a .cu (or .cc) file that can be built by C++ compilers (gcc, nvcc, icc) into binaries.

              -
              -
              -

              Native Code Generator

              -

              For the above example, the native code generator transpiler, say, the CUDA code generator, should generate a main function:

              -
              int main() {
              -  auto X = fluid_cuda_read(...);
              -  auto W = fluid_cuda_create_tensor(...);
              -  auto Y = fluid_cuda_mult(X, W);
              -}
              -
              -
              -

              and the definitions of functions fluid_cuda_read, fluid_cuda_create_tensor, and fluid_cuda_mult. Please be aware that each function could just define a C++ instance of an operator and run it. For example:

              -
              paddle::Tensor fluid_cuda_read(...) {
              -  paddle::Tensor t;
              -  paddle::operators::Read r(&t, ...);  // `operator` is a C++ keyword, so the namespace is `operators`
              -  r.Run();
              -  return t;
              -}
              -
              -
              -

              For computational operators that have multiple kernels, each for a specific hardware platform, for example, the mult operator, the generated code should call its CUDA kernel:

              -
              paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a, 
              -                               const paddle::Tensor& b) {
              -  paddle::Tensor t;
              -  paddle::operators::Mult m(a, b, &t, ...);  // write the product into t
              -  m.Run(cuda_context);
              -  return t;
              -}
              -
              -
              -

              where cuda_context could be a global variable of type paddle::CUDADeviceContext.
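A native code generator of this kind is, at its core, a function from a program description to source text. The following Python sketch shows that shape under stated assumptions: the dictionary layout stands in for a ProgramDesc, `generate_cuda_main` is a hypothetical name, and operator arguments that the original example elides are simply left empty.

```python
def generate_cuda_main(block):
    """Emit C++ source text with one `fluid_cuda_<op>` call per operator."""
    lines = ["int main() {"]
    for op in block["ops"]:
        args = ", ".join(op.get("inputs", []))
        lines.append("  auto %s = fluid_cuda_%s(%s);"
                     % (op["output"], op["type"], args))
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


# Stand-in for the single-block example program used in this article.
block = {"ops": [
    {"type": "read", "output": "X"},
    {"type": "create_tensor", "output": "W"},
    {"type": "mult", "inputs": ["X", "W"], "output": "Y"},
]}
source = generate_cuda_main(block)
```

The generated text would then be written to a `.cu` or `.cc` file and compiled together with the definitions of the `fluid_cuda_*` functions.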

              -
              -
              -

              Multi-Block Code Generation

              -

              Most Fluid application programs may have more than one block. To execute them, we need to trace scopes.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -


              -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/functions_operators_layers.html b/develop/doc/design/functions_operators_layers.html deleted file mode 100644 index fff2ecae7bd3d9bdb1bd4ae7218354b171e251e2..0000000000000000000000000000000000000000 --- a/develop/doc/design/functions_operators_layers.html +++ /dev/null @@ -1,331 +0,0 @@ - - - - - - - - - - - - - Design Doc: Functions, Operators, and Layers — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Functions, Operators, and Layers

              -

              In a DL system, we can compose one or more fine grained operators into a coarse grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.

              -

              Historically, some fine grained operations are known as operators, and some coarse level ones are known as layers. But we need a well-defined separation.

              -

              In general, operators are those very fine grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:

              -
              template <typename T> T add(T x, T y) { return x + y; }
              -template <typename T> T mul(T x, T y) { return x * y; }
              -
              -
              -

              Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation

              -
              MAKE_FUNCTION_OPERATOR(mul);
              -
              -
              -

              generates

              -
              template <typename T> class mulOp : public OperatorBase {...};
              -REGISTER_OP(mulOp<float32>, "mul");
              -
              -
              -

              so that in Python we can create operator mul by:

              -
              X1 = Var()
              -X2 = Var()
              -Y = Var()
              -paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
              -
              -
              -

              Also, at the same time, we can compose a coarse level C++ operator class by composing functions mul and add:

              -
              template <typename T>
              -class FCOp : public OperatorBase {
              - public:
              -  void Run(...) {
              -    add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b"));
              -  }
              -};
              -REGISTER_OP(FCOp<float32>, "fc");
              -
              -
              -

              We need to support such composition in Python as well. To do so, we need a higher level Python wrapping of operator creation than paddle.cpp.create_operator. This higher level operator API should be compatible with the layer API.

              -

              Let’s explain using an example. Suppose that we are going to compose the FC using mul and add in Python. We’d like to have Python functions mul and add defined in module operator:

              -
              def operator.mul(X1, X2):
              -    O = Var()
              -    paddle.cpp.create_operator("mul", input=[X1, X2], output=O)
              -    return O
              -
              -def operator.add(X1, X2):
              -    O = Var()
              -    paddle.cpp.create_operator("add", input=[X1, X2], output=O)
              -    return O
              -
              -
              -

              The above code snippets are generated automatically. Given them, users can define

              -
              def layer.fc(X):
              -    W = Var()
              -    b = Var()
              -    return operator.add(operator.mul(X, W), b)
              -
              -
              -

              If we don’t have operator.mul and operator.add, the definition of layer.fc would be complicated:

              -
              def layer.fc(X):
              -    W = Var()
              -    b = Var()
              -    O1 = Var()
              -    paddle.cpp.create_operator("mul", input=[X, W], output=O1)
              -    O2 = Var()
              -    paddle.cpp.create_operator("add", input=[O1, b], output=O2)
              -    return O2
              -
              -
              -

              We’d like to have Python bindings to operators in package paddle.operator, and Python compositions of operators in package paddle.layer. So we have the following concepts in the above illustrative example:

              -

              | C++ functions/functors | mul | add | | |
              -|---|---|---|---|---|
              -| C++ operator class | mulOp | addOp | FCOp | |
              -| Python binding | operator.mul | operator.add | operator.fc | |
              -| Python function | | | | layer.fc |

              -

              This is how we differentiate layers and operators in PaddlePaddle:

              -
                -
              • those that are defined in C++ and have a lightweight Python wrapper in module operators are operators; whereas
              • -
              • those that have no C++ implementation but a Python implementation that composes C++ operators are known as layers.
              • -
              -
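The distinction drawn above can be sketched in a few lines of Python. Everything here is a stand-in: `OPERATORS`, `register_op`, and `create_operator` are hypothetical names that mimic "operators registered by name", and the operator bodies are plain Python instead of C++ so that the example runs on its own. The layer `fc` has no implementation of its own and only composes operators.

```python
OPERATORS = {}  # hypothetical registry: operator name -> implementation


def register_op(name):
    """Register a function as an operator that can be created by name."""
    def decorator(fn):
        OPERATORS[name] = fn
        return fn
    return decorator


@register_op("mul")
def _mul(x, w):
    # Vector-matrix product; w is stored as a list of rows.
    return [sum(a * b for a, b in zip(x, column)) for column in zip(*w)]


@register_op("add")
def _add(x, b):
    return [a + c for a, c in zip(x, b)]


def create_operator(name, *inputs):
    """Look an operator up by name and run it on the given inputs."""
    return OPERATORS[name](*inputs)


def fc(x, w, b):
    """A layer: pure composition of registered operators."""
    return create_operator("add", create_operator("mul", x, w), b)


y = fc([1.0, 2.0], [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]], [0.5, 0.5, 0.5])
```

Adding a new layer in this scheme requires no new operator, only a new Python function that composes existing ones.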
              - - -
              -
              -
              - - -
              - -
              -


              -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/gan_api.html b/develop/doc/design/gan_api.html deleted file mode 100644 index 462f930e90d45f3a6063d4e3f4205f36fda5c99f..0000000000000000000000000000000000000000 --- a/develop/doc/design/gan_api.html +++ /dev/null @@ -1,518 +0,0 @@ - - - - - - - - - - - - - Design for GAN — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design for GAN

              -

              GAN (Generative Adversarial Net [https://arxiv.org/abs/1406.2661]) is an important model for unsupervised learning and is widely used in many areas.

              -

              It applies several important concepts in machine learning system design, including building and running subgraphs, dependency tracing, different optimizers in one executor and so forth.

              -

              In our GAN design, we wrap it as a user-friendly, easily customized Python API for designing different models. We take the conditional DC-GAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [https://arxiv.org/abs/1511.06434]) as an example due to its good performance on image generation.

              -

              -
              -Figure 1. The overall running logic of GAN. The black solid arrows indicate the forward pass; the green dashed arrows indicate the backward pass of generator training; the red dashed arrows indicate the backward pass of the discriminator training. The BP pass of the green (red) arrow should only update the parameters in the green (red) boxes. The diamonds indicate the data providers. d_loss and g_loss, marked in red and green, are the two targets we would like to run.

              The operators, layers and functions required/optional to build a GAN demo are summarized in https://github.com/PaddlePaddle/Paddle/issues/4563.

              -

              -
              -Figure 2. Photo borrowed from the original DC-GAN paper. -

              -

              The Conditional-GAN might be a class.

              -

              In this design, we adopt the popular open source designs in https://github.com/carpedm20/DCGAN-tensorflow and https://github.com/rajathkmp/DCGAN. It contains the following data structure:

              -
                -
              • DCGAN(object): contains everything required to build a GAN model. It provides the following member functions as its API:
              • -
              • __init__(...): Initialize hyper-parameters (like conv dimension and so forth), and declare the model parameters of the discriminator and the generator as well.
              • -
              • generator(z, y=None): Generate a fake image from input noise z. If the label y is provided, the conditional GAN model will be chosen. Returns a generated image.
              • -
              • discriminator(image): Given an image, decide if it is from a real source or a fake one. Returns a 0/1 binary label.
              • -
              • build_model(self): Build the whole GAN model and define the training losses for both the generator and the discriminator.
              • -
              -
              -
              -

              Discussion on Engine Functions required to build GAN

              -
                -
              • Trace the tensor and variable dependency in the engine executor. (Very critical; otherwise the GAN can’t be trained correctly.)
              • -
              • Different optimizers are responsible for optimizing different losses.
              • -
              -
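The two requirements above can be illustrated with a small dependency-tracing sketch in plain Python. The graph below is a hand-written stand-in for the variable dependencies of a GAN, and `ancestors` and `trainable_for` are hypothetical helpers, not engine functions. Tracing back from a loss gives every variable it depends on; intersecting that set with the parameters owned by one optimizer gives the variables that optimizer may update.

```python
def ancestors(graph, target):
    """Return every variable that `target` depends on, directly or not."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def trainable_for(graph, loss, owned_params):
    """Parameters that `loss` depends on and that this optimizer owns."""
    return sorted(ancestors(graph, loss) & set(owned_params))


# variable -> variables it is computed from
graph = {
    "G": ["z", "G_W", "G_b"],
    "D_f": ["G", "D_W", "D_b"],
    "D_t": ["images", "D_W", "D_b"],
    "d_loss": ["D_t", "D_f"],
    "g_loss": ["D_f"],
}
theta_D = ["D_W", "D_b"]
theta_G = ["G_W", "G_b"]

d_updates = trainable_for(graph, "d_loss", theta_D)
g_updates = trainable_for(graph, "g_loss", theta_G)
```

Note that `g_loss` also depends on the discriminator parameters through `D_f`; restricting the update to `theta_G` is what keeps generator training from modifying the discriminator.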

              In more detail, our design of DCGAN is as follows:

              -
              -

              Class member Function: Initializer

              -
                -
              • Set up hyper-parameters, including conditional dimension, noise dimension, batch size and so forth.
              • -
              • Declare and define all the model variables. All the discriminator parameters are included in the list self.theta_D and all the generator parameters are included in the list self.theta_G.
              • -
              -
              class DCGAN(object):
              -  def __init__(self, y_dim=None, z_dim=100):  # z_dim was undefined; 100 is only a placeholder default
              -  
              -    # hyper parameters  
              -    self.y_dim = y_dim # conditional gan or not
              -    self.batch_size = 100
              -    self.z_dim = z_dim # input noise dimension
              -
              -    # define parameters of discriminators
              -    self.D_W0 = pd.Variable(shape=[3,3, 1, 128], data=pd.gaussian_normal_randomizer())
              -    self.D_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a  numpy data
              -    self.D_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
              -    self.D_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a  numpy data
              -    self.D_W2 = pd.Variable(np.random.rand(128, 1))
              -    self.D_b2 = pd.Variable(np.zeros(128))
              -    self.theta_D = [self.D_W0, self.D_b0, self.D_W1, self.D_b1, self.D_W2, self.D_b2]
              -
              -    # define parameters of generators
              -    self.G_W0 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
              -    self.G_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a  numpy data
              -    self.G_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
              -    self.G_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a  numpy data
              -    self.G_W2 = pd.Variable(np.random.rand(128, 1))
              -    self.G_b2 = pd.Variable(np.zeros(128))
              -    self.theta_G = [self.G_W0, self.G_b0, self.G_W1, self.G_b1, self.G_W2, self.G_b2]
              -
              -
              -
              -
              -

              Class member Function: Generator

              -
                -
              • Given a noisy input z, returns a fake image.
              • -
              • Concatenation, batch-norm, FC operations required;
              • -
              • Deconv layer required, which is missing now...
              • -
              -
              class DCGAN(object):
              -  def generator(self, z, y = None):
              -    # input z: the random noise
              -    # input y: input data label (optional)
              -    # output G_im: generated fake images
              -    
              -    if self.y_dim:  # conditional GAN: append the label to the noise
              -      z = pd.layer.concat(1, [z, y])
              -      
              -    G_h0 = pd.layer.fc(z, self.G_W0, self.G_b0)
              -    G_h0_bn = pd.layer.batch_norm(G_h0)
              -    G_h0_relu = pd.layer.relu(G_h0_bn)
              -    
              -    G_h1 = pd.layer.deconv(G_h0_relu, self.G_W1, self.G_b1)
              -    G_h1_bn = pd.layer.batch_norm(G_h1)
              -    G_h1_relu = pd.layer.relu(G_h1_bn)
              -    
              -    G_h2 = pd.layer.deconv(G_h1_relu, self.G_W2, self.G_b2)
              -    G_im = pd.layer.tanh(G_h2)
              -    return G_im
              -
              -
              -
              -
              -

              Class member function: Discriminator

              -
                -
              • Given an image (real or generated), returns a logit indicating whether it is real or fake.
              • -
              • Concatenation, Convolution, batch-norm, FC, Leaky-ReLU operations required;
              • -
              -
              class DCGAN(object):
              -  def discriminator(self, image):
              -    # input image: either generated images or real ones
              -    # output D_h2: binary logit of the label
              -
              -    D_h0 = pd.layer.conv2d(image, w=self.D_W0, b=self.D_b0)
              -    D_h0_bn = pd.layer.batch_norm(D_h0)
              -    D_h0_relu = pd.layer.lrelu(D_h0_bn)
              -    
              -    D_h1 = pd.layer.conv2d(D_h0_relu, w=self.D_W1, b=self.D_b1)
              -    D_h1_bn = pd.layer.batch_norm(D_h1)
              -    D_h1_relu = pd.layer.lrelu(D_h1_bn)
              -    
              -    D_h2 = pd.layer.fc(D_h1_relu, w=self.D_W2, b=self.D_b2)
              -    return D_h2
              -
              -
              -
              -
              -

              Class member function: Build the model

              -
                -
              • Define data readers as placeholders to hold the data;
              • -
              • Build generator and discriminators;
              • -
              • Define two training losses for discriminator and generator, respectively. If we have an execution dependency engine to back-trace all tensors, the module building our GAN model will look like this:
              • -
              -
              class DCGAN(object):
              -  def build_model(self):
              -    if self.y_dim:
              -        self.y = pd.data(pd.float32, [self.batch_size, self.y_dim])
              -    self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
              -    self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
              -    self.z = pd.data(pd.float32, [None, self.z_dim])
              -    
              -    # step 1: generate images by generator, classify real/fake images with discriminator
              -    if self.y_dim: # if conditional GAN, includes label
              -        self.G = self.generator(self.z, self.y)
              -        self.D_t = self.discriminator(self.images)
              -        # generated fake images
              -        self.sampled = self.sampler(self.z, self.y)
              -        self.D_f = self.discriminator(self.G)
              -    else: # original version of GAN
              -        self.G = self.generator(self.z)
              -        self.D_t = self.discriminator(self.images)
              -        # generate fake images
              -        self.sampled = self.sampler(self.z)
              -        self.D_f = self.discriminator(self.G)  # score the generated images, as in the conditional branch
              -    
              -    # step 2: define the two losses
              -    self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)))
              -    self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)))
              -    self.d_loss = self.d_loss_real + self.d_loss_fake
              -    
              -    self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_f, np.ones(self.batch_size)))
              -
              -
              -
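To make the two losses concrete, here is a small numeric sketch in plain Python. It assumes that the cross entropy is the sigmoid cross entropy applied to the discriminator's logits, which the design above does not specify; `sigmoid_cross_entropy` and `gan_losses` are hypothetical helper names. The discriminator is trained to label real images 1 and generated images 0, while the generator is trained so that generated images are labelled 1.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Numerically stable cross entropy between sigmoid(logit) and label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def gan_losses(d_real_logits, d_fake_logits):
    """Return (d_loss, g_loss) from discriminator logits on real and
    generated images."""
    d_loss_real = mean([sigmoid_cross_entropy(x, 1.0) for x in d_real_logits])
    d_loss_fake = mean([sigmoid_cross_entropy(x, 0.0) for x in d_fake_logits])
    g_loss = mean([sigmoid_cross_entropy(x, 1.0) for x in d_fake_logits])
    return d_loss_real + d_loss_fake, g_loss


# An undecided discriminator (all logits 0) gives log(2) per term.
d_loss, g_loss = gan_losses([0.0, 0.0], [0.0, 0.0])
```

Both losses read the same tensor `D_f` but with opposite target labels, which is why the two optimizers must update disjoint parameter sets.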

              If we do not have a dependency engine but only blocks, the module building our GAN model will look like this:

              -
              class DCGAN(object):
              -  def build_model(self, default_block):
              -    # input data in the default block
              -    if self.y_dim:
              -        self.y = pd.data(pd.float32, [self.batch_size, self.y_dim])
              -    self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
              -    # self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
              -    self.z = pd.data(pd.float32, [None, self.z_dim])
              -
              -    # step 1: generate images by generator, classify real/fake images with discriminator
              -    with pd.default_block().g_block():
              -      if self.y_dim: # if conditional GAN, includes label
              -        self.G = self.generator(self.z, self.y)
              -        self.D_g = self.discriminator(self.G, self.y)
              -      else: # original version of GAN
              -        self.G = self.generator(self.z)
              -        self.D_g = self.discriminator(self.G)  # no label in the unconditional case
              -      self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_g, np.ones(self.batch_size)))
              -    
              -    with pd.default_block().d_block():
              -      if self.y_dim: # if conditional GAN, includes label
              -        self.D_t = self.discriminator(self.images, self.y)
              -        self.D_f = self.discriminator(self.G, self.y)
              -      else: # original version of GAN
              -        self.D_t = self.discriminator(self.images)
              -        self.D_f = self.discriminator(self.G)
              -
              -      # step 2: define the two losses
              -      self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)))
              -      self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)))
              -      self.d_loss = self.d_loss_real + self.d_loss_fake
              -
              -
              -

              Some points of confusion and problems with this design:

              -
                -
              • D_g and D_f are actually the same thing but have to be written twice; i.e., if we want to run two conceptual sub-graphs, any code they share has to be written once in each.
              • -
              • It requires the ability to create a block anywhere, not only inside if-else or RNN constructs.
              • -
              -
              -
              -
              -

              Main function for the demo:

              -

              Generally, a GAN user just needs to do the following things:

              -
                -
              • Create an object of the DCGAN class;
              • -
              • Build the DCGAN model;
              • -
              • Specify two optimizers for the two losses, each with respect to its own parameters.
              • -
              -
              # pd for short, should be more concise.
              -import paddle.v2 as pd
              -import numpy as np
              -import logging
              -
              -if __name__ == "__main__":
              -    # dcgan class in the default graph/block
              -    # if we used a dependency engine as TensorFlow does,
              -    # the code would be slightly different:
              -    # dcgan = DCGAN()
              -    # dcgan.build_model()
              -    with pd.block() as def_block:
              -      dcgan = DCGAN()
              -      dcgan.build_model(def_block)
              -
              -    # load mnist data
              -    data_X, data_y = self.load_mnist()
              -    
              -    # Two subgraphs required!!!
              -    with pd.block().d_block():
              -      d_optim = pd.train.Adam(lr = .001, beta= .1)
              -      d_step = d_optim.minimize(dcgan.d_loss, dcgan.theta_D)
              -    with pd.block().g_block():
              -      g_optim = pd.train.Adam(lr = .001, beta= .1)
              -      g_step = g_optim.minimize(dcgan.g_loss, dcgan.theta_G)
              -
              -    # executor
              -    sess = pd.executor()
              -    
              -    # training
              -    for epoch in xrange(10000):
              -      for batch_id in range(N / batch_size):
              -        idx = ...
              -        # sample a batch
              -        batch_im, batch_label = data_X[idx:idx+batch_size], data_y[idx:idx+batch_size]
              -        # sample z
              -        batch_z = np.random.uniform(-1., 1., [batch_size, z_dim])
              -
              -        if batch_id % 2 == 0:
              -          sess.run(d_step, 
              -                   feed_dict = {dcgan.images: batch_im,
              -                                dcgan.y: batch_label,
              -                                dcgan.z: batch_z})
              -        else:
              -          sess.run(g_step,
              -                   feed_dict = {dcgan.z: batch_z})
              -
              -
              -
              -
              -
              -

              More thoughts on the dependency engine vs. block design:

              -
                -
              • What if we just want to run an intermediate result? Do we need to run the whole block/graph?
              • -
              • Should we call eval() to get the fake images in the first stage? And then train the discriminator in the second stage?
              • -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/graph.html b/develop/doc/design/graph.html deleted file mode 100644 index f77f80cba169efb57cb7c8eb9da52a3d718c7512..0000000000000000000000000000000000000000 --- a/develop/doc/design/graph.html +++ /dev/null @@ -1,312 +0,0 @@ - - - - - - - - - - - - - Design Doc: Computations as a Graph — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Computations as a Graph

              -

              A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before.

              -

              This document explains the construction of a graph in three steps:

              -
                -
              • construct the forward part
              • -
              • construct the backward part
              • -
              • construct the optimization part
              • -
              -
              -

              The Construction of a Graph

              -

              Let us take the problem of image classification as a simple example. The application program that trains the model looks like:

              -
              x = layer.data("images")
              -l = layer.data("label")
              -y = layer.fc(x)
              -cost = layer.mse(y, l)
              -optimize(cost)
              -train(cost, reader=mnist.train())
              -
              -
              -
              -

              Forward Part

              -

              The first four lines of the above program build the forward part of the graph.

              -

              -

              In particular, the first line x = layer.data("images") creates variable x and a Feed operator that copies a column from the minibatch to x. y = layer.fc(x) creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators.

              -

              Initialization operators are “run-once” operators: the Run method increments a counter data member so that the operator runs at most once. This way, a parameter is not initialized repeatedly, say, in every minibatch.
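
The run-once behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RunOnceOp`, `init_w`, and the dictionary used as a scope are hypothetical names, not part of PaddlePaddle.

```python
class RunOnceOp:
    """Hypothetical operator wrapper that executes its body at most once."""

    def __init__(self, body):
        self._body = body      # callable that performs the initialization
        self._run_count = 0    # the counter data member described above

    def run(self, scope):
        # Skip the body on every call after the first one.
        if self._run_count >= 1:
            return False
        self._run_count += 1
        self._body(scope)
        return True


def init_w(scope):
    # Hypothetical initializer: create parameter W with zeros.
    scope["W"] = [0.0, 0.0, 0.0]


scope = {}
init_op = RunOnceOp(init_w)
for _ in range(3):            # three "minibatches"
    init_op.run(scope)        # only the first call initializes W
    scope["W"][0] += 1.0      # stand-in for a parameter update
```

Because the initializer ran only once, the three fake updates accumulate in `scope["W"]` instead of being reset on every minibatch.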

              -

              In this example, all operators are created as OpDesc protobuf messages, and all variables are VarDesc. These protobuf messages are saved in a BlockDesc protobuf message.

              -
              -
              -

              Backward Part

              -

              The fifth line optimize(cost) calls two functions, ConstructBackwardGraph and ConstructOptimizationGraph.

              -

              ConstructBackwardGraph traverses the forward graph in the BlockDesc protobuf message and builds the backward part.

              -

              -

              According to the chain rule of gradient computation, ConstructBackwardGraph would

              -
                -
              1. create a gradient operator G for each operator F,
              -
              2. make all inputs, outputs, and outputs’ gradients of F inputs of G,
              -
              3. create gradients for all inputs of F, except for those that don’t have gradients, like x and l, and
              -
              4. make all these gradients outputs of G.
              -
              -
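
The four rules above can be sketched as follows. This is a minimal illustration, not the actual ConstructBackwardGraph: the `Op` tuple, the `@GRAD` naming suffix, the `_grad` operator suffix, and the reverse traversal order are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for an OpDesc.
Op = namedtuple("Op", ["type", "inputs", "outputs"])


def grad_name(var):
    # Assumed naming convention for gradient variables.
    return var + "@GRAD"


def construct_backward(forward_ops, no_grad=()):
    """Build one gradient operator G per forward operator F."""
    backward_ops = []
    for f in reversed(forward_ops):  # assumed: walk the forward ops in reverse
        # Rule 2: inputs, outputs, and output gradients of F feed G.
        g_inputs = (list(f.inputs) + list(f.outputs)
                    + [grad_name(o) for o in f.outputs])
        # Rules 3 and 4: G outputs gradients of F's inputs, minus the excluded ones.
        g_outputs = [grad_name(i) for i in f.inputs if i not in no_grad]
        backward_ops.append(Op(f.type + "_grad", g_inputs, g_outputs))
    return backward_ops


forward = [
    Op("fc", ["x", "W", "b"], ["y"]),
    Op("mse", ["y", "l"], ["cost"]),
]
backward = construct_backward(forward, no_grad={"x", "l"})
```

For the image classification example, this yields an `mse_grad` operator that produces the gradient of `y`, followed by an `fc_grad` operator that produces gradients for `W` and `b` but not for `x`.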
              -
              -

              Optimization Part

              -

              For each parameter, like W and b created by layer.fc (marked as double circles in the above graphs), ConstructOptimizationGraph creates an optimization operator that applies its gradient. This results in the complete graph:

              -

              -
              -
              -
              -

              Block and Graph

              -

              The words block and graph are interchangeable in the design of PaddlePaddle. A Block is a metaphor for the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block.

              -

              A Block keeps operators in an array BlockDesc::ops

              -
              message BlockDesc {
              -  repeated OpDesc ops = 1;
              -  repeated VarDesc vars = 2;
              -}
              -
              -
              -

              in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that in ops, we have some forward operators, followed by some gradient operators, and then some optimization operators.

              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/graph_survey.html b/develop/doc/design/graph_survey.html deleted file mode 100644 index 8f6722d0225ba6b7c20fe1543661d59a1bb39c64..0000000000000000000000000000000000000000 --- a/develop/doc/design/graph_survey.html +++ /dev/null @@ -1,445 +0,0 @@ - - - - - - - - - - - - - Survey on Graph — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Survey on Graph

              -

              Neural network frameworks often provide a symbolic API so that users can write network topologies conveniently. This doc mainly focuses on the symbolic APIs of the most popular neural network frameworks, and tries to find out how to parse a symbolic configuration into a portable file, such as protobuf or JSON.

              -
              -

              Mxnet

              -

              The core concept of the symbolic API is Symbol. Mxnet implements the Symbol class in C++ and exports it to Python through the C API. Please refer to the comments in Mxnet:

              -

              Symbol is help class used to represent the operator node in Graph. Symbol acts as an interface for building graphs from different components like Variable, Functor and Group. Symbol is also exported to python front-end (while Graph is not) to enable quick test and deployment. Conceptually, symbol is the final operation of a graph and thus including all the information required (the graph) to evaluate its output value.

              -

              A simple network topology written with Symbol is as follows:

              -
              def get_symbol(num_classes=10, **kwargs):
              -    data = mx.symbol.Variable('data')
              -    data = mx.symbol.Flatten(data=data)
              -    fc1  = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
              -    act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
              -    fc2  = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
              -    act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
              -    fc3  = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes)
              -    mlp  = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax')
              -    return mlp
              -
              -
              -

              Variable here is actually a Symbol. Every basic Symbol corresponds to one Node, and every Node has its own NodeAttr. The NodeAttr class has an op field; when a Symbol represents a Variable (often input data), the op field is null.

              -

              Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry contains a pointer to Node. We can follow the Node pointers to traverse the whole Graph.

              -

              A Symbol can also be saved to a JSON file.

              -

              Here is a detailed example:

              -
              >>> import mxnet as mx
              ->>> data = mx.symbol.Variable('data')
              ->>> print data.debug_str()
              -Variable:data
              -
              ->>> data = mx.symbol.Flatten(data=data)
              ->>> print data.debug_str()
              -Symbol Outputs:
              -    output[0]=flatten0(0)
              -Variable:data
              ---------------------
              -Op:Flatten, Name=flatten0
              -Inputs:
              -    arg[0]=data(0) version=0
              -
              ->>> fc1  = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
              ->>> print fc1.debug_str()
              -Symbol Outputs:
              -    output[0]=fc1(0)
              -Variable:data
              ---------------------
              -Op:Flatten, Name=flatten0
              -Inputs:
              -    arg[0]=data(0) version=0
              -Variable:fc1_weight
              -Variable:fc1_bias
              ---------------------
              -Op:FullyConnected, Name=fc1
              -Inputs:
              -    arg[0]=flatten0(0)
              -    arg[1]=fc1_weight(0) version=0
              -    arg[2]=fc1_bias(0) version=0
              -Attrs:
              -    num_hidden=128
              -
              -
              -
              -
              -

              TensorFlow

              -

              The core concept of the symbolic API is Tensor. TensorFlow defines Tensor in Python. Please refer to the comments in TensorFlow:

              -

              A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation’s output, but instead provides a means of computing those values in a TensorFlow Session.

              -

              A simple example is as follows:

              -
                # Build a dataflow graph.
              -  c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
              -  d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
              -  e = tf.matmul(c, d)
              -
              -  # Construct a `Session` to execute the graph.
              -  sess = tf.Session()
              -
              -  # Execute the graph and store the value that `e` represents in `result`.
              -  result = sess.run(e)
              -
              -
              -

              The main properties of Tensor are as follows:

              -
              @property
              -def op(self):
              -  """The `Operation` that produces this tensor as an output."""
              -  return self._op
              -
              -@property
              -def dtype(self):
              -  """The `DType` of elements in this tensor."""
              -  return self._dtype
              -
              -@property
              -def graph(self):
              -  """The `Graph` that contains this tensor."""
              -  return self._op.graph
              -
              -@property
              -def name(self):
              -  """The string name of this tensor."""
              -  if not self._op.name:
              -    raise ValueError("Operation was not named: %s" % self._op)
              -  return "%s:%d" % (self._op.name, self._value_index)
              -
              -@property
              -def device(self):
              -  """The name of the device on which this tensor will be produced, or None."""
              -  return self._op.device
              -
              -
              -

              A Tensor can be passed to a session as a target to run. A Tensor carries all the information about the Graph and tracks data dependencies.

              -

              Here is a detailed example:

              -
              >>> import tensorflow as tf
              ->>> c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
              ->>> print c.graph
              -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
              ->>> d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
              ->>> print d.graph
              -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
              ->>> e = tf.matmul(c, d)
              ->>> print e.graph
              -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
              -
              -
              -
              -
              -

              Dynet

              -

              The core concept of symbolic API is Expression, and Dynet defines Expression class in C++.

              -

              A simple example is as follows:

              -
              ComputationGraph cg;
              -Expression W = parameter(cg, pW);
              -
              -Expression in = input(cg, xs[i]);
              -Expression label = input(cg, ys[i]);
              -Expression pred = W * in;
              -Expression loss = square(pred - label);
              -
              -
              -

              The input data and parameters are also represented by Expression. Every basic Expression corresponds to a Node, and input data is also a Node.

              -

              Expression has a data member ComputationGraph, and the ComputationGraph is modified as users configure the network. An Expression can be a run target because it contains all of its dependencies.

              -

              Here is a detailed example:

              -

              write topology in C++

              -
              ComputationGraph cg;
              -Expression W = parameter(cg, pW);
              -cg.print_graphviz();
              -
              -Expression pred = W * xs[i];
              -cg.print_graphviz();
              -
              -Expression loss = square(pred - ys[i]);
              -cg.print_graphviz();
              -
              -
              -

              compile and print

              -
              # first print
              -digraph G {
              -  rankdir=LR;
              -  nodesep=.05;
              -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
              -}
              -# second print
              -digraph G {
              -  rankdir=LR;
              -  nodesep=.05;
              -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
              -  N1 [label="v1 = v0 * -0.98"];
              -  N0 -> N1;
              -}
              -# third print
              -digraph G {
              -  rankdir=LR;
              -  nodesep=.05;
              -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
              -  N1 [label="v1 = v0 * -0.98"];
              -  N0 -> N1;
              -  N2 [label="v2 = -1.88387 - v1"];
              -  N1 -> N2;
              -  N3 [label="v3 = -v2"];
              -  N2 -> N3;
              -  N4 [label="v4 = square(v3)"];
              -  N3 -> N4;
              -}
              -
              -
              -
              -
              -

              Conclusion

              -

              In fact, Symbol/Tensor/Expression in Mxnet/TensorFlow/Dynet are concepts at the same level. We use the unified name Expression here; a concept at this level has the following features:

              -
                -
              • Users write the topology with a symbolic API, and every return value is an Expression, including input data and parameters.
              • -
              • An Expression corresponds to a global Graph, and Expressions can be composed.
              • -
              • An Expression tracks all of its dependencies and can be taken as a run target.
              • -
              -
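
The three features listed above can be illustrated with a small Python sketch. It is not taken from Mxnet, TensorFlow, or Dynet; the `Graph` and `Expression` classes and their methods are hypothetical and only show how a handle into a shared graph can track its own dependencies.

```python
class Graph:
    """Hypothetical global graph: a flat list of (op_name, input_node_ids)."""

    def __init__(self):
        self.nodes = []

    def add(self, op, inputs):
        self.nodes.append((op, [e.node_id for e in inputs]))
        # Every call returns an Expression, including inputs and parameters.
        return Expression(self, len(self.nodes) - 1)


class Expression:
    """Handle to one node; composing handles adds nodes to the shared graph."""

    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id

    def __mul__(self, other):
        return self.graph.add("mul", [self, other])

    def __sub__(self, other):
        return self.graph.add("sub", [self, other])

    def dependencies(self):
        """Node ids needed to evaluate this expression, dependencies first."""
        seen, order = set(), []

        def visit(node_id):
            if node_id in seen:
                return
            seen.add(node_id)
            for dep in self.graph.nodes[node_id][1]:
                visit(dep)
            order.append(node_id)

        visit(self.node_id)
        return order


g = Graph()
W = g.add("parameter", [])
x = g.add("input", [])
label = g.add("input", [])
pred = W * x
loss = pred - label
```

Running `pred` only needs the nodes for `W`, `x`, and the multiplication, while running `loss` also pulls in `label` and the subtraction; that is what makes an Expression usable as a run target.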
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/if_else_op.html b/develop/doc/design/if_else_op.html deleted file mode 100644 index 0db369a450389f849f2a8406e0821c537ca49664..0000000000000000000000000000000000000000 --- a/develop/doc/design/if_else_op.html +++ /dev/null @@ -1,301 +0,0 @@ - - - - - - - - - - - - - The IfElse Operator — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              The IfElse Operator

              -

              PaddlePaddle’s IfElse operator differs from TensorFlow’s:

              -
                -
              • the TensorFlow version takes a scalar boolean value as the condition so that the whole mini-batch goes to either the true or the false branch, whereas
              • -
              • the PaddlePaddle version takes a vector of boolean values as the condition; instances corresponding to true values go to the true branch, and those corresponding to false values go to the false branch.
              • -
              -
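
The per-instance routing described above can be sketched in plain Python. This is only an illustration: the `if_else` helper is hypothetical, and merging the branch outputs back in the original instance order is an assumption, since the text does not say how the outputs are combined.

```python
def if_else(cond, batch, true_fn, false_fn):
    """Route each instance to one branch according to its own condition."""
    true_rows = [row for row, c in zip(batch, cond) if c]
    false_rows = [row for row, c in zip(batch, cond) if not c]
    # Each branch sees only its own sub-batch.
    true_out = iter(true_fn(true_rows))
    false_out = iter(false_fn(false_rows))
    # Assumed merge step: restore the original instance order.
    return [next(true_out) if c else next(false_out) for c in cond]


x = [10, 20, 30]
cond = [v > 15 for v in x]  # one boolean per instance
out = if_else(cond, x,
              lambda rows: [r + 1 for r in rows],   # true branch
              lambda rows: [r * 2 for r in rows])   # false branch
```

With a scalar condition, as in the TensorFlow version, the whole mini-batch would instead go through exactly one of the two functions.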
              -

              Example

              -

              The following PaddlePaddle program shows the usage of the IfElse operator:

              -
              import paddle as pd
              -
              -x = minibatch([10, 20, 30]) # shape=[None, 1]
              -y = var(1) # shape=[1], value=1
              -z = minibatch([10, 20, 30]) # shape=[None, 1]
              -cond = larger_than(x, 15) # [false, true, true]
              -
              -ie = pd.ifelse()
              -with ie.true_block():
              -    d = pd.layer.add(x, y)
              -    ie.output(d, pd.layer.softmax(d))
              -with ie.false_block():
              -    d = pd.layer.fc(z)
              -    ie.output(d, d+1)
              -o1, o2 = ie(cond)
              -
              -
              -

              A challenge in implementing the IfElse operator is inferring which variables need to be split, that is, identifying the mini-batch variables and those derived from them.

              -

              An equivalent C++ program is as follows:

              -
              namespace pd = paddle;
              -
              -int x = 10;
              -int y = 1;
              -int z = 10;
              -bool cond = false;
              -int o1, o2;
              -if (cond) {
              -  int d = x + y;
              -  o1 = d;
              -  o2 = pd::layer::softmax(d);
              -} else {
              -  int d = pd::layer::fc(z);
              -  o1 = d;
              -  o2 = d+1;
              -}
              -
              -
              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/infer_var_type.html b/develop/doc/design/infer_var_type.html deleted file mode 100644 index 7bc87b29f6bfe0b56a075f25e3da14435ae16cae..0000000000000000000000000000000000000000 --- a/develop/doc/design/infer_var_type.html +++ /dev/null @@ -1,318 +0,0 @@ - - - - - - - - - - - - - Design Doc: InferVarType — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: InferVarType

              -
              -

              The Problem Posed

              -

              A variable in our design can hold different types, such as LoDTensor and SelectedRows. An operator should be able to infer the variable types of its outputs.

              -

              For example, a lookup table operator takes two LoDTensor; one is a float tensor as the embedding table, the other is an int tensor as word ID. The gradient operator of lookup table will generate a SelectedRows as its output. A sum operator can take both LoDTensor and SelectedRows as its inputs and will generate a LoDTensor if any of its inputs is LoDTensor, otherwise, the sum operator will generate SelectedRows as its output.
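
The rule for the sum operator stated above is small enough to write down directly. The sketch below is in Python for brevity; the string constants and the function name `infer_sum_output_type` are hypothetical and do not correspond to PaddlePaddle's actual C++ interface.

```python
LOD_TENSOR = "LoDTensor"
SELECTED_ROWS = "SelectedRows"


def infer_sum_output_type(input_types):
    """Output is LoDTensor if any input is LoDTensor, otherwise SelectedRows."""
    if any(t == LOD_TENSOR for t in input_types):
        return LOD_TENSOR
    return SELECTED_ROWS


mixed = infer_sum_output_type([SELECTED_ROWS, LOD_TENSOR])
sparse_only = infer_sum_output_type([SELECTED_ROWS, SELECTED_ROWS])
```

Since the decision depends only on the declared input types, it can be made before any data is available.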

              -

              The variable type is constant at runtime. Every variable’s type is either set by the user (input data and parameters) or inferred by the operator at compile time.

              -
              -
              -

              Proposed Solution

              -

              InferVarType is a compile-time function that is registered to each operator. The interface of that function is:

              -
              using InferVarTypeFN = std::function<
              -    void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>;
              -
              -
              -

              It takes an operator description as its input, infers the output variable types, and stores them in the block description.

              -

              The InferVarTypeFN will be registered in OpInfo, to replace infer_var_type_ field. The OpInfo should be

              -
              struct OpInfo {
              -  InferVarTypeFN infer_var_type_;
              -  ...
              -};
              -
              -
              -

              The default InferVarType will set output type as LoDTensor. It can be done by GetInferVarType().

              -
              void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) {
              -  // set the output type of variable as `LoDTensor`.
              -  // ...
              -}
              -
              -struct OpInfo {
              -  InferVarTypeFN infer_var_type_;
              -  InferVarTypeFN GetInferVarType() const {
              -    if (infer_var_type_) {
              -      return infer_var_type_;
              -    } else {
              -      return DefaultInferVarType;
              -    }
              -  }
              -};
              -
              -
              -
              -
              -

              Register InferVarType

              -

              We provide a thin base class for registering an InferVarTypeFN. Using a base class eases the implementation of the registry, since we can detect whether a registry entry is an InferVarTypeFN.

              -
              class VarTypeInferer {
              -public:
              -  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0;
              -};
              -
              -
              -

              Operator developers can write a specialized VarTypeInferer as follows.

              -
              class SpecialVarTypeInferer : public VarTypeInferer {
              -public:
              -  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const {
              -    // .. own logic
              -  }
              -};
              -
              -
              -

              Then users can register the InferVarType just like GradOpDescMaker and OpInfoMaker.

              -
              REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...);
              -
              -
              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/kernel_hint_design.html b/develop/doc/design/kernel_hint_design.html deleted file mode 100644 index 1d2c38bb902ec30a0e141e67e20ab1354ac6f230..0000000000000000000000000000000000000000 --- a/develop/doc/design/kernel_hint_design.html +++ /dev/null @@ -1,306 +0,0 @@ - - - - - - - - - - - - - Problem — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Problem

              -

              In PaddlePaddle’s design, one operator may have multiple kernels. Users may prefer a certain type of kernel for an operator, for example force_cpu to choose a CPU kernel or use_cudnn to choose a CUDNN kernel, so we need to provide a way for users to express this.

              -

              In the current design, we use KernelType to describe one kernel.

              -
              struct KernelType {
              -  Place place_;
              -  DataType data_type_;
              -  LayoutType layout_;
              -};
              -
              -
              -

              place_, data_type_, and layout_ can be obtained from the input tensors of the operator. GetActualKernelType(inputs) uses the inputs to infer the kernel key that fits the incoming data, but users cannot configure it directly.

              -

              The design also provides a virtual method GetExpectedKernelType that users can override to choose the KernelType they want to use.

              -

              So we should pass the information that the user defines in the proto to GetExpectedKernelType for choosing a kernel.

              -

              The problem is, how should we define and send the information for GetExpectedKernelType to use?

              -
              -
              -

              Solution

              -
              -

              Potential choice

              -
                -
              1. Do nothing: let users add the information they want to the operator’s attributes and read it inside GetExpectedKernelType. This works, but users may define many different hints for the same purpose, such as force_cpu, use_cpu, and cpu_kernel to choose a CPU kernel, or use_cudnn, force_cudnn, and cudnn_kernel to choose a CUDNN kernel.
              -
              2. Predefine all the needed options and use a single attribute key such as kernel_hint. This is not very flexible if the user wants to define other kinds of hints.
              -
              -
              -
              -

              Final choice

              -

              To provide enough flexibility while avoiding confusing definitions, we can define global constants for these attribute names, such as force_cpu, use_cudnn, and use_mkldnn, for users to choose from.

              -

              In C++

              -
              const std::string kForceCPU = "force_cpu";
              -const std::string kUseCUDNN = "use_cudnn";
              -const std::string kUseMKLDNN = "use_mkldnn";
              -
              -KernelType GetExpectedKernelType() {
              -  if (Attr<bool>(kForceCPU)) {
              -    return KernelType(CPUPlace, ...);
              -  } else {
              -    ...
              -  }
              -}
              -
              -
              -

              In Python code

              -
              FORCE_CPU = core.kForceCPU()
              -
              -def xx_layer(..., force_cpu=False):
              -  layer_helper = LayerHelper(...)
              -  layer_helper.append_op(
              -    type="xx",
              -    attr={FORCE_CPU: force_cpu})
              -
              -
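The final choice can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the constants mirror the attribute names above, while expected_kernel, the tuple it returns, and the "plain" library label are hypothetical stand-ins for GetExpectedKernelType and KernelType, not PaddlePaddle APIs.

```python
# Shared attribute names, defined once so every layer spells hints the same way.
FORCE_CPU = "force_cpu"
USE_CUDNN = "use_cudnn"
USE_MKLDNN = "use_mkldnn"


def expected_kernel(attrs, default_place="CUDAPlace"):
    """Hypothetical stand-in for GetExpectedKernelType: map hints to (place, library)."""
    if attrs.get(FORCE_CPU, False):
        return ("CPUPlace", "plain")
    if attrs.get(USE_CUDNN, False):
        return ("CUDAPlace", "cudnn")
    if attrs.get(USE_MKLDNN, False):
        return ("CPUPlace", "mkldnn")
    return (default_place, "plain")


def xx_layer(force_cpu=False):
    """Hypothetical layer builder: records the hint under the shared key."""
    return {"type": "xx", "attrs": {FORCE_CPU: force_cpu}}


op = xx_layer(force_cpu=True)
print(expected_kernel(op["attrs"]))
```

Because both the layer and the kernel chooser import the same constant, a misspelled hint such as use_cpu cannot silently diverge from the name the operator reads.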
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/kernel_selection.html b/develop/doc/design/kernel_selection.html deleted file mode 100644 index 027db66a6b1699034efa0adfb406bfcba1f25491..0000000000000000000000000000000000000000 --- a/develop/doc/design/kernel_selection.html +++ /dev/null @@ -1,337 +0,0 @@ - - - - - - - - - - - - - Background — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Background

              -

              Every operator has many kernels because Fluid supports multiple data types, places, data layouts, and library types. We use OpKernelType to describe the kernel types that an operator can hold.

              -

              The OpKernelType is as follows:

              -
              struct OpKernelType {
              -  Place place_;
              -  DataType data_type_;
              -  DataLayout data_layout_;
              -  LibraryType library_type_;
              -};
              -
              -
              -
                -
              • The place_ is a descriptor of the device, e.g., CPUPlace, CUDAPlace.
              • -
              • The data_type_ is the data type that this kernel operates on, e.g., FP32 or INT64. One kernel may have inputs with different data types, but it has one major data type. For example, cross_entropy takes int64 as its label and double/float as its input logit and output cost, so the major data type of cross_entropy is float or double.
              • -
              • The data_layout_ is useful for some computational libraries. For example, MKLDNN uses many kinds of layouts, such as nChw8c, and each layout invokes a different kernel.
              • -
              • The library_type_ describes the computational library, e.g., MKLDNN, CUDNN.
              • -
              -
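One way to picture how the four fields act together is as a hashable key into a kernel table. The sketch below is an assumption-laden illustration, not Fluid's registry: the field values are plain strings, and register_kernel, find_kernel, and the "Plain" library label are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class OpKernelType:
    place: str
    data_type: str
    data_layout: str
    library_type: str


KERNELS = {}  # (operator name, OpKernelType) -> callable


def register_kernel(op_type, kernel_type, fn):
    KERNELS[(op_type, kernel_type)] = fn


def find_kernel(op_type, kernel_type):
    """Return the registered kernel, or None when this kernel type has none."""
    return KERNELS.get((op_type, kernel_type))


cpu_fp32 = OpKernelType("CPUPlace", "FP32", "NCHW", "Plain")
register_kernel("relu", cpu_fp32, lambda xs: [max(x, 0.0) for x in xs])

# An equal key built elsewhere finds the same kernel.
kernel = find_kernel("relu", OpKernelType("CPUPlace", "FP32", "NCHW", "Plain"))
print(kernel([-1.0, 2.0]))
```

A lookup that returns None is exactly the situation where no kernel is registered for the requested kernel type.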
              -
              -

              Problem

              -

              Ideally, we would register a kernel for every operator and every kernel type. However, this is impractical in the following situations.

              -
                -
              1. Some operators, like CRF, are complicated and inefficient to implement on GPU. The CRF operator will only have a CPU kernel.
              -
              2. Some operators take too much memory, so it is better to force them onto the CPU while the rest of the operators in the network run on GPU, i.e., a model-parallel problem.
              -
              3. Some layouts and places are particular to one library. For example, MKLDNN uses nChw8c, and no other library uses this layout.
              -
              -

              Take one situation as a detailed example. Suppose we have two operators, OP1 and OP2: OP1 has one output, op1_2_op2, and op1_2_op2 is the input of OP2.

              -

              If OP1 and OP2 run on the same place (for example, CPUPlace), then op1_2_op2 can be used directly by OP2.

              -
              OP1(CPUPlace)
              -     |
              - op1_2_op2
              -     |
              -OP2(CPUPlace)
              -
              -
              -

              If OP1 and OP2 run on different places, then OP2 cannot use op1_2_op2 directly.

              -

              The problems in these situations are similar. We can formalize them as follows.

              -

              We register kernels with types $KT = \{kt_1, kt_2, kt_3, ...\}$ for one operator. The inputs of this operator are meant to run on kernel type $kt_{?}$, where $kt_{?} \notin KT$. How do we cast the inputs of this operator from $kt_{?}$ to one of the kernel types in $KT$?

              -
              -
              -

              Solution: data transform

              -

              Transforming the inputs of an operator to fit another kernel type clearly does not depend on the particular operator. So we should register these transformation methods as global methods.

              -

              We can infer a kernel type for each input of an operator. We call it the actual kernel type for var: the kernel type that can process this input variable.

              -

              We can get a kernel type from 1) the configuration in the operator description (for example, users may want to force the conv operator to use MKL) and 2) the place of the current executor (for example, the executor is running on GPU). This is the kernel type we expect the operator to run on, and we call it the expected kernel type.

              -

              We transform the input data from the actual kernel type to the expected kernel type whenever the two differ.

              -

              The algorithm is as follows:

              -
              void OperatorWithKernel::Run(
              -        const Scope& scope,
              -        const platform::Place& place) const {
              -  ExecutionContext ctx(...);
              -  auto expected_kernel_key = this->GetExpectedKernelType(ctx);
              -
              -  Scope& new_scope = scope.NewScope();
              -
              -  for (auto& var_name : this->Inputs()) {
              -    auto* tensor_in = GetTensor(var_name);
              -    auto kernel_type_for_var = this->GetKernelTypeForVar(...);
              -    if (kernel_type_for_var.place_ != expected_kernel_key.place_) {
              -      auto* trans_var = new_scope.Var(var_name);
              -      auto* out = DataTransform(expected_kernel_key,
              -                                kernel_type_for_var,
              -                                *tensor_in);
              -      CopyVariableWithTensor(...);
              -    }
              -  }
              -
              -  auto kernel = kernels.find(expected_kernel_key);
              -  kernel->Compute(ExecutionContext(...));
              -}
              -
              -
              -

              Then the actual process for the multi-device case above will be:

              -
              OP1(CPUPlace)
              -     |
              -op1_2_op2(on CPU)
              -     |
              -[transform](from CPU to GPU)
              -     |
              -op1_2_op2(on GPU)
              -     |
              -OP2(CUDAPlace)
              -
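The CPU-to-GPU flow above can be mimicked with a toy model. This is a sketch under assumptions, not the Fluid implementation: a tensor is modelled as a (place, payload) pair, the host-to-device copy is faked with a list copy, and TRANSFORMS, register_transform, and run_op are hypothetical names.

```python
TRANSFORMS = {}  # (source place, destination place) -> conversion function


def register_transform(src, dst):
    """Register a global, operator-independent transform between two places."""
    def decorator(fn):
        TRANSFORMS[(src, dst)] = fn
        return fn
    return decorator


@register_transform("CPUPlace", "CUDAPlace")
def cpu_to_gpu(payload):
    return list(payload)  # stands in for a real host-to-device copy


def run_op(kernels, expected_place, inputs):
    """Transform every input whose actual place differs, then run the kernel."""
    prepared = {}
    for name, (place, payload) in inputs.items():
        if place != expected_place:
            payload = TRANSFORMS[(place, expected_place)](payload)
        prepared[name] = (expected_place, payload)
    return kernels[expected_place](prepared)


# OP2 only has a CUDAPlace kernel; its input was produced by OP1 on CPUPlace.
op2_kernels = {"CUDAPlace": lambda ins: sum(ins["op1_2_op2"][1])}
result = run_op(op2_kernels, "CUDAPlace", {"op1_2_op2": ("CPUPlace", [1, 2, 3])})
print(result)
```

Keeping the transforms in one global table is what lets every operator reuse them without knowing which operator produced its input.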
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/memory_optimization.html b/develop/doc/design/memory_optimization.html deleted file mode 100644 index 3262578c739fc055614b1780e67559f75dbee4cc..0000000000000000000000000000000000000000 --- a/develop/doc/design/memory_optimization.html +++ /dev/null @@ -1,446 +0,0 @@ - - - - - - - - - - - - - Memory Optimization — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Memory Optimization

              -
              -

              Problem

              -

              In a lecture, Andrew Ng attributes the recent success of AI to a combination of the following:

              -
                -
              • Availability of Big Data
              • -
              • Supercomputing power to process this Big Data over very large neural networks
              • -
              • Modern algorithms
              • -
              -

              The following graph shows the details:

              -

              -

              Larger models usually bring better performance. However, GPU memory is limited. For example, a GTX TITAN X has only 12GB of memory. To train complex and large models, we have to take care of memory usage. Memory optimization is also necessary for both online and mobile inference.

              -
              -
              -

              Solution

              -
              -

              Basic Strategy

              -

              There are some basic strategies to improve memory usage, including in-place operations and memory sharing.

              -
              -

              In-place Operation

              -

              In a relu activation operator:

              -

              $y = \max(x, 0)$

              -

              If the variable x is not used in any other operator, we can perform the operation in place. In other words, variable y and variable x share the same memory block. For this operator, an in-place operation immediately halves the memory occupied by x and y.
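A tiny illustration of the idea, using a plain Python list as the memory block (an assumption made only to keep the example self-contained; relu_inplace is a hypothetical helper, not a PaddlePaddle function):

```python
def relu_inplace(x):
    """Overwrite x with max(x, 0) element by element and return the same buffer."""
    for i, value in enumerate(x):
        if value < 0:
            x[i] = 0.0
    return x


x = [-1.0, 2.0, -3.0]
y = relu_inplace(x)
print(y is x)  # y reuses the memory block of x
print(y)
```

This is only safe because nothing else reads the original values of x afterwards, which is the condition stated above.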

              -
              -
              -

              Memory Sharing

              -

              Not all operators support in-place operations. Memory sharing is a more general strategy.

              -

              The following is an example:

              -
              a = op1(b, c)
              -d = op2(a)
              -e = op3(d, f)
              -
              -
              -

              In this case, variable a is no longer used after op2, and op2 does not support in-place operation. After op2 finishes, we can return the memory of variable a to a memory pool. Then variable e can take the memory of variable a from the pool.

              -
              -
              -
              -

              Live Variable Analysis

              -

              Basic strategies alone are not enough. The prerequisite of memory optimization is knowing whether a variable is still “live” after an operation.

              -

              In our design, the neural network topology is defined as a program. Luckily, live variable analysis is a classic problem in compilers, where it is used in many stages, such as register allocation.

              -

              In compilers, the front end translates programs into an intermediate language with an unbounded number of temporary variables. This program must run on a machine with a bounded number of registers. Two temporary variables a and b can fit into the same register if a and b are never “in use” at the same time. Thus, many temporary variables can fit in few registers; if they don’t all fit, the excess temporary variables can be kept in memory.

              -

              Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporary variables are in use at the same time. We say a variable is “live” if it holds a value that may be needed in the future, so this analysis is called liveness analysis.

              -

              We can learn these techniques from compilers. Live variable analysis has two main stages:

              -
                -
              • construct a control flow graph
              • -
              • solve the dataflow equations
              • -
              -
              -

              Control Flow Graph

              -

              To analyze a program, it is often useful to build a control flow graph. A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statement x can be followed by statement y, there is an edge from x to y.

              -

              The following is the flow graph for a simple loop.

              -

              -
              -
              -

              Dataflow Analysis

              -

              The liveness of a variable “flows” around the edges of the control flow graph; determining the live range of each variable is an example of a dataflow problem. Dataflow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.

              -

              A simple way to perform data-flow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes.

              -
                -
              • Flow Graph Terminology
              • -
              -

              A flow graph node has out-edges that lead to successor nodes, and in-edges that come from predecessor nodes. The set pred[n] is all the predecessors of node n, and succ[n] is the set of successors. In the preceding control flow graph, the out-edges of node 5 are 5 -> 6 and 5 -> 2, and succ[5] = {2, 6}. The in-edges of 2 are 5 -> 2 and 1 -> 2, and pred[2] = {1, 5}.

              -
                -
              • Uses and Defs
              • -
              -

              An assignment to a variable or temporary defines that variable. An occurrence of a variable on the right-hand side of an assignment (or in other expressions) uses the variable. We can define the def of a variable as the set of graph nodes that define it, or the def of a graph node as the set of variables that it defines; the use of a variable or graph node is defined similarly. In the preceding control flow graph, def(3) = {c} and use(3) = {b, c}.

              -
                -
              • Liveness
              • -
              -

              A variable is live on an edge if there is a directed path from that edge to a use of the variable that does not go through any def. A variable is live-in at a node if it is live on any of the in-edges of that node; it is live-out at a node if it is live on any of the out-edges of the node.

              -

              Liveness can be calculated by iterating until a fixed point is reached. The recursive formula is as follows:
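In the standard textbook formulation, which we assume is what the formula here expresses, the dataflow equations are $in[n] = use[n] \cup (out[n] - def[n])$ and $out[n] = \bigcup_{s \in succ[n]} in[s]$. The sketch below iterates them to a fixed point. The six-statement loop program is an assumption chosen to agree with the facts quoted in the text (succ[5] = {2, 6}, pred[2] = {1, 5}, def(3) = {c}, use(3) = {b, c}); the other def and use sets are illustrative.

```python
from collections import defaultdict


def solve_liveness(nodes, edges, use, define):
    """Iterate the dataflow equations until nothing changes (a fixed point)."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)

    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        # Liveness flows backward, so visiting nodes in reverse converges faster.
        for n in reversed(nodes):
            new_out = set()
            for s in succ[n]:
                new_out |= live_in[s]
            new_in = use[n] | (new_out - define[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return succ, pred, live_in, live_out


# Assumed program, one statement per node:
# 1: a = 0   2: b = a + 1   3: c = c + b   4: a = b * 2
# 5: if a < N goto 2   6: return c
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)]
define = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

succ, pred, live_in, live_out = solve_liveness(nodes, edges, use, define)
print(sorted(succ[5]), sorted(pred[2]))
print({n: sorted(live_in[n]) for n in nodes})
```

The loop terminates because the sets only ever grow and are bounded by the finite set of variables.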

              -

              -
              -
              -
              -

              Memory optimization transpiler

              -

              Finally, we combine the basic strategies with the liveness analysis techniques learned from compilers to implement our memory optimization transpiler.

              -
              -

              Add in-place attribute

              -

              Whether an operator supports in-place computation is a built-in property of that operator. Since we treat in-place operators and other operators differently, we have to add an in-place attribute to every operator.

              -
              -
              -

              Construct control flow graph

              -

              The following is the ProgramDesc protobuf of a machine translation example.

              -
                -
              • Block0:
              • -
              -
              lookup_table
              -mul
              -...
              -while(sub-block idx 1)
              -...
              -array_to_lod_tensor
              -cross_entropy
              -...
              -while_grad(sub-block idx 2)
              -read_from_array
              -array_to_lod_tensor
              -...
              -
              -
              -
                -
              • Block1
              • -
              -
              read_from_array
              -read_from_array
              -...
              -write_to_array
              -increment
              -write_to_array
              -less_than
              -
              -
              -
                -
              • Block2
              • -
              -
              read_from_array
              -increment
              -...
              -write_to_array
              -write_to_array
              -
              -
              -

              We can traverse all the operators and variables in the ProgramDesc to build a control flow graph.

              -
              from collections import defaultdict
              -
              -
              -class ControlFlowGraph(object):
              -    def __init__(self, program):
              -        # Graph structure: node -> set of successor / predecessor nodes.
              -        self._successors = defaultdict(set)
              -        self._predecessors = defaultdict(set)
              -        # Per-node variable sets used by liveness analysis.
              -        self._uses = defaultdict(set)
              -        self._defs = defaultdict(set)
              -        self._live_in = defaultdict(set)
              -        self._live_out = defaultdict(set)
              -        self._program = program
              -
              -    def build(self):
              -        pass
              -
              -    def dataflow_analysis(self):
              -        pass
              -
              -    def memory_optimization(self):
              -        pass
              -
              -    def get_program(self):
              -        return self._program
              -
              -
              -
              -
              -

              Make dataflow analysis

              -

              We follow the approach used in compilers and solve the dataflow equations to get the liveness of every variable. Variables that are in the live-in set of an operator node but not in its live-out set are no longer needed after that operator, so their memory can be shared.

              -

              For example:

              -
              a = op1(b, c)
              -d = op2(a)
              -e = op3(d, f)
              -
              -
              -

              The dataflow analysis result is:

              -
              live_in(op1) = {b, c, f}
              -live_out(op1) = {a, f}
              -
              -live_in(op2) = {a, f}
              -live_out(op2) = {d, f}
              -
              -live_in(op3) = {d, f}
              -live_out(op3) = {}
              -
              -
              -

              After op1, we can release the memory of variables b and c; after op2, we can release variable a; after op3, we can release variables d and f.
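For a straight-line program like this one, a single backward pass is enough to reproduce the result above. The sketch assumes, as the listed result does, that nothing is live after op3; straight_line_liveness is a hypothetical helper name.

```python
# (operator name, variables it defines, variables it uses)
ops = [
    ("op1", {"a"}, {"b", "c"}),
    ("op2", {"d"}, {"a"}),
    ("op3", {"e"}, {"d", "f"}),
]


def straight_line_liveness(ops):
    """Walk the operators backward, applying in = use | (out - def)."""
    live_in, live_out = {}, {}
    live = set()  # assumption: no variable is needed after the last operator
    for name, defs, uses in reversed(ops):
        live_out[name] = set(live)
        live = uses | (live - defs)
        live_in[name] = set(live)
    return live_in, live_out


live_in, live_out = straight_line_liveness(ops)
for name, _, _ in ops:
    # Variables that are live-in but not live-out die at this operator.
    print(name, "releases", sorted(live_in[name] - live_out[name]))
```

The released sets are exactly the variables whose memory can go back to the pool after each operator.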

              -
              -
              -

              Memory sharing policy

              -

              A memory pool is maintained during the memory optimization stage. Each operator node is scanned to determine whether memory optimization can be applied. If an operator satisfies the requirement, the following policy is used to handle its input/output variables.

              -
              if op.support_inplace():
              -    i --> pool
              -    pool --> o
              -else:
              -    pool --> o
              -    i --> pool
              -
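The ordering in the policy matters: releasing the input before acquiring the output lets the output reuse the input's block, while acquiring first guarantees the two never alias. The sketch below shows this with integer block ids; MemoryPool and assign_output are hypothetical names, and the input_is_dead flag is assumed to come from the liveness analysis.

```python
class MemoryPool:
    """Toy pool that hands out integer block ids and reuses freed ones."""

    def __init__(self):
        self._free = []
        self._next_id = 0

    def acquire(self):
        if self._free:
            return self._free.pop()
        block = self._next_id
        self._next_id += 1
        return block

    def release(self, block):
        self._free.append(block)


def assign_output(pool, input_block, input_is_dead, support_inplace):
    """Pick a block for the output following the policy above."""
    if support_inplace:
        # i --> pool, then pool --> o: the output may take the input's block.
        if input_is_dead:
            pool.release(input_block)
        return pool.acquire()
    # pool --> o, then i --> pool: the output never aliases the input.
    out = pool.acquire()
    if input_is_dead:
        pool.release(input_block)
    return out


pool = MemoryPool()
x = pool.acquire()
y = assign_output(pool, x, input_is_dead=True, support_inplace=True)
print(x == y)
```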
              -
              -
              -
              -
              - -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/mkl/mkl_packed.html b/develop/doc/design/mkl/mkl_packed.html deleted file mode 100644 index bf4d80cf6b8aae8a0b75b01ddc5513938b85c7e1..0000000000000000000000000000000000000000 --- a/develop/doc/design/mkl/mkl_packed.html +++ /dev/null @@ -1,380 +0,0 @@ - - - - - - - - - - - - - Intel® MKL Packed on PaddlePaddle: Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Intel® MKL Packed on PaddlePaddle: Design Doc

              - -
              -

              Overview

              -

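              We plan to integrate the GEMM Packed APIs[1] introduced in Intel® MKL into PaddlePaddle, to take full advantage of Intel platforms and effectively improve PaddlePaddle's performance on Intel architecture. The current optimization mainly targets layers related to Recurrent Neural Networks (RNN for short below), including RecurrentLayer, GatedRecurrentLayer, and LstmLayer, as well as the PaddlePaddle V1 API.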

              -
              -
              -

              Key Points

              -
              -

              Background

              -

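              PaddlePaddle currently uses the cblas_?gemm function from the Intel® MKL library. Before computing, this function converts the original data into an internal format better suited to Intel platforms.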

              -
                -
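              1. Conversion overhead: this data format conversion (Packing) is relatively time-consuming when the computation itself is small. For example, in the Vanilla RNN part of DeepSpeech2 [2], the matrix size is batch_size * 2048.
              -
              2. Redundant conversion: in some existing cases (such as RNN), multiple calls to cblas_?gemm use the same original data, so packing the original data again on every call is redundant.
              -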
              -

              To minimize the time spent on Packing across multiple calls to cblas_?gemm, Intel® MKL introduced the following four APIs:

              -
              - -
              -

              By using these APIs, we can pack the original data once and then pass the packed data to the gemm_compute calls that reuse it, which avoids redundant Packing.

              -
              -
              -

              Solution

              -

              In an RNN, all time steps within one forward/backward pass share the same weight. When only doing inference, every forward pass also uses the same weight, so there is no need to pack the weight again at each time step of each forward pass.

              -

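              Using the newly introduced GEMM Packed APIs, we pack the weight when the layer is initialized, reuse the packed weight during forward and backward, and repack the new weight after each weight update for use in the next iteration.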

              -
                -
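              • Before optimization, for a model with sequence length T, the number of conversions performed over N iterations is:
                  -
                • inference: N * T
                • -
                • training: 2 * N * T
                • -
                -
              • -
              • After optimization, for a model with the same settings, the number of conversions drops to:
                  -
                • inference: 1
                • -
                • training: 2 * N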
                • -
                -
              • -
              -
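The counts above can be checked with a small calculation. packing_count is a hypothetical helper that simply encodes the four formulas; the values N = 100 and T = 50 are arbitrary illustration inputs.

```python
def packing_count(iterations, seq_len, training, packed):
    """Number of weight Packing conversions over `iterations` iterations."""
    if not packed:
        # Every time step packs again; training does so for forward and backward.
        return iterations * seq_len * (2 if training else 1)
    # With packed weights: once for inference, twice per iteration for training.
    return 2 * iterations if training else 1


N, T = 100, 50
for training in (False, True):
    mode = "training" if training else "inference"
    before = packing_count(N, T, training, packed=False)
    after = packing_count(N, T, training, packed=True)
    print(mode, before, "->", after)
```

For training, the saving is a factor of T, the sequence length; for inference it grows with both N and T.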
              -
              -
              -

              Actions

              -

              The files and directory structure to be added are as follows:

              -
              PaddlePaddle/Paddle
              -├── ...
              -└── paddle/
              -    ├── ...
              -    └── gserver/
              -        ├── ...
              -        ├── layers/
              -        │   ├── ...
              -        │   ├── MKLPackedRecurrentLayer.*
              -        │   ├── MKLPackedGatedRecurrentLayer.*
              -        │   ├── MKLPackedLstmLayer.*
              -        │   └── MKLPackedGemm.h
              -        └── tests/
              -            ├── ...
              -            └── test_MKLPacked.cpp
              -
              -
              -
              -

              CMake

              -

              In the corresponding CMakeLists.txt, MKL Packed features are enabled or disabled according to whether WITH_MKL is turned on.

              -
              -
              -

              Layers

              -

              All MKLPacked*Layer classes inherit from PaddlePaddle's base class Layer. We also add the header file MKLPackedGemm.h, which wraps the relevant GEMM Packed APIs.

              -
              -
              -

              Unit Tests

              -

              We will add test_MKLPacked.cpp to test the layers optimized with MKL Packed. For each newly added RNN layer, we will compare the following two aspects:

              -
                -
              1. Compare the optimized layer against itself: the results of sequence mode (rnn_use_batch=false) versus batch mode (rnn_use_batch=true).
              -
              2. Compare the optimized layer against the corresponding original PaddlePaddle layer, in batch mode.
              -
              -
              -
              -

              Python API

              -

              We plan to add a use_mkl_packed flag to paddle/utils.Flags to select whether to use these features; when compiled with WITH_MKL=ON, it defaults to true.

              -

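              In addition, we will add the use_mkl_packed option to the corresponding layers in python/paddle/trainer/config_parser.py, so that users can choose on the Python side whether to enable this feature.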

              -

              For example, it could be implemented as:

              -
              use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
              -if use_mkl_packed:
              -    self.layer_type = mkl_packed_*
              -
              -
              -

              All related layer_type values will start with mkl_packed_, which is guaranteed when each MKLPacked*Layer registers its layer, to distinguish them.
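The flag parsing and prefixing can be tried in isolation. In this sketch the config_args dictionary stands in for g_command_config_args, and parse_flag, select_layer_type, and the base type name "recurrent" are hypothetical illustrations rather than names taken from config_parser.py.

```python
def parse_flag(config_args, name, default=0):
    """Command-line config values arrive as strings such as "0" or "1"."""
    return bool(int(config_args.get(name, default)))


def select_layer_type(base_type, config_args):
    """Prefix the layer type when the packed implementation is requested."""
    if parse_flag(config_args, "use_mkl_packed"):
        return "mkl_packed_" + base_type
    return base_type


print(select_layer_type("recurrent", {"use_mkl_packed": "1"}))
print(select_layer_type("recurrent", {}))
```

Going through int() first matters: bool("0") would be True, because any non-empty string is truthy.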

              -
              -
              -

              Benchmarking

              -

              We will add scripts to test and compare network performance before and after using MKL Packed recurrent layers.

              -
              -
              - -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/mkl/mkldnn.html b/develop/doc/design/mkl/mkldnn.html deleted file mode 100644 index 3344b42fbedb2c0c31f85df1a41f88c4300bbee9..0000000000000000000000000000000000000000 --- a/develop/doc/design/mkl/mkldnn.html +++ /dev/null @@ -1,461 +0,0 @@ - - - - - - - - - - - - - Intel® MKL-DNN on PaddlePaddle: Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Intel® MKL-DNN on PaddlePaddle: Design Doc

              -

              We plan to integrate Intel MKL-DNN (Intel Math Kernel Library for Deep Neural Networks) into PaddlePaddle, to take full advantage of Intel platforms and effectively improve PaddlePaddle's performance on Intel architecture.

              -
              -
              -Figure 1. PaddlePaddle on IA -

              Near-term goals

              -
                -
              • Complete MKL-DNN implementations of commonly used layers.
              • -
              • Complete MKL-DNN implementations of the common deep neural networks VGG, GoogLeNet, and ResNet.
              • -
              -

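              The current optimization mainly targets PaddlePaddle's code framework before the refactoring and the V1 API. The detailed completion status can be found here.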

              - -
              -

              Overview

              -

              We will integrate MKL-DNN into PaddlePaddle as a third-party library. Like other third-party libraries, MKL-DNN is downloaded and compiled when PaddlePaddle is built.

              -

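              In addition, to further speed up PaddlePaddle's basic math operations, we also integrate MKLML (the MKL small library[1]) into PaddlePaddle as another third-party library. It only includes prebuilt dynamic libraries and header files.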

              -

              The relationship among MKL, MKLML, and MKL-DNN is shown in the following table:

              -

              | Name | Open Source | License | Descriptions |
              -| :--- | :--- | :--- | :--- |
              -| MKL | No | Proprietary | Accelerate math processing routines |
              -| MKLML | No | Proprietary | Small package of MKL, especially for Machine Learning |
              -| MKL-DNN | Yes | Apache 2.0 | Accelerate primitives processing routines especially for Deep Neural Networks |

              -

              MKLML can be used together with MKL-DNN to achieve the best performance.

              -
              -
              -Figure 2. PaddlePaddle with MKL Engines -
              -
              -

              Actions

              -

              The files and directory structure to be added are as follows:

              -
              PaddlePaddle/Paddle
              -├── ...
              -├── cmake/
              -│   ├── external/
              -│   │   ├── ...
              -│   │   ├── mkldnn.cmake
              -│   │   └── mklml.cmake
              -└── paddle/
              -    ├── ...
              -    ├── math/
              -    │   ├── ...
              -    │   └── MKLDNNMatrix.*
              -    └── gserver/
              -        ├── ...
              -        ├── layers/
              -        │   ├── ...
              -        │   └── MKLDNN*Layer.*
              -        ├── activations/
              -        │   ├── ...
              -        │   └── MKLDNNActivations.*
              -        └── tests/
              -            ├── ...
              -            ├── MKLDNNTester.*
              -            └── test_MKLDNN.cpp
              -
              -
              -
              -

              CMake

              -

              CMakeLists.txt中提供一个与MKL有关的总开关:WITH_MKL,它负责决定编译时是否使用MKLML和MKL-DNN

              -
                -
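              • WITH_MKLML controls whether the MKLML library is used. When WITH_MKL is on, MKLML is automatically used as PaddlePaddle's CBLAS and LAPACK library, and Intel OpenMP is enabled to improve MKLML's performance. At build time, the corresponding header files and libraries are placed under build/third_party/install/mklml/*. The MKLML libraries are currently all dynamic libraries, mainly libiomp5.so and libmklml_intel.so.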
              • -
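              • WITH_MKLDNN controls whether MKL-DNN is used. When WITH_MKL is on, whether to build MKL-DNN is decided automatically according to the hardware configuration[2]. At build time, the corresponding header files and libraries are placed under build/third_party/install/mkldnn/*. MKL-DNN currently provides only the dynamic library libmkldnn.so.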
              • -
              -
              -
              -

              Matrix

              -

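              Currently, all data in PaddlePaddle is stored in NCHW format, but MKL-DNN supports more than this one layout. So we define MKLDNNMatrix to manage the different formats of MKL-DNN data and the conversions between them.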

              -
              -
              -Figure 3. MKLDNNMatrix -
              -
              -

              Layers

              -

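              All MKL-DNN layers inherit from MKLDNNLayer, which in turn inherits from PaddlePaddle's base class Layer. MKLDNNLayer provides the necessary interfaces and functions and implements the basic logic of forward and backward; subclasses only need to use the defined interfaces and implement their specific functions.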

              -
              -
              -Figure 4. MKLDNNLayer -

              Each MKLDNNLayer contains a set of MKLDNNMatrix objects for internal and external memory:

              -
                -
              • Internal memory: inVal_, inGrad_, outVal_, and outGrad_, which represent the input data, input gradient, output data, and output gradient, respectively.
              • -
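              • External memory: all names start with ext, such as extInVal_ and extInGrad_. They are mainly used to convert memory when the data format does not match PaddlePaddle's default NCHW format. Note that PaddlePaddle's activations directly use output_.value and output_.grad, so extOutVal_ and extOutGrad_ must share memory with output_.value and output_.grad, respectively. If no external memory is needed for conversion, the corresponding internal memory also shares memory with them.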
              • -
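              • Conversion functions (resetXXX): resetInValue, resetInGrad, resetOutValue, and resetOutGrad, which handle the conversion of the input data, input gradient, output data, and output gradient. These functions reset the internal and external memory according to the input arguments; the two may also be the same, which means no conversion is needed.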
              • -
              -

              Note: each subclass of MKLDNNLayer only needs to use the internal memory; all external conversion work is prepared in the reset functions.

              -
              -
              -

              Activations

              -

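              In PaddlePaddle before the refactoring, an activation function is a concept independent of Layer, and its input and output share one block of memory. So we add a corresponding MKLDNNActivation, implemented in a way similar to MKLDNNLayer.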

              -
              -
              -

              Parameters

              -

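              For layers with parameters, we ensure that the parameters used by MKLDNNLayer share memory with the buffer allocated by PaddlePaddle. If the data layouts differ, we convert the parameters to the format MKL-DNN expects before training starts and save them back in PaddlePaddle's format when training ends; no conversion is needed during training. This keeps the saved parameter format consistent with PaddlePaddle while avoiding unnecessary conversions.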

              -
              -
              -

              Gradients

              -

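              MKL-DNN operations overwrite their output, that is, results are not accumulated onto the existing data. The benefit is that memory does not need to be cleared repeatedly, which saves unnecessary work. Note, however, that when the network branches, the gradients passed back from different layers must be accumulated during backward. MKLDNNLayer therefore implements a merge method: the input gradient of each branch is first stored temporarily in an MKLDNNMatrix, and the layer at the branch point sums them and writes the result to the current layer's output_.grad. As a result, subclasses do not need to care about branching at all.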

              -
              -
              -Figure 5. Merge Gradients -
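The merge step described above can be sketched as follows. This is an illustration only: `mergeBranchGrads` is a hypothetical name, plain `std::vector<float>` stands in for MKLDNNMatrix, and the real merge method may be organized quite differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: each branch has written its gradient into its own
// temporary buffer (overwrite semantics). The layer at the branch point sums
// the buffers and overwrites its output gradient with the total.
std::vector<float> mergeBranchGrads(
    const std::vector<std::vector<float>>& branchGrads) {
  if (branchGrads.empty()) {
    return {};
  }
  std::vector<float> merged(branchGrads.front().size(), 0.0f);
  for (const auto& grad : branchGrads) {
    if (grad.size() != merged.size()) {
      throw std::invalid_argument("gradient size mismatch between branches");
    }
    for (std::size_t i = 0; i < grad.size(); ++i) {
      merged[i] += grad[i];
    }
  }
  return merged;
}
```

Because the accumulation happens in one place, each branch can keep writing its gradient with plain overwrite semantics and never needs a zero-filled buffer.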
              -
              -

              Unit Tests

              -

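              We will add test_MKLDNN.cpp and MKLDNNTester.* for testing MKL-DNN. Tests are divided into unit tests for each layer (or activation) and whole-network tests on simple networks. Each test compares the result computed by PaddlePaddle's CPU implementation with the MKL-DNN result and passes if the difference is below a small threshold.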
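A threshold comparison of this kind could look like the sketch below. The name `allClose` and the element-wise absolute-difference criterion are assumptions for illustration; the document does not specify how MKLDNNTester measures the difference.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: true when both buffers have the same length and every
// element of `test` differs from `ref` by less than `eps`.
bool allClose(const std::vector<float>& ref, const std::vector<float>& test,
              float eps) {
  if (ref.size() != test.size()) {
    return false;
  }
  for (std::size_t i = 0; i < ref.size(); ++i) {
    // Written as !(diff < eps) so that a NaN difference also fails.
    if (!(std::fabs(ref[i] - test[i]) < eps)) {
      return false;
    }
  }
  return true;
}
```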

              -
              -
              -

              Python API

              -

              Currently only the v1 API is considered.

              -

              We plan to add a use_mkldnn option to python/paddle/trainer/config_parser.py so that users can choose to use the MKL-DNN layers.

              -

              For example, the implementation could look like:

              -
              use_mkldnn = bool(int(g_command_config_args.get("use_mkldnn", 0)))
              -if use_mkldnn:
              -    self.layer_type = mkldnn_*
              -
              -
              -

              Every MKL-DNN layer_type starts with mkldnn_ to distinguish it; this is ensured when each MKLDNN*Layer registers its layer.

              -

              In addition, a use_mkldnn flag will be added to paddle/utils.Flags to select whether MKL-DNN functionality is used.

              -
              -
              -

              Benchmarking

              -

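              Corresponding scripts will be added here to test and compare CNN performance with and without MKL-DNN. The benchmark comparison results will be recorded in IntelOptimizedPaddle.md.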

              -
              -
              -

              Others

              -
                -
              1. When MKL-DNN is in use, CPU buffers are aligned to 4096; see memory in MKL-DNN for details.
              2. -
              3. Look deeper into PaddlePaddle for other optimization opportunities. For example, OpenMP might be used to improve the performance of SGD updates.
              4. -
              -
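As an illustration of the buffer alignment mentioned in the first item, the sketch below allocates a buffer on a 4096 boundary. It assumes the value 4096 is in bytes and a POSIX system; `roundUp` and `allocAligned` are hypothetical names, not PaddlePaddle functions.

```cpp
#include <cstddef>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Alignment taken from the text; interpreting it as bytes is an assumption.
constexpr std::size_t kAlignment = 4096;

// Rounds `bytes` up to the next multiple of kAlignment.
std::size_t roundUp(std::size_t bytes) {
  return (bytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Hypothetical allocator: returns a pointer aligned to kAlignment, or
// nullptr on failure. The caller releases it with free().
void* allocAligned(std::size_t bytes) {
  void* ptr = nullptr;
  if (posix_memalign(&ptr, kAlignment, roundUp(bytes)) != 0) {
    return nullptr;
  }
  return ptr;
}
```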
              -
              -
              -

              Design Concerns

              -

              The goal is to conform to PaddlePaddle's code style[3] as closely as possible while sacrificing as little MKL-DNN performance as possible[4].

              -

              We have summarized several points that need particular attention:

              -
                -
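              1. Use deviceId_. To add as few variables and functions as possible to the parent class Layer, we reuse the existing deviceId_ variable to distinguish layer properties and define -2 as the device ID specific to MKLDNNLayer.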
              2. -
              3. Override the init function of the parent class Layer and set deviceId_ to -2, indicating that this layer runs in the MKL-DNN environment.
              4. -
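              5. Create MKLDNNBase to define classes and functions other than those related to layers and memory, including the MKLDNNStream and CPUEngine that MKL-DNN uses, and possibly an FPGAEngine in the future.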
              6. -
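              7. If an MKL-DNN layer is followed by a cpu device, output_.value shares memory with extOutVal_ and the data format is NCHW, so that the next cpu device receives correct data. Whenever an ordinary CPU layer is present, the format of extOutVal_ and extOutGrad_ is always NCHW or NC.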
              8. -
              -
              -
              -

              References

              -
                -
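              1. The MKL small library is a subset of Intel MKL. It mainly contains math primitives and operations related to deep learning and is usually updated together with each new MKL-DNN release.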
              2. -
              3. MKL-DNN System Requirements. Currently, PaddlePaddle uses MKL-DNN only on machines that support the AVX2 instruction set or above.
              4. -
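              5. The original design would introduce information about nextLayer. In PaddlePaddle, however, neither the layers before the refactoring nor the ops after it are meant to know anything about the next layer/op.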
              6. -
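              7. MKL-DNN's high-performance formats differ from PaddlePaddle's native NCHW (the cuDNN part of PaddlePaddle also uses NCHW, so it does not have this problem). A conversion method is therefore needed, and the format should be converted only when necessary in order to get the most out of MKL-DNN.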
              8. -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/mkl/mkldnn_fluid.html b/develop/doc/design/mkl/mkldnn_fluid.html deleted file mode 100644 index 498613830b6cc59a209dc1b1aa623739311e43bb..0000000000000000000000000000000000000000 --- a/develop/doc/design/mkl/mkldnn_fluid.html +++ /dev/null @@ -1,405 +0,0 @@ - - - - - - - - - - - - - Design Doc: Add MKLDNN Kernel in Fluid Operator — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Add MKLDNN Kernel in Fluid Operator

              -
              -

              Principles

              -

              First of all, we should follow some basic principles:

              -
                -
              1. How to write a new operator. We are trying to add a new kind of kernel into operators, so basically we should follow this doc.
              2. -
              3. Supporting new Device/Library. Since MKLDNN is a new library to fluid, we should add MKLDNNDeviceContext and maybe mkldnn_helper.h, just like cudnn_helper.h.
              4. -
              5. Switch Kernel. Another important point is that data must stay synchronized between different kernel types, which is the subject of that topic. We should therefore override the GetExpectedKernelType and trans functions to support switching kernels.
              6. -
              7. The Keys of Operator Kernel Type. Kernel Type is a pivotal concept that records the Place, Library, DataType and Layout.
              8. -
              -
              -
              -

              Solution

              -

              In general, four steps are needed to run an MKL-DNN primitive.

              -
                -
              • Create a primitive descriptor that describes this operator
              • -
              • Create a primitive itself by primitive descriptor and the engine
              • -
              • Create all memory buffers that the primitive needs
              • -
              • Launch a stream to execute the created primitive. More details can be found here.
              • -
              -

              It is better to avoid reinitializing the primitives and memory handles of the first three steps in every iteration. We therefore plan to create a map that records all primitives and memory objects, which should not take too much memory, as discussed here.

              -

              We assume that the following three conditions are satisfied.

              -
                -
              1. There is a unique key for each operator instance, possibly the actual name of the Output Tensor.
              2. -
              3. The Input Tensor inside the Compute function is the converted one.
              4. -
              5. We can get the phase (e.g. is_test) inside the Compute function; otherwise we need to expose this attribute to the user.
              6. -
              -
              -

              Compute

              -

              The algorithm of Compute is described below, taking conv as an example.

              -
                PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), "It must use CPUPlace.");
              -  PADDLE_ENFORCE(platform::is_mkldnn_library(ctx.GetLibrary()), "It must use MKLDNN Library.");
              -
              -  auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>();
              -
              -  // find primitive by unique key from mkldnn context
              -  // the op_key should be a unique name of this op instance
              -  auto p = dev_ctx.findPrimitive(op_key + "_fwd");
              -
              -  // assuming the input tensor inside this compute function is the converted one;
              -  // this must be guaranteed by another mechanism
              -  auto i = dev_ctx.findMemory(op_key + "_input");
              -  
              -  if (p == nullptr || i == nullptr || inputSizeChanged(p, i))  {
              -    auto fwd_primitive_desc = createPrimitiveDesc(ctx);
              -    auto* input = ctx.Input<Tensor>("Input");
              -    auto* filter = ctx.Input<Tensor>("Filter");
              -    auto* output = ctx.Output<Tensor>("Output");
              -    shared_ptr<mkldnn::memory> in(new mkldnn::memory(fwd_primitive_desc->src_primitive_desc(), input->data<T>()));
              -    shared_ptr<mkldnn::memory> wgt(new mkldnn::memory(fwd_primitive_desc->weights_primitive_desc(), filter->data<T>()));
              -    shared_ptr<mkldnn::memory> out(new mkldnn::memory(fwd_primitive_desc->dst_primitive_desc(), output->mutable_data<T>(ctx.GetPlace())));
              -    shared_ptr<mkldnn::conv_fwd> fwd_primitive(new mkldnn::conv_fwd(*fwd_primitive_desc, *in, *wgt, *out));
              -
              -    dev_ctx.addMemory(op_key+"_input", in);
              -    dev_ctx.addMemory(op_key+"_output", out);
              -    dev_ctx.addMemory(op_key+"_filter", wgt);
              -    dev_ctx.addPrimitive(op_key+"_fwd", fwd_primitive);
              -    dev_ctx.addPrimitiveDesc(op_key+"_fwd_PD", fwd_primitive_desc);
              -  }
              -
              -  p = dev_ctx.findPrimitive(op_key + "_fwd");
              -
              -  PADDLE_ENFORCE(p, "Should have forward Primitive");
              -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_input"), "Should have input memory");
              -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_output"), "Should have output memory");
              -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_filter"), "Should have filter memory");
              -  PADDLE_ENFORCE(dev_ctx.findPrimitiveDesc(op_key+"_fwd_PD"), "Should have forward PrimitiveDesc");
              -  dev_ctx.submit(p);
              -  dev_ctx.execute();  // the convert primitive should already have been submitted
              -
              -
              -

              createPrimitiveDesc returns the primitive descriptor of this operator and would look like this:

              -
                auto* input = ctx.Input<Tensor>("Input");
              -  auto* filter = ctx.Input<Tensor>("Filter");
              -  auto* output = ctx.Output<Tensor>("Output");
              -  std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
              -  std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");
              -  std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
              -  int groups = ctx.Attr<int>("groups");
              -  algorithm algo = static_cast<algorithm>(ctx.Attr<int>("convolution_algorithm_option"));
              -  prop_kind pk = ctx.Attr<bool>("is_test") ? prop_kind::forward_inference : prop_kind::forward_training;
              -    
              -  auto fwd_desc = mkldnn::conv_fwd::desc(/* all the setting above*/);
              -  shared_ptr<mkldnn::conv_fwd::primitive_desc> fwd_primitive_desc(new mkldnn::conv_fwd::primitive_desc(fwd_desc, ctx.getEngine()));
              -
              -  return fwd_primitive_desc;
              -  }
              -
              -
              -
              -
              -

              MKLDNNDeviceContext

              -

              MKLDNNDeviceContext, which is very straightforward, should contain some base information like: stream, engine and the map needed.
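A minimal sketch of the map part of such a context is shown below. The method names addMemory, findMemory, addPrimitive and findPrimitive follow the pseudocode in this document; the class name and the `MemorySketch` and `PrimitiveSketch` stand-in types are hypothetical, and the stream and engine members are left out.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct MemorySketch { std::string layout; };   // stand-in for mkldnn::memory
struct PrimitiveSketch { std::string kind; };  // stand-in for a primitive

// Hypothetical context: keeps created objects alive between iterations so
// that Compute can look them up by key instead of recreating them.
class DeviceContextSketch {
 public:
  void addMemory(const std::string& key, std::shared_ptr<MemorySketch> mem) {
    memories_[key] = std::move(mem);
  }
  void addPrimitive(const std::string& key,
                    std::shared_ptr<PrimitiveSketch> prim) {
    primitives_[key] = std::move(prim);
  }
  // Both find functions return nullptr when the key is unknown.
  std::shared_ptr<MemorySketch> findMemory(const std::string& key) const {
    auto it = memories_.find(key);
    if (it == memories_.end()) return nullptr;
    return it->second;
  }
  std::shared_ptr<PrimitiveSketch> findPrimitive(const std::string& key) const {
    auto it = primitives_.find(key);
    if (it == primitives_.end()) return nullptr;
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<MemorySketch>> memories_;
  std::unordered_map<std::string, std::shared_ptr<PrimitiveSketch>> primitives_;
};
```

Returning a `shared_ptr` by value, rather than a reference, lets callers test the result against `nullptr` and reassign it after a cache miss, as the Compute pseudocode does.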

              -
              -
              -

              mkldnn_helper

              -

              Some functions would be put in paddle/platform/mkldnn_helper.h.

              -
                -
              • create MKLDNN memories
              • -
              • create MKLDNN primitives
              • -
              • error check function
              • -
              • etc
              • -
              -
              -
              -

              Kernel Switch

              -

              We should reorder data between Layouts when it comes from or goes to another device. GetExpectedKernelType and trans functions can help us to implement it.

              -

              GetExpectedKernelType takes the context, from which the operator can return the best KernelType. trans would look like this:

              -
              void trans(inputs, ctx) override {
              -  if (NoNeedTrans()) {
              -    return;
              -  }
              -  // find reorder primitive by op_key from context
              -  auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>();
              -  auto p = dev_ctx.findPrimitive(op_key + "_reorder_input");
              -  auto i = dev_ctx.findMemory(op_key + "_src_input");
              -
              -  if (p == nullptr || i == nullptr || changeSized(i, input)) {
              -    auto prim = createPrimitiveDesc(ctx);
              -    auto src = createMemory(memoryDesc(input->dims(), actual_layout), input->data);
              -    auto newbuffer = paddle::memory::Alloc(ctx.GetPlace(), input->size_in_bytes());
              -    // p may still be null here, so query the freshly created descriptor instead
              -    auto dst = createMemory(prim->expected_desc(), newbuffer->data);
              -    auto reorder_primitive(new mkldnn::reorder(src, dst));
              -
              -    dev_ctx.addMemory(op_key+"_src_input", src);
              -    dev_ctx.addMemory(op_key+"_input", dst);
              -    dev_ctx.addPrimitive(op_key+"_reorder_input", reorder_primitive);
              -  }
              -
              -  p = dev_ctx.findPrimitive(op_key + "_reorder_input");
              -  PADDLE_ENFORCE(p, "Should have Reorder Primitive");
              -  dev_ctx.submit(p);
              -  if (! this->isMKLDNNKernel()) {
              -    // execute immediately only if this is not mkldnn kernel function.
              -    // otherwise, it can be executed with the operator primitive in Compute
              -    dev_ctx.stream();
              -  }
              -  // after submit, the input tensor in ExecutionContext should be changed as the converted one
              -  // there should be another mechanism to ensure this
              -}
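The NoNeedTrans() check used in trans is not spelled out in this document. One possible reading, sketched under assumptions, is a comparison of the kernel-type keys named in the Principles section. All enum values and the struct name below are hypothetical, and which fields actually force a data transform is a design decision; this sketch compares place, data type and layout.

```cpp
// Hypothetical stand-ins for the four kernel-type keys named in this document.
enum class Place { kCPU, kGPU };
enum class Library { kPlain, kMKLDNN, kCUDNN };
enum class DataType { kFP32, kFP64 };
enum class Layout { kNCHW, kNHWC, kMKLDNN };

struct KernelTypeSketch {
  Place place;
  Library library;
  DataType data_type;
  Layout layout;
};

// True when a tensor produced under `actual` can be consumed as is by a
// kernel that expects `expected`. The library field is deliberately ignored
// here: two libraries sharing place, data type and layout can share data.
bool NoNeedTrans(const KernelTypeSketch& actual,
                 const KernelTypeSketch& expected) {
  return actual.place == expected.place &&
         actual.data_type == expected.data_type &&
         actual.layout == expected.layout;
}
```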
              -
              -
              -
              -
              -

              Unit Test

              -

              All the functions should be tested accordingly. TBD

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/model_format.html b/develop/doc/design/model_format.html deleted file mode 100644 index 2e1fbe31105827eab64196978c8ab9ace04983d4..0000000000000000000000000000000000000000 --- a/develop/doc/design/model_format.html +++ /dev/null @@ -1,285 +0,0 @@ - - - - - - - - - - - - - Design Doc: Model Format — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Model Format

              -
              -

              Motivation

              -

              A model is an output of the training process. One complete model consists of two parts, the topology and the parameters. In order to support industrial deployment, the model format must be self-complete and must not expose any training source code.

              -

              As a result, in PaddlePaddle, the topology is represented as a ProgramDesc, which describes the model structure. The parameters contain all the trainable weights in the model. We must support large parameters and efficient serialization/deserialization of parameters.

              -
              -
              -

              Implementation

              -

              The topology is saved as plain text in a detailed, self-contained protobuf file.

              -

              The parameters are saved as a binary file. As we all know, a protobuf message has a size limit of 64M. We also ran a benchmark experiment, which showed that protobuf is not suited to this task.

              -

              We therefore designed a dedicated format for tensor serialization. By default, any tensor in Paddle is a LoDTensor and is described by a LoDTensorDesc proto. We save the DescProto as the byte-string header; it contains all the necessary information, such as the dims and the LoD information of the LoDTensor. A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte-string content. So, the binary format of one tensor is,

              -

              The table below shows a tensor’s byte view in detail. Note that all the signed values are written in the little-endian format.

              -

              | field name | type | description |
              | --- | --- | --- |
              | version | uint32_t | Version of saved file. Always 0 now. |
              | tensor desc length | uint32_t | TensorDesc (Protobuf message) length in bytes. |
              | tensor desc | void* | TensorDesc protobuf binary message |
              | tensor data | void* | Tensor’s data in binary format. The length of tensor_data is decided by TensorDesc.dims() and TensorDesc.data_type() |
              | lod_level | uint64_t | Level of LoD |
              | length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
              | data of lod[0] | uint64_t* | [Optional] lod[0].data() |
              | ... | ... | ... |
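The byte layout in the table can be sketched as a small writer. This is an illustration under assumptions, not Paddle's serializer: `serializeTensor` and `appendLE` are hypothetical names, the TensorDesc is passed in as already-encoded bytes, and every integer field is written little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends the low `nbytes` bytes of `value`, least significant byte first.
void appendLE(std::vector<std::uint8_t>& out, std::uint64_t value, int nbytes) {
  for (int i = 0; i < nbytes; ++i) {
    out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xff));
  }
}

// Hypothetical writer that follows the field order of the table:
// version, desc length, desc, tensor data, lod_level, then each LoD level.
std::vector<std::uint8_t> serializeTensor(
    const std::string& descBytes, const std::vector<std::uint8_t>& data,
    const std::vector<std::vector<std::uint64_t>>& lod) {
  std::vector<std::uint8_t> out;
  appendLE(out, 0, 4);                 // version, always 0 for now
  appendLE(out, descBytes.size(), 4);  // tensor desc length in bytes
  out.insert(out.end(), descBytes.begin(), descBytes.end());  // tensor desc
  out.insert(out.end(), data.begin(), data.end());            // tensor data
  appendLE(out, lod.size(), 8);                                // lod_level
  for (const auto& level : lod) {
    appendLE(out, level.size() * sizeof(std::uint64_t), 8);  // length in bytes
    for (std::uint64_t offset : level) {
      appendLE(out, offset, 8);  // lod[i].data()
    }
  }
  return out;
}
```

A reader needs no length prefix for the tensor data, because, as the table notes, that length follows from the dims and data type stored in the TensorDesc.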

              -
              -
              -

              Summary

              -
                -
              • We introduce a model format.
              • -
              • The model represented by its forward-pass computation procedure is saved in a ProgramDesc protobuf message.
              • -
              • A set of binary tensors in the specified format describes the parameters.
              • -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/multi_language_interface/00.why_plain_c.html b/develop/doc/design/multi_language_interface/00.why_plain_c.html deleted file mode 100644 index 49bc6e27917a58f641e26ce9d570579a4d11e222..0000000000000000000000000000000000000000 --- a/develop/doc/design/multi_language_interface/00.why_plain_c.html +++ /dev/null @@ -1,391 +0,0 @@ - - - - - - - - - - - - - Paddle多语言接口实现 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Paddle Multi-Language Interface Implementation

              -
              -

              Background

              -

              Paddle needs a multi-language interface, which must:

              -
                -
              • Have standard, good documentation
                  -
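                • For example, Python can use Sphinx to generate API documentation, and golang can use GoDoc. Both require the interface to be fully commented according to established conventions.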
                • -
                -
              • -
              • Adapt each language's interface to the characteristics of that language
                  -
                • For example, Java and Python handle errors by throwing an Exception, whereas golang should handle errors through return values.
                • -
                -
              • -
              -
              -
              -

              Basic Requirements

              -

              The implementation of Paddle's multi-language interface covers the following aspects:

              -
                -
              • We distribute Paddle as a shared library. This shared library embeds no interpreter of any other language and uses no other shared libraries.
              • -
              • The shared library exports functions through C99-standard header files and neither uses nor exports C++ symbols.
              • -
              • Paddle's internal structs and classes are not exported; only void* pointers are used as type handles (handler).
              • -
              • We do not use a code generator such as SWIG; language bindings are written by hand.
              • -
              -
              -
              -

              Rationale

              -
              -

              Distributing Paddle as a shared library

              -
                -
              • Linking Paddle is fairly complicated
                  -
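                • If users want to link Paddle's static library (libpaddle.a) into their own program, they must use --whole-archive (for GCC) or --force_load (for Clang) to make sure all the symbols in libpaddle.a are written into their binary. This is because Paddle's source code uses the object factory design pattern.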
                • -
                -
              • -
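              • For compiled languages such as C/C++, static and shared libraries are about equally easy to use. Interpreted languages such as Python or Java, however, can only call Paddle's shared library; otherwise the Paddle static library would have to be linked into the interpreter.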
                  -
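                • The binary that actually runs for an interpreted language is the interpreter itself, so a static library can only be used by linking it with the interpreter. For Java, that means adding the static library to the JVM, which is not common practice for typical Java developers.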
                • -
                -
              • -
              -
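Why an object factory interacts badly with static linking can be seen in a minimal sketch of the pattern. All names below are hypothetical and this is not Paddle's registration code; it only illustrates that the registering object is never referenced by name from the rest of the program.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Layer {
  virtual ~Layer() = default;
  virtual std::string name() const = 0;
};

using Creator = std::function<std::unique_ptr<Layer>()>;

// Function-local static so the registry exists before any registrar runs.
std::map<std::string, Creator>& registry() {
  static std::map<std::string, Creator> instance;
  return instance;
}

// Registers a creator as a side effect of constructing a global object.
struct Registrar {
  Registrar(const std::string& type, Creator creator) {
    registry()[type] = std::move(creator);
  }
};

struct FcLayer : Layer {
  std::string name() const override { return "fc"; }
};

// Nothing else refers to fcRegistrar. If FcLayer lived in its own object
// file inside a static library, the linker could drop that file entirely.
static Registrar fcRegistrar("fc", [] {
  return std::unique_ptr<Layer>(new FcLayer());
});

// Looks up a layer type by name; returns nullptr when it was never registered.
std::unique_ptr<Layer> createLayer(const std::string& type) {
  auto it = registry().find(type);
  if (it == registry().end()) return nullptr;
  return it->second();
}
```

When everything is compiled into one translation unit, as here, the registration always runs. With a static library, an object file that only contains such self-registering code resolves no undefined symbol, so the linker skips it unless it is forced to keep the whole archive.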
              -
              -

              No interpreter of another language embedded in the shared library

              -
                -
              • Paddle's current process model is that C++ internally drives a Python interpreter to parse the model configuration and read data
              • -
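              • Our final shared library embeds neither Python nor any other language's interpreter. Model configuration parsing and data reading are both delegated to other languages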
              • -
              -

              One current problem is that if the Python interpreter embedded in Paddle and the Python used externally differ in version, Paddle reports an error and exits.

              -
              -
              -

              The Paddle shared library references no other shared libraries

              -
                -
              • That is, the shared library depends on no other files and can run on any machine.
              • -
              -
              -
              -

              The shared library exports functions through C99-standard header files and neither uses nor exports C++ symbols

              -
                -
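              • Because there is no standard for C++ name mangling, different compiler versions may generate different symbols for the same C++ code. A multi-language interface reads the generated binary (the shared library) directly and therefore needs stable exported symbols.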
              • -
              • C has a standard for exported symbols, and the common platforms all have a standard ABI calling convention for it.
              • -
              • Most languages support calling C APIs
              • -
              • C99 is used instead of C89 because C99 supports Fixed-width integer types and the Boolean type.
              • -
              • C99 is used instead of C11 because C11 has no features that Paddle particularly needs, and C99 is more widely used than C11.
              • -
              -
              -
              -

              Not exporting Paddle's internal structs and classes; using only void* pointers as type handles (handler)

              -
                -
              • Paddle's internal classes are written in C++, which makes exporting them directly to a C interface difficult.
              • -
              • The C-API uses void* to represent Paddle's internal classes, and each API function checks the type itself.
              • -
              -

              In the C header file paddle_matrix.h:

              -
              #include <stdint.h>
              -
              -typedef void* paddle_matrix;
              -typedef int paddle_error;
              -
              -#ifdef __cplusplus
              -extern "C"
              -#endif
              -paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
              -                                     uint64_t* width,
              -                                     uint64_t* height);
              -
              -
              -

              This C interface is then implemented in C++, in the file paddle_matrix.cpp:

              -
              #include "paddle/math/matrix.h"
              -#include "paddle/capi/CMatrix.hpp"
              -extern "C"
              -paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
              -                                     uint64_t *width,
              -                                     uint64_t *height) {
              -  auto m = (paddle::capi::CMatrix*)(matrix);
              -  *width = m->mat->getWidth();
              -  *height = m->mat->getHeight();
              -  return 0;  // assumed here to mean "no error"
              -}
              -
              -
              -

              The file paddle/capi/CMatrix.hpp contains:

              -
              namespace paddle {
              -namespace capi {
              -
              -struct CMatrix {
              -  std::shared_ptr<paddle::Matrix> mat;
              -};
              -
              -}  // namespace capi
              -}  // namespace paddle
              -
              -
              -
              -
              -

              Not using a code generator such as SWIG; writing language bindings by hand

              -
                -
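              • SWIG is a code generator for multi-language interfaces. Its goal is that code is written in C/C++ and SWIG reads the C/C++ headers directly to generate binding code for various languages.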
                  -
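                • For a multi-language interface, SWIG requires an interface file. This file has its own syntax, which is costly to learn. Each additional target language needs extra definitions for that language, and the interface file can sometimes be very tricky to write. This raises the learning cost for community contributors.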
                • -
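                • The interfaces SWIG exposes keep the C++ interface style, which makes it hard to keep code style (function naming, error handling) consistent across languages.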
                    -
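                  • The function and class names that SWIG exposes in the target language are identical to those in C++, and C++ naming style does not suit other languages. With SWIG, we would have to rename a large number of SomeCppClass names to some_python_class or SomeGoTypes in the interface file.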
                  • -
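                  • Error handling also differs between languages. For Java or Python, the most common approach is an Exception, while for Golang it is a return value. SWIG can only expose the C++ interface as is and cannot adapt to each language's error handling.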
                  • -
                  -
                • -
                • For most languages, using a C .h file directly is not difficult, for example through Python's cffi or Cython, or golang's cgo.
                • -
                • The languages and interpreters that SWIG supports are limited. For Python, for example, SWIG supports only the CPython interpreter and not PyPy.
                • -
                -
              • -
              -
              -
              -
              -

              Summary of Reasons

              -

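              | Decision | Rejected alternative | Reason |
              | --- | --- | --- |
              | Use a shared library | Static library | Interpreted languages can only call shared libraries; linking Paddle's static library is complicated |
              | Embed no other language's interpreter | Embedding a Python interpreter | Paddle's C++ currently embeds a Python interpreter, which causes bugs when different Python versions end up in one process |
              | Reference no other shared libraries | | A single Paddle shared library can run on any Linux system |
              | Use C99 for the interface | C++ interface | C has a standard ABI; C99 is currently the most widely used C standard and supports the bool type and fixed-width integer types (uint64_t etc.) |
              | Use void* as the class handle | Spelling out what each class contains | Simple to implement and decouples the interface from implementation details |
              | Hand-written language bindings | SWIG | SWIG requires binding developers to be proficient in SWIG configuration, which hinders community participation. Code generated by SWIG cannot guarantee a consistent code style across languages |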

              -
              -
              -

              Implementation

              -

              See Inference implementation.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/multi_language_interface/01.inference_implementation.html b/develop/doc/design/multi_language_interface/01.inference_implementation.html deleted file mode 100644 index 968376c8a1fc18eb4b14149324d91d20eb922a13..0000000000000000000000000000000000000000 --- a/develop/doc/design/multi_language_interface/01.inference_implementation.html +++ /dev/null @@ -1,390 +0,0 @@ - - - - - - - - - - - - - C-API 模型推断实现文档 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              C-API Model Inference Implementation

              -

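              This document describes the implementation details of the Paddle C-API, which is the foundation of the multi-language API. Paddle has many APIs to expose. We implement the model inference API first and use it as an example for discussion. For why a C-API is needed, see Why Plain C.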

              - -
              -

              Principles for Exposing Interfaces

              -
                -
              1. All interfaces are C interfaces, i.e. they use extern "C".
              2. -
              3. Except for functions that construct a type (paddle_matrix_create etc.), every function returns paddle_error. A call must not throw an exception or cause a runtime error.
              4. -
              5. Every type is named paddle_<type name>, and every function related to a type is named paddle_<type name>_<function name>.
              6. -
              7. If a Paddle Core concept (GradientMachine/Matrix) needs to be exposed to other languages, then
                  -
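                • Keep the exposed interface as simple as possible. Expose only the interface of a concept, not its implementations; that is, expose GradientMachine or Matrix but not RecurrentGradientMachine or CpuSparseMatrix.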
                • -
                • Expose only the necessary functions of the concept, where necessary means the minimum set of functions needed to complete a task.
                • -
                -
              8. -
              9. Do not add much wrapping in the capi layer.
                  -
                • If a Paddle concept must be exposed but is too fragmented, do not wrap it in the capi layer; modify Paddle Core directly so that the concept is no longer fragmented there.
                • -
                -
              10. -
              -
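The rule that every function returns paddle_error and never throws can be sketched as a guard at the C boundary. Everything here except the name paddle_error is hypothetical: the error-code values, the `guard` helper and the example function `sketch_vector_get` are assumptions for illustration and are not part of the Paddle C-API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef int paddle_error;

// Hypothetical error codes; the real values are not given in this document.
enum { kNoError = 0, kNullPtr = 1, kOutOfRange = 2, kUnknown = 3 };

// Runs `fn` and converts any C++ exception into an error code, so that no
// exception ever crosses the extern "C" boundary.
template <typename Fn>
paddle_error guard(Fn&& fn) noexcept {
  try {
    fn();
    return kNoError;
  } catch (const std::out_of_range&) {
    return kOutOfRange;
  } catch (...) {
    return kUnknown;
  }
}

// Hypothetical exported function: reads one element of a fixed table.
extern "C" paddle_error sketch_vector_get(std::size_t index, int* out) {
  if (out == nullptr) {
    return kNullPtr;  // argument check happens before any core code runs
  }
  return guard([&] {
    static const std::vector<int> values = {10, 20, 30};
    *out = values.at(index);  // at() throws std::out_of_range on bad index
  });
}
```

With this shape, a binding in a language such as Go can turn the integer into its own error value, while a Python or Java binding can raise an exception from it.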
              -
              -

              Directory Structure

              -
              Paddle
              -  `-- paddle
              -        `-- capi
              -              `-- examples  # The example project for C-API.
              -              `-- tests  # unittests for C-API
              -              `-- capi.h  # C-API header file.
              -              `-- capi_private.h  # The shared header file between implementation sources.
              -              `-- matrix.{h, cpp}
              -              `-- gradient_machine.{h, cpp}
              -              `-- ...
              -
              -
              -

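              The directory structure of Paddle's C-API is shown above. All header files in this directory except capi_private.h are installed under include/paddle. The binaries generated for the C-API are installed under lib. The installed layout is therefore: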

              -
              `-- include
              -      `-- paddle
              -             `-- capi.h
              -             `-- matrix.h
              -             `-- gradient_machine.h
              -             `-- ...
              -`-- lib
              -     `-- libpaddle_capi_shared.{so, dylib}  # On macOS, the shared library's file name extension is `dylib`
              -     `-- libpaddle_capi_whole.a  # static library for all symbols of Paddle.
              -
              -
              -
              -
              -

              Implementation

              -

              The following sections describe how each kind of file is implemented.

              -
              -

              capi.h

              -

              capi.h is the only header file users need to include when using the C-API. capi.h includes the per-type headers matrix.h and gradient_machine.h. Other type headers are included by relative path, i.e. #include "matrix.h".

              -
              -
              -

              Header files for a specific type

              -

              The headers for a specific type are files such as matrix.h and gradient_machine.h. Each contains the type definition and all exposed functions of that type.

              -

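              Such a header makes no assumption about the include order of other files; even if a user includes a type's header directly, no error should occur (although this is discouraged). If one type needs to reference another, for example gradient_machine needs matrix, it includes the other type's header directly, i.e. #include "matrix.h".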

              -
              -
              -

              capi_private.h

              -

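              capi_private.h is the header shared among the implementation files. It mainly contains the structures of the actually exposed types. When users use the C-API, all Paddle types degrade to void *, i.e. typedef void* paddle_matrix. Each type exposed by the C-API, however, is a struct implemented in capi_private.h.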

              -
              struct CMatrix {
              -   int type = MatrixType;
              -   std::shared_ptr<paddle::Matrix> mat;
              -};
              -
              -
              -

              This struct usually contains two members.

              -
                -
              • type is a type tag. The type field differs for each type, so even though the C-API accepts only void * arguments, we can still determine the type of every argument.

                -
                void some_c_api_function(void* some_instance) {
                -   int* type = (int *) some_instance;
                -   switch (*type) {
                -     case MatrixType: {
                -       CMatrix* mat = (CMatrix *) some_instance;
                -       ...
                -     }
                -     ...
                -   }
                -}
                -
                -
                -
              • -
              • The other member of this struct is a smart pointer (shared_ptr) to the interface of this type in Paddle Core.

                -
                  -
                • The reason for using a smart pointer is that users can safely release a C-API instance without caring whether Paddle Core is still using it.
                • -
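                • For example, a user obtains a parameter instance of a neural network through the C-API. When done with it, the user can simply delete it. Even if the model in Paddle Core is still using the parameter, the parameter is not deleted along with it.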
                • -
                -
              • -
              -
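The ownership behavior described in the last bullets can be sketched with plain `std::shared_ptr`. The types `Parameter`, `CParameter` and `Model` and the two functions are hypothetical stand-ins that mirror the shape of CMatrix; they are not the actual C-API.

```cpp
#include <memory>
#include <vector>

struct Parameter { std::vector<float> values; };  // stand-in for a core type

enum { kParameterType = 1 };  // hypothetical type tag value

// Same shape as CMatrix: a type tag followed by a shared_ptr to the core object.
struct CParameter {
  int type;
  std::shared_ptr<Parameter> param;
};

// Stand-in for a model inside Paddle Core that owns its weight.
struct Model {
  std::shared_ptr<Parameter> weight = std::make_shared<Parameter>();
};

// Hands out an opaque handle that shares ownership with the model.
void* sketch_get_parameter(Model& model) {
  return new CParameter{kParameterType, model.weight};
}

// Destroys only the handle; the Parameter survives while the model holds it.
void sketch_destroy_parameter(void* handle) {
  delete static_cast<CParameter*>(handle);
}
```

The reverse also holds: if the model were destroyed first, an outstanding handle would keep the Parameter alive until the handle itself is destroyed.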
              -
              -

              Implementation files for a specific type

              -

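              The implementation files for a specific type are files such as matrix.cpp and gradient_machine.cpp. They implement the C-API in C++ 11 and export the interfaces with extern "C". The implementation performs the necessary safety checks on input arguments and forwards the C-API arguments to Paddle Core.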

              -
              -
              -

              libpaddle_capi_shared.{so, dylib}

              -

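              libpaddle_capi_shared is the shared library exported by the C-API. Its link options are similar to those of Paddle's other binaries (for example paddle_trainer). Users can link this shared library directly to bring in the Paddle C-API, using -lpaddle_capi_shared.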

              -
              -
              -

              libpaddle_capi_whole.a

              -

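              libpaddle_capi_whole is the static library exported by the C-API. It contains all of Paddle's symbols and is produced by bundling all object files from static libraries such as libpaddle_gserver.a, libpaddle_math.a and libpaddle_capi.a. To use it, link with --whole-archive -lpaddle_capi_whole --no-whole-archive.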

              -
              -
              -

              examples

              -

              The examples contain sample code for model inference, written in C99. For details, see example/README.md.

              -
              -
              -
              -

              Build options

              -

              The C-API build option is off by default. To turn it on, set the following when running cmake:

              -
              cmake ${YOUR_SOURCE_ROOT} -DWITH_C_API=ON -DWITH_PYTHON=OFF -DWITH_SWIG_PY=OFF
              -
              -
              -

              When building the C-API, it is recommended that Paddle neither embed the Python interpreter nor generate the SWIG interface. For the reasons, see Why Plain C.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/operator_kernel_type.html b/develop/doc/design/operator_kernel_type.html deleted file mode 100644 index d45a4297fd52a5ee4a9d731d1550920d63b1d151..0000000000000000000000000000000000000000 --- a/develop/doc/design/operator_kernel_type.html +++ /dev/null @@ -1,327 +0,0 @@ - - - - - - - - - - - - - Design Doc: The Keys of Operator Kernel Type — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: The Keys of Operator Kernel Type

              -
              -

              Problem

              -

              An operator can have different kernel implementations, and each operator will have a map to store the related kernels. Fluid uses OpKernelType as a key to identify a unique kernel. Before an operator runs, a certain type of kernel must be chosen via a key of OpKernelType. Currently, OpKernelType is defined as follows:

              -
              struct OpKernelType {
              -  platform::Place place_;
              -  proto::DataType data_type_;
              -};
              -
              -
              -

              For more details, please refer to the code on GitHub.

              -

              It contains two keys, Place and DataType, which are hashed into a unique key that represents a certain type of kernel. However, these two keys do not provide enough information. We need a more complete representation of OpKernelType.

              -

              We often implement a kernel of an operator with some computing library on a certain device (place). Please note that computing libraries and devices do not have a one-to-one correspondence. A device can have many computing libraries, and a computing library can also support different devices.

              -

              For example, the Eigen library supports Nvidia GPU/AMD GPU/CPU and the MKLDNN library supports Intel CPU/Intel FPGA. Both Place and Library should be keys of OpKernelType.

              -

              Different DataTypes, such as fp64/fp32/int8, will obviously have different kernels. But different data layouts of a Tensor will also lead to different implementations. Please refer to the batch norm operator kernels as an example. Data layout should also be taken into consideration.

              -
              -
              -

              Solution

              -

              There are four keys to determine a kernel type of an operator: Place/Library/DataType/Layout.

              -
              struct OpKernelType {
              -  platform::Place place_;
              -  platform::Library library_;
              -  proto::DataType data_type_;
              -  framework::Layout layout_;
              -};
              -
              -
              -
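The four-key lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not Fluid's implementation: the names KernelKey, kernel_map, register_kernel and choose_kernel are hypothetical, and places and data types are reduced to plain strings.

```python
from dataclasses import dataclass
from enum import Enum


class Library(Enum):
    PLAIN = 0
    MKLDNN = 1
    CUDNN = 2


class Layout(Enum):
    NCHW = 0
    NHWC = 1


@dataclass(frozen=True)  # frozen dataclasses are hashable, so they can be dict keys
class KernelKey:
    place: str        # hypothetical stand-in for platform::Place, e.g. "CPUPlace"
    library: Library
    data_type: str    # hypothetical stand-in for proto::DataType, e.g. "fp32"
    layout: Layout


kernel_map = {}  # per-operator map from key to kernel (hypothetical)


def register_kernel(key, kernel):
    """Store a kernel under its four-part key."""
    kernel_map[key] = kernel


def choose_kernel(key):
    """Pick the kernel registered for exactly this key."""
    return kernel_map[key]


# Two kernels that differ only in the library they are built on.
register_kernel(KernelKey("CPUPlace", Library.PLAIN, "fp32", Layout.NCHW),
                lambda: "plain CPU kernel")
register_kernel(KernelKey("CPUPlace", Library.MKLDNN, "fp32", Layout.NCHW),
                lambda: "MKLDNN CPU kernel")
```

With only Place and DataType in the key, the two registrations above would collide; adding Library (and Layout) keeps them distinct.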

              The details are as follows:

              -
              -

              Place

              -

              Place is defined as:

              -
              typedef boost::variant<CUDAPlace, ROCmPlace, FPGAPlace, CPUPlace> Place;
              -
              -
              -

              Place represents the device memory where data is located.

              -
              -
              -

              Library

              -

              One operator kernel is usually implemented based on one library. Library is defined as an enum:

              -
              enum Library { Plain, MKLDNN, CUDNN };
              -
              -
              -

              We use the Plain enumerator to represent the default library. Since most operators in Fluid are implemented based on the Eigen library, we take the Eigen library as the Plain enumerator. A library usually has a corresponding DeviceContext which contains some handles needed for computation. Fluid now has two default DeviceContexts for CPU and CUDA, namely CPUDeviceContext and CUDADeviceContext. CPUDeviceContext contains an Eigen library handle, and CUDADeviceContext contains an Eigen library handle and a cuBLAS handle.

              -

              If we want to support a new library, a new enumerator needs to be added to Library and a corresponding new LibraryDeviceContext needs to be created.

              -
              -
              -

              DataType

              -

              DataType is defined in framework.proto. Currently, int32/int64/fp32/fp64 are supported.

              -
              -
              -

              Layout

              -

              Actually, a Tensor is a view of a block of memory. Besides a pointer to the memory, we also need other descriptions of this block of memory, such as shape (ddim), stride, and layout.

              -

              Different layouts lead to different implementations of the operator kernel. There are four main principles we have to follow to support layout in the Fluid framework.

              -
                -
              • We take layout as a data member of Tensor. Layout is actually an enum. If Fluid is built with MKLDNN, then the memory formats in MKLDNN will also be added to this enum.
              • -
              • Users have to set the layout for input data. Some operators, like fill_constant/random, also have to set the layout of the data they generate. Of course, we can have a default layout, like NCHW.
              • -
              • The inference of Layout is at run-time, not at compile-time.
              • -
              • Every operator has to implement different kernels for different layouts. Let’s take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed here. And we will have a special macro to register kernels for MKLDNN operators.
              • -
              -

              Layout is also defined as an enum:

              -
              enum Layout {
              -  kNCHW,
              -  kNHWC,
              -#ifdef PADDLE_WITH_MKLDNN
              -  knChw8c
              -  ...
              -#endif
              -};
              -
              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/ops/rnn.html b/develop/doc/design/ops/rnn.html deleted file mode 100644 index a14d6e7c309168df3361618c4dba966ac462dfa5..0000000000000000000000000000000000000000 --- a/develop/doc/design/ops/rnn.html +++ /dev/null @@ -1,386 +0,0 @@ - - - - - - - - - - - - - RNNOp design — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              RNNOp design

              -

              This document describes the RNN (Recurrent Neural Network) operator and how it is implemented in PaddlePaddle. The RNN op requires that all instances in a mini-batch have the same length. We will have a more flexible dynamic RNN operator in the future.

              -
              -

              RNN Algorithm Implementation

              -

              - -

              The above diagram shows an RNN unrolled into a full network.

              -

              There are several important concepts here:

              -
                -
              • step-net: the sub-graph that runs at each step.
              • -
              • memory, $h_t$, the state of the current step.
              • -
              • ex-memory, $h_{t-1}$, the state of the previous step.
              • -
              • initial memory value, the memory of the first (initial) step.
              • -
              -
              -

              Step-scope

              -

              There could be local variables defined in each step-net. PaddlePaddle runtime realizes these variables in step-scopes which are created for each step.

              -

              -
              -Figure 2 illustrates the RNN's data flow -

              Please be aware that every step runs the same step-net. Each step does the following:

              -
                -
              1. Creates the step-scope.
              2. -
              3. Initializes the local variables, including step-outputs, in the step-scope.
              4. -
              5. Runs the step-net, which uses the above mentioned variables.
              6. -
              -

              The RNN operator will compose its output from step outputs in each of the step scopes.

              -
              -
              -

              Memory and Ex-memory

              -

              Let’s give more details about memory and ex-memory using a simple example:

              -

              $$h_t = U h_{t-1} + W x_t$$,

              -

              where $h_t$ and $h_{t-1}$ are the memory and ex-memory (previous memory) of step $t$ respectively.

              -

              In the implementation, we can make an ex-memory variable either “refer to” the memory variable of the previous step, or copy the memory value of the previous step to the current ex-memory variable.
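As a rough sketch of how step-scopes, memory and ex-memory fit together, the following Python uses scalars in place of tensors and a plain dict per step as the step-scope. The function name run_rnn and the dict keys are hypothetical; here the ex-memory simply refers to the value produced by the previous step.

```python
def run_rnn(xs, h0, U, W):
    """Unroll h_t = U * h_{t-1} + W * x_t over the inputs xs.

    xs: one (scalar) step-input per time step
    h0: initial memory value
    """
    step_scopes = []
    ex_memory = h0  # for the first step, the ex-memory is the initial memory
    for x in xs:
        # 1. create the step-scope and 2. initialize its local variables
        scope = {"x": x, "h_pre": ex_memory}
        # 3. run the step-net inside that scope
        scope["h"] = U * scope["h_pre"] + W * scope["x"]
        step_scopes.append(scope)
        # the next step's ex-memory refers to this step's memory
        ex_memory = scope["h"]
    # compose the RNN output from the step outputs in each step-scope
    return [scope["h"] for scope in step_scopes]


outputs = run_rnn([1.0, 2.0, 3.0], h0=0.0, U=0.5, W=2.0)
```

Every step runs the same update, and only the contents of its scope differ, which mirrors the statement that every step runs the same step-net.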

              -
              -
              -

              Usage in Python

              -

              For more information on Block, please refer to the design doc.

              -

              We can define an RNN’s step-net using a Block:

              -
              import paddle as pd
              -
              -X = some_op() # X is some operator's output and is a LoDTensor
              -a = some_op()
              -
              -# declare parameters
              -W = pd.Variable(shape=[20, 30])
              -U = pd.Variable(shape=[20, 30])
              -
              -rnn = pd.create_rnn_op(output_num=1)
              -with rnn.stepnet():
              -    x = rnn.add_input(X)
              -    # declare a memory (rnn's step)
              -    h = rnn.add_memory(init=a)
              -    # h.pre_state(), the previous memory of rnn
              -    new_state = pd.add_two(pd.matmul(W, x), pd.matmul(U, h.pre_state()))
              -    # update current memory
              -    h.update(new_state)
              -    # indicate that h variables in all step scopes should be merged
              -    rnn.add_outputs(h)
              -
              -out = rnn()
              -
              -
              -

              Python API functions in the above example:

              -
                -
              • rnn.add_input: indicates that the parameter is a variable that will be segmented into step-inputs.
              • -
              • rnn.add_memory: creates a variable used as the memory.
              • -
              • rnn.add_outputs: marks the variables that will be concatenated across steps into the RNN output.
              • -
              -
              -
              -

              Nested RNN and LoDTensor

              -

              An RNN whose step-net includes other RNN operators is known as a nested RNN.

              -

              For example, we could have a 2-level RNN, where the top level corresponds to paragraphs, and the lower level corresponds to sentences. Each step of the higher level RNN also receives an input from the corresponding step of the lower level, and additionally the output from the previous time step at the same level.

              -

              The following figure illustrates feeding text into the lower level, one sentence per step, and feeding the step outputs into the top level. The final top-level output represents the whole text.

              -

              - -

              import paddle as pd
              -
              -W = pd.Variable(shape=[20, 30])
              -U = pd.Variable(shape=[20, 30])
              -
              -W0 = pd.Variable(shape=[20, 30])
              -U0 = pd.Variable(shape=[20, 30])
              -
              -# a is output of some op
              -a = some_op()
              -
              -# chapter_data is a set of 128-dim word vectors
              -# the first level of LoD is sentence
              -# the second level of LoD is a chapter
              -chapter_data = pd.Variable(shape=[None, 128], type=pd.lod_tensor, level=2)
              -
              -def lower_level_rnn(paragraph):
              -    '''
              -    paragraph: the input
              -    '''
              -    rnn = pd.create_rnn_op(output_num=1)
              -    with rnn.stepnet():
              -        sentence = rnn.add_input(paragraph, level=0)
              -        h = rnn.add_memory(shape=[20, 30])
              -        h.update(
              -            pd.matmul(W, sentence) + pd.matmul(U, h.pre_state()))
              -        # get the last state as sentence's info
              -        rnn.add_outputs(h)
              -    return rnn
              -
              -top_level_rnn = pd.create_rnn_op(output_num=1)
              -with top_level_rnn.stepnet():
              -    paragraph_data = top_level_rnn.add_input(chapter_data, level=1)
              -    low_rnn = lower_level_rnn(paragraph_data)
              -    paragraph_out = low_rnn()
              -
              -    h = top_level_rnn.add_memory(init=a)
              -    h.update(
              -        pd.matmul(W0, paragraph_out) + pd.matmul(U0, h.pre_state()))
              -    top_level_rnn.add_outputs(h)
              -
              -# output the last step
              -chapter_out = top_level_rnn(output_all_steps=False)
              -
              -
              -

              In the above example, the construction of the top_level_rnn calls lower_level_rnn. The input is an LoD Tensor. The top level RNN segments input text data into paragraphs, and the lower level RNN segments each paragraph into sentences.

              -

              By default, the RNNOp will concatenate the outputs from all the time steps. If output_all_steps is set to False, it will only output the final time step.

              -

              - -

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/ops/sequence_decoder.html b/develop/doc/design/ops/sequence_decoder.html deleted file mode 100644 index d0a58d5352fceffb4f7b9b6ddd8acbbef50a9b0e..0000000000000000000000000000000000000000 --- a/develop/doc/design/ops/sequence_decoder.html +++ /dev/null @@ -1,461 +0,0 @@ - - - - - - - - - - - - - Design: Sequence Decoder Generating LoDTensors — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design: Sequence Decoder Generating LoDTensors

              -

              In tasks such as machine translation and visual captioning, a sequence decoder is necessary to generate sequences, one word at a time.

              -

              This documentation describes how to implement the sequence decoder as an operator.

              -
              -

              Beam Search based Decoder

              -

              The beam search algorithm is necessary when generating sequences. It is a heuristic search algorithm that explores the paths by expanding the most promising node in a limited set.

              -

              In the old version of PaddlePaddle, the C++ class RecurrentGradientMachine implements the general sequence decoder based on beam search. Due to the complexity involved, the implementation relies on a lot of special data structures that are quite trivial and hard for users to customize.

              -

              There are a lot of heuristic tricks in the sequence generation tasks, so the flexibility of sequence decoder is very important to users.

              -

              During the refactoring of PaddlePaddle, some new concepts were proposed, such as LoDTensor and TensorArray, that better support sequence usage and can also help make the implementation of a beam search based sequence decoder more transparent and modular.

              -

              For example, the RNN states, candidate IDs and probabilities of beam search can all be represented as LoDTensors; the selected candidates’ IDs in each time step can be stored in a TensorArray and Packed into the translated sentences.

              -
              -
              -

              Changing LoD’s absolute offset to relative offsets

              -

              The current LoDTensor is designed to store levels of variable-length sequences. It stores several arrays of integers where each represents a level.

              -

              The integers in each level represent the begin and end (not inclusive) offsets of a sequence in the underlying tensor. Let’s call this format the absolute-offset LoD for clarity.

              -

              The absolute-offset LoD can retrieve any sequence very quickly but fails to represent empty sequences. For example, consider the following two-level LoD:

              -
              [[0, 3, 9]
              - [0, 2, 3, 3, 3, 9]]
              -
              -
              -

              The first level tells that there are two sequences:

              -
                -
              • the first’s offset is [0, 3)
              • -
              • the second’s offset is [3, 9)
              • -
              -

              while on the second level, there are several empty sequences that both begin and end at 3. It is impossible to tell how many empty second-level sequences exist in each first-level sequence.

              -

              Many scenarios rely on representing empty sequences. For example, in machine translation or visual captioning, an instance may have no translation, or the candidate set for a prefix may be empty.

              -

              So let’s introduce another format of LoD, called the relative-offset LoD, in which each level stores offsets into the level below it.

              -

              For example, to represent the same sequences as the data above:

              -
              [[0, 3, 5]
              - [0, 2, 3, 3, 3, 9]]
              -
              -
              -

              the first level indicates that there are two sequences, whose offsets in the second-level LoD are [0, 3) and [3, 5).

              -

              The second level is the same as in the absolute-offset example because the level below it is the tensor itself. It is now easy to tell which second-level sequences, empty ones included, belong to each first-level sequence.
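A small Python sketch can make the lookup concrete. It assumes the first-level ranges stated in the text, [0, 3) and [3, 5), and represents a two-level relative-offset LoD as a pair of plain lists; the helper name sub_sequences is hypothetical.

```python
def sub_sequences(relative_lod, i):
    """Return the (begin, end) tensor ranges of the second-level
    sequences that belong to first-level sequence i."""
    first, second = relative_lod
    # first[i]..first[i + 1] indexes second-level sequences, not tensor rows
    return [(second[j], second[j + 1]) for j in range(first[i], first[i + 1])]


def count_empty(ranges):
    """Count the ranges that contain no tensor rows."""
    return sum(1 for begin, end in ranges if begin == end)


lod = [[0, 3, 5],            # first level: ranges over second-level sequences
       [0, 2, 3, 3, 3, 9]]   # second level: ranges over tensor rows

first_seq = sub_sequences(lod, 0)
second_seq = sub_sequences(lod, 1)
```

Because the first level indexes second-level sequences rather than tensor rows, an empty range such as (3, 3) is attributed to exactly one parent, which the absolute-offset format cannot express.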

              -

              The following examples are based on relative-offset LoD.

              -
              -
              -

              Usage in a simple machine translation model

              -

              Let’s start from a simple machine translation model that is simplified from the machine translation chapter to draw a blueprint of what a sequence decoder can do and how to use it.

              -

              The model has an encoder that learns the semantic vector from a sequence, and a decoder which uses the sequence encoder to generate new sentences.

              -

              Encoder

              -
              import paddle as pd
              -
              -dict_size = 8000
              -source_dict_size = dict_size
              -target_dict_size = dict_size
              -word_vector_dim = 128
              -encoder_dim = 128
              -decoder_dim = 128
              -beam_size = 5
              -max_length = 120
              -
              -# encoder
              -src_word_id = pd.data(
              -    name='source_language_word',
              -    type=pd.data.integer_value_sequence(source_dict_size))
              -src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
              -
              -src_word_vec = pd.lookup(src_embedding, src_word_id)
              -
              -encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
              -
              -encoder_ctx = pd.last_seq(encoder_out_seq)
              -# encoder_ctx_proj is the learned semantic vector
              -encoder_ctx_proj = pd.fc(
              -    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
              -
              -
              -

              Decoder

              -
              def generate():
              -    decoder = pd.while_loop()
              -    with decoder.step():
              -        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
              -        generated_ids = decoder.memory() # TODO init to batch_size <s>s
              -        generated_scores = decoder.memory() # TODO init to batch_size 1s or 0s
              -
              -        target_word = pd.lookup(trg_embedding, generated_ids)
              -        # expand encoder_ctx's batch to fit target_word's lod
              -        # for example
              -        # decoder_mem.lod is
              -        # [[0 1 3],
              -        #  [0 1 3 6]]
              -        # its tensor content is [a1 a2 a3 a4 a5]
              -        # which means there are 2 sentences to translate
              -        #   - the first sentence has 1 translation prefixes, the offsets are [0, 1)
              -        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
              -        # the target_word.lod is
              -        # [[0, 1, 6]
              -        #  [0, 2, 4, 7, 9, 12]]
              -        # which means 2 sentences to translate, each has 1 and 5 prefixes
              -        # the first prefix has 2 candidates
              -        # the following has 2, 3, 2, 3 candidates
              -        # the encoder_ctx_expanded's content will be
              -        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
              -        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
              -        decoder_input = pd.fc(
              -            act=pd.activation.Linear(),
              -            input=[target_word, encoder_ctx_expanded],
              -            size=3 * decoder_dim)
              -        gru_out, cur_mem = pd.gru_step(
              -            decoder_input, mem=decoder_mem, size=decoder_dim)
              -        scores = pd.fc(
              -            gru_out,
              -            size=target_dict_size,
              -            bias=None,
              -            act=pd.activation.Softmax())
              -        # K is a config
              -        topk_scores, topk_ids = pd.top_k(scores, K)
              -        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
              -
              -        selected_ids, selected_generation_scores = decoder.beam_search(
              -            topk_ids, topk_generated_scores)
              -
              -        # update the states
              -        decoder_mem.update(cur_mem)  # tells how to update state
              -        generated_ids.update(selected_ids)
              -        generated_scores.update(selected_generation_scores)
              -
              -        decoder.output(selected_ids)
              -        decoder.output(selected_generation_scores)
              -
              -translation_ids, translation_scores = decoder()
              -
              -
              -

              The decoder.beam_search is an operator that, given the candidates and the scores of translations including the candidates, returns the result of the beam search algorithm.

              -

              In this way, users can customize anything on the input or output of beam search, for example:

              -
                -
              1. Set the corresponding elements in topk_generated_scores to zero or some small value, and beam_search will discard those candidates.
              2. -
              3. Remove some specific candidate in selected_ids.
              4. -
              5. Get the final translation_ids and remove specific translation sequences from it.
              6. -
              -

              The implementation of sequence decoder can reuse the C++ class: RNNAlgorithm, so the Python syntax is quite similar to that of an RNN.

              -

              Both of them are two-level LoDTensors:

              -
                -
              • The first level represents batch_size of (source) sentences.
              • -
              • The second level represents the candidate ID sets for translation prefix.
              • -
              -

              For example, suppose there are 3 source sentences to translate, which have 2, 3, and 1 candidates respectively.

              -

              Unlike an RNN, in sequence decoder, the previous state and the current state have different LoD and shape, and an lod_expand operator is used to expand the LoD of the previous state to fit the current state.

              -

              For example, the previous state:

              -
                -
              • LoD is [0, 1, 3][0, 2, 5, 6]
              • -
              • content of tensor is a1 a2 b1 b2 b3 c1
              • -
              -

              the current state is stored in encoder_ctx_expanded:

              -
                -
              • LoD is [0, 2, 7][0, 3, 5, 8, 9, 11, 11]
              • -
              • the content is
                  -
                • a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once for each candidate)
                • -
                • a2 a2
                • -
                • b1 b1 b1
                • -
                • b2
                • -
                • b3 b3
                • -
                • None (c1 has 0 candidates, so c1 is dropped)
                • -
                -
              • -
              -

              The benefit of the relative-offset LoD is that the empty candidate set can be represented naturally.
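The expansion in the example above can be sketched in Python as follows. This is an illustration only, not the lod_expand operator itself: the function name expand_states is hypothetical, states are plain strings, and only the lowest-level offsets are computed.

```python
def expand_states(states, candidate_counts):
    """Copy each previous state once per candidate of its prefix.

    Returns the expanded content and the offsets of each prefix's block.
    A prefix with 0 candidates contributes an empty range.
    """
    expanded = []
    offsets = [0]
    for state, count in zip(states, candidate_counts):
        expanded.extend([state] * count)
        offsets.append(offsets[-1] + count)
    return expanded, offsets


states = ["a1", "a2", "b1", "b2", "b3", "c1"]
candidate_counts = [3, 2, 3, 1, 2, 0]  # c1 has no candidates and is dropped
content, offsets = expand_states(states, candidate_counts)
```

The trailing repeated offset (11, 11) is how the empty candidate set of c1 shows up in the result.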

              -

              The status in each time step can be stored in TensorArray, and Packed to a final LoDTensor. The corresponding syntax is:

              -
              decoder.output(selected_ids)
              -decoder.output(selected_generation_scores)
              -
              -
              -

              The selected_ids are the candidate ids for the prefixes, and will be Packed by TensorArray to a two-level LoDTensor, where the first level represents the source sequences and the second level represents generated sequences.

              -

              Packing the selected_scores will get a LoDTensor that stores scores of each translation candidate.

              -

              Packing the selected_generation_scores will get a LoDTensor, and each tail is the probability of the translation.

              -
              -
              -

              LoD and shape changes during decoding

              -

              - -

              According to the image above, the only phase that changes the LoD is beam search.

              -
              -
              -

              Beam search design

              -

              The beam search algorithm will be implemented as one method of the sequence decoder and has 3 inputs:

              -
                -
              1. topk_ids, the top K candidate ids for each prefix.
              2. -
              3. topk_scores, the corresponding scores for topk_ids.
              4. -
              5. generated_scores, the score of the prefixes.
              6. -
              -

              All of these are LoDTensors, so that the sequence affiliation is clear. Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.

              -

              It will return three variables:

              -
                -
              1. selected_ids, the final candidates that the beam search function selects for the next step.
              2. -
              3. selected_scores, the scores for the candidates.
              4. -
              5. generated_scores, the updated scores for each prefix (with the new candidates appended).
              6. -
              -
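One selection step can be sketched in Python under simplifying assumptions: a single source sentence, nested lists instead of LoDTensors, scores that already include the prefix score, and a hypothetical function name beam_search_step with an explicit beam_size argument. It is not the proposed operator, only an illustration of how the candidate sets shrink.

```python
def beam_search_step(topk_ids, topk_scores, beam_size):
    """Keep the beam_size best candidates among all prefixes of one sentence.

    topk_ids[i] and topk_scores[i] hold the candidate ids and their
    accumulated scores for prefix i. The result keeps one list per prefix,
    so a prefix whose candidates are all pruned ends up with an empty list.
    """
    flat = [
        (score, prefix_index, candidate_id)
        for prefix_index, (ids, scores) in enumerate(zip(topk_ids, topk_scores))
        for candidate_id, score in zip(ids, scores)
    ]
    # highest score first; the sort is stable, so ties keep their input order
    flat.sort(key=lambda item: item[0], reverse=True)

    selected_ids = [[] for _ in topk_ids]
    selected_scores = [[] for _ in topk_ids]
    for score, prefix_index, candidate_id in flat[:beam_size]:
        selected_ids[prefix_index].append(candidate_id)
        selected_scores[prefix_index].append(score)
    return selected_ids, selected_scores


ids, scores = beam_search_step(
    topk_ids=[[4, 7], [2, 9]],
    topk_scores=[[0.9, 0.8], [0.1, 0.2]],
    beam_size=2,
)
```

In this call both surviving candidates come from the first prefix, so the second prefix is left with an empty candidate set, which is the case the relative-offset LoD is meant to represent.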
              -
              -

              Introducing the LoD-based Pack and Unpack methods in TensorArray

              -

              The selected_ids, selected_scores and generated_scores are LoDTensors that exist at each time step, so it is natural to store them in arrays.

              -

              Currently, PaddlePaddle has a module called TensorArray which can store an array of tensors. It is better to store the results of beam search in a TensorArray.

              -

              The Pack and UnPack methods in TensorArray are used to pack the tensors in the array into an LoDTensor or split an LoDTensor into an array of tensors. They need some extensions to support packing or unpacking an array of LoDTensors.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/optimizer.html b/develop/doc/design/optimizer.html deleted file mode 100644 index c594f2642d29d7fdb231b8755de2fcc4fe09167c..0000000000000000000000000000000000000000 --- a/develop/doc/design/optimizer.html +++ /dev/null @@ -1,341 +0,0 @@ - - - - - - - - - - - - - Optimizer Design — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Optimizer Design

              -
              -

              The Problem

              -

              A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

              -
                -
              1. the forward pass, which computes intermediate results and the cost(s),
              2. -
              3. the backward pass, which derives gradients from intermediate results and costs, and
              4. -
              5. the optimization pass, which updates model parameters to optimize the cost(s).
              6. -
              -

              This work relies on three kinds of operators:

              -
                -
              1. forward operators,
              2. -
              3. gradient operators, and
              4. -
              5. optimization operators.
              6. -
              -

              It’s true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

              -

              In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.

              -
              -
              -

              High-level Python API to describe the training process

              -
                -
              1. Users write code to describe the network:

                -
                images = layer.data("images")
                -labels = layer.data("labels")
                -w1 = pd.var("w1")
                -b1 = pd.var("b1")
                -hidden = layer.fc(images, w=w1, b=b1)
                -cost = layer.mse(hidden, labels)
                -
                -
                -

                The above code snippet will create forward operators in Block.

                -
              2. -
              -
                -
              1. Users create a certain kind of Optimizer with some arguments.

                -
                optimizer = AdagradOptimizer(learning_rate=0.001)
                -
                -
                -
              2. -
              3. Users use the optimizer to minimize a certain cost through updating parameters in parameter_list.

                -
                opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
                -
                -
                -

                The above code snippet will create gradient and optimization operators in Block. The return value of minimize() is a list of optimization operators that will be run by the session.

                -
              4. -
              5. Users use Session/Executor to run this opt_op_list as target to do training.

                -
                sess.run(target=opt_op_list, ...)
                -
                -
                -
              6. -
              -
              -

              Optimizer Python interface:

              -
              class Optimizer(object):
              -    """Optimizer Base class.
              -
              -    """
              -
              -    def __init__(self):
              -        pass
              -
              -    def create_optimization_pass(self, parameters_and_grads):
              -        """Add optimization operators that update variables with their gradients.
              -
              -        Args:
              -          parameters_and_grads: a list of (variable, gradient) pairs to update.
              -
              -        Returns:
              -          optimization_op_list: a list of optimization operators that will update the parameters using the gradients.
              -        """
              -        return None
              -
              -    def minimize(self, loss, parameter_list):
              -        """Add operations to minimize `loss` by updating `parameter_list`.
              -
              -        This method combines interface `append_backward()` and
              -        `create_optimization_pass()` into one.
              -        """
              -        params_grads = self.create_backward_pass(loss, parameter_list)
              -        update_ops = self.create_optimization_pass(params_grads)
              -        return update_ops
              -
              -
              -
              -

              Users can inherit the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer.
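As an illustration of such a subclass, here is a minimal, self-contained sketch. It is not the PaddlePaddle implementation: the base class is re-declared in simplified form, `create_backward_pass()` is a stand-in that merely pairs each parameter name with a hypothetical `@GRAD` name, and operators are represented as plain dictionaries.

```python
class Optimizer(object):
    """Simplified stand-in for the base class above (assumption, not the real API)."""

    def create_backward_pass(self, loss, parameter_list):
        # Assumption: a real framework would append gradient operators here.
        # This sketch only pairs each parameter name with a gradient name.
        return [(param, param + "@GRAD") for param in parameter_list]

    def create_optimization_pass(self, parameters_and_grads):
        raise NotImplementedError

    def minimize(self, loss, parameter_list):
        params_grads = self.create_backward_pass(loss, parameter_list)
        return self.create_optimization_pass(params_grads)


class SGDOptimizer(Optimizer):
    """Hypothetical subclass: emits one 'sgd' op description per parameter."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def create_optimization_pass(self, parameters_and_grads):
        return [
            {"type": "sgd", "param": p, "grad": g, "lr": self.learning_rate}
            for p, g in parameters_and_grads
        ]


opt_op_list = SGDOptimizer(learning_rate=0.001).minimize("cost", ["w1", "b1"])
```

Only `create_optimization_pass()` differs between optimizers in this sketch; `minimize()` is inherited unchanged, which is the point of the base-class design.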

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/paddle_nccl.html b/develop/doc/design/paddle_nccl.html deleted file mode 100644 index a3b4f33e2945802cd655dd6a77c5270d161a51b4..0000000000000000000000000000000000000000 --- a/develop/doc/design/paddle_nccl.html +++ /dev/null @@ -1,319 +0,0 @@ - - - - - - - - - - - - - Design Doc: NCCL support in Paddle Fluid — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: NCCL support in Paddle Fluid

              -
              -

              Abstract

              -

              This design doc describes the NCCL feature in Paddle. We propose an approach to support the NCCL library both on a single machine and on multiple machines. We wrap the NCCL primitives Broadcast, AllReduce, and Reduce as operators so that one script can use the power of multiple GPUs.

              -
              -
              -

              Motivation

              -

              NCCL is an NVIDIA library that supports multi-GPU communication and is optimized for NVIDIA GPUs. It provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, which can achieve high bandwidth over PCIe and the NVLink high-speed interconnect. With the NCCL library, we can easily accelerate training in parallel.

              -
                -
              • Pros
              • -
              -
                -
              1. Easy to plug in with the NCCL2 library.
              2. -
              3. High performance on NVIDIA GPUs.
              4. -
              5. MPI-like primitives, which have a low learning cost for users.
              6. -
              -
                -
              • Cons
              • -
              -
                -
              1. Designed only for NVIDIA GPUs, not a general multi-device solution.
              2. -
              3. Although NCCL1 is open-sourced under the BSD license, NCCL2 is no longer open-sourced.
              4. -
              -

              At the beginning of training, the framework needs to distribute the same parameters to every GPU, and to merge the gradients whenever the user requires it.

              -

              As a result, during training, we need operations for peer-to-peer copy between different GPUs, aggregating gradients/parameters from GPUs, and broadcasting parameters to GPUs. Every GPU only needs to run the operator with the correct place information.

              -

              In addition, interfaces are needed to synchronize model updates across the different GPU cards.

              -
              -
              -

              Implementation

              -

              As mentioned above, we wrap the NCCL routines as several kinds of operators. Note that NCCL needs to create a Communicator between GPUs at the beginning, so an NCCLInit operator is created for that purpose.

              -
              -

              Transpiler

              -

              To be compatible with the parameter server design doc, the transpiler compiles the user-defined operation graph into sub-graphs to be executed on different devices.

              -
                -
              1. The user-defined model will be a single device program

                -
              2. -
              3. Broadcast/Reduce operators between GPUs will be inserted into the program. In the multi-node case, Send and Recv operators may be inserted as well.

                -

                That is, Broadcast and AllReduce on a single machine, and Broadcast, AllReduce, Send, and Recv on multiple machines.

                -

                -
              4. -
              -

              After compiling, the graph is as shown:

              -

              -

              Operators are added to the sub-graphs. Every GPU is assigned a role such as rank0, rank1, etc.

              -
                -
              • Broadcast. The Broadcast operator distributes the initialized parameters to all the GPUs from the GPU that owns them, e.g. from the rank0 GPU.
              • -
              • AllReduce. The AllReduce operator synchronizes parameters/gradients between GPUs. AllReduce is implemented with the ring-based communication method, which avoids a bottleneck at a single GPU.
              • -
              -

              Note that the AllReduce operator forces the GPUs to synchronize at that point. Whether the whole training process runs in asynchronous or synchronous mode depends on where the AllReduce point is placed in the graph.

              -

              As shown in the picture, each GPU computes the gradient of W and then runs an AllReduce operator, which accumulates dW over the full batch of data. Each GPU then runs the optimization process individually and applies the gradient to its own W.

              -
                -
              • AllReduce. Note that our AllReduce operator is a ring-based AllReduce implementation. If we used the NCCL2 AllReduce primitive, every GPU would run the optimization over the full batch of data, wasting the compute resources of (n-1) GPUs. In addition, the NCCL2 built-in AllReduce only uses the communication resources during synchronization, so the gradient update would be a separate, subsequent phase. In fact, we can amortize the time cost of the gradient update into the communication phase. The process is:
              • -
              -
                -
              1. Every parameter has its root card. That card is responsible for aggregating the gradients from the GPUs.
              2. -
              3. The whole model’s parameters are hashed to different root cards to ensure load balance between GPUs.
              4. -
              5. The logically neighboring cards start sending the parameter to the next one. After one round, the parameter’s root card has aggregated the full gradients.
              6. -
              7. Then the root card optimizes the parameter.
              8. -
              9. The root card sends its optimized result to its neighbor, and each neighbor then sends the parameter on to the next one.
              10. -
              11. Finish the synchronization round.
              12. -
              -

              The total time cost will be 2 * (n-1) * per-parameter-send-time, so we reach the goal of amortizing the update time into the communication phase.
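The steps above can be simulated on the host to check the 2 * (n-1) send count. This is only a sketch under stated assumptions: GPUs are list indices, a "send" is a counter increment, the update rule is plain SGD, and `root_card()` uses CRC32 as one possible hash; none of these names come from Paddle or NCCL.

```python
import zlib


def root_card(param_name, n_cards):
    # Hypothetical hashing of a parameter to its root card (CRC32 is an assumption).
    return zlib.crc32(param_name.encode("utf-8")) % n_cards


def ring_reduce_optimize_broadcast(grads, param, root, lr):
    """Simulate one synchronization round for a single parameter.

    grads: one local gradient per card; param: current value; root: root card index.
    Returns the per-card parameter values and the number of ring sends.
    """
    n = len(grads)
    sends = 0

    # Reduce phase: a partial sum travels around the ring until it reaches the root.
    partial = 0.0
    card = (root + 1) % n
    for _ in range(n - 1):
        partial += grads[card]
        card = (card + 1) % n  # send the partial sum to the next card
        sends += 1
    total = partial + grads[root]

    # The root card optimizes the parameter (plain SGD, as an assumption).
    new_param = param - lr * total

    # Broadcast phase: the optimized value travels around the ring once more.
    params = [None] * n
    params[root] = new_param
    card = root
    for _ in range(n - 1):
        nxt = (card + 1) % n
        params[nxt] = params[card]
        card = nxt
        sends += 1
    return params, sends


params, sends = ring_reduce_optimize_broadcast(
    grads=[1.0, 2.0, 3.0, 4.0], param=10.0, root=0, lr=0.1
)
```

With four cards, the simulation performs three sends in the reduce phase and three in the broadcast phase, matching 2 * (n-1).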

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/parallel_do.html b/develop/doc/design/parallel_do.html deleted file mode 100644 index fd76e820d83d4a6937c9b78b224d475daf109b5b..0000000000000000000000000000000000000000 --- a/develop/doc/design/parallel_do.html +++ /dev/null @@ -1,410 +0,0 @@ - - - - - - - - - - - - - Design Doc: Parallel_Do in PaddlePaddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Parallel_Do in PaddlePaddle

              -

              In PaddlePaddle, we use the parallel_do primitive to represent multithreaded data-parallel processing.

              -
              -

              Design overview

              -

              The definition of a parallel_do op looks like the following:

              -
              AddInput(kInputs, "Inputs needed to be split onto different devices").AsDuplicable();
              -AddInput(kParameters, "Parameters are duplicated over different devices")
              -    .AsDuplicable();
              -AddInput(kPlaces, "Devices used for parallel processing");
              -AddOutput(kOutputs, "Outputs needed to be merged from different devices").AsDuplicable();
              -AddOutput(kParallelScopes,
              -          "Scopes for all local variables in forward pass. One scope for each device");
              -AddAttr<framework::BlockDesc *>(kParallelBlock,
              -                                "List of operators to be executed in parallel");
              -
              -
              -

              A vanilla implementation of parallel_do can be outlined as follows (| means a single thread and |||| means multiple threads):

              -
              In the forward pass
              -  |      Split input onto different devices
              -  |      Copy parameter onto different devices
              -  ||||   Compute forward pass in parallel
              -  |      Merge output from different devices
              -
              -In the backward pass
              -  |      Split output@grad onto different devices
              -  ||||   Compute backward pass in parallel
              -  |      accumulate param@grad from different devices to the first device
              -  |      Merge input@grad from different devices
              -  |      Copy param@grad to the place of parallel_do_op
              -
              -
              -
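The forward half of this outline can be sketched with host threads. This is a toy model, not the parallel_do operator: "devices" are just labels, inputs are Python lists, and `parallel_do_forward` and `scale_block` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_do_forward(inputs, params, places, block):
    """Split, copy, compute in parallel, and merge, mirroring the outline above."""
    n = len(places)

    # |     Split input onto different devices (contiguous, near-equal shards).
    chunk = (len(inputs) + n - 1) // n
    shards = [inputs[i * chunk:(i + 1) * chunk] for i in range(n)]

    # |     Copy parameters onto different devices (one private copy per place).
    local_params = [dict(params) for _ in places]

    # ||||  Compute the forward pass in parallel, one worker per place.
    with ThreadPoolExecutor(max_workers=n) as pool:
        outputs = list(pool.map(block, shards, local_params))

    # |     Merge output from different devices, preserving input order.
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged


def scale_block(shard, local_param):
    # Stand-in for the operators inside the parallel block.
    return [local_param["w"] * x for x in shard]


result = parallel_do_forward([1, 2, 3, 4, 5], {"w": 2}, ["gpu:0", "gpu:1"], scale_block)
```

The split, copy, and merge steps run on the calling thread, which is why the later sections look for ways to remove them from the critical path.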

              This implementation allows us to write a mixed-device program like this:

              -
              W1 = fluid.tensor(size=[100,20], parameter=True)
              -W2 = fluid.tensor(size=[20,15], parameter=True)
              -
              -data = layers.data()
              -
              -gpu_places = layers.get_place(use_gpu=True)
              -# parallel processing on multiple GPUs
              -pd = ParallelDo(gpu_places)
              -with pd.do(input=data):
              -    prediction = softmax(fc(fc(data, W1), W2))
              -    write_output(prediction)
              -prediction = pd()
              -loss = cross_entropy(prediction, label)
              -
              -
              -

              The ProgramDesc looks like the following:

              -
              # start_program will be run by executor(CPUPlace), all w1, w2 will be allocated on CPU
              -start_program
              -{
              -  vars: w1, w2
              -  ops: init(w1), init(w2)
              -}
              -
              -main_program
              -{
              -block0 {
              -  vars: data, places, w1, w2, w1_grad, w2_grad,
              -  ops: data, get_place, parallel_do(block1),
              -       parallel_do_grad(block2),
              -       sgd(w2, w2_grad),
              -       sgd(w1, w1_grad)
              -}
              -block1 { # the forward pass
              -  parent_block: 0
              -  vars: data, h1, h2, loss
              -  ops: fc, fc, softmax
              -}
              -block2 { # the backward pass
              -  parent_block: 1
              -  vars: data_grad, h1_grad, h2_grad, loss_grad, local_w1_grad, local_w2_grad
              -  ops: softmax_grad,
              -       fc_grad
              -       fc_grad
              -}
              -}
              -
              -
              -
              -
              -

              Performance Improvement

              -

              There are several places where we can make this parallel_do faster.

              -
              -

              forward: split input onto different devices

              -

              If the input of the parallel_do is independent of any prior operators, we can avoid this step by prefetching the input onto different devices in a separate background thread. The Python code looks like this:

              -
              pd = ParallelDo(gpu_places)
              -with pd.do():
              -    feature = get_data_from_prefetch_queue(gpu_places)
              -    prediction = my_net(feature)
              -    write_output(prediction)
              -
              -
              -
              -
              -

              forward: Copy parameter onto different devices

              -

              We can avoid this step by making each device have a copy of the parameter. This requires:

              -
                -
              1. fluid.default_start_up_program() to be run on all devices
              2. -
              3. In the backward pass, allreduce param@grad across the different devices. This requires:
                  -
                1. backward.py to add allreduce operators at parallel_do_grad
                2. -
                3. allreduce operators need to be called in async mode to achieve maximum throughput
                4. -
                -
              4. -
              5. apply gradient-related ops (i.e. clipping, normalization, decay, sgd) on different devices in parallel
              6. -
              -

              By doing so, we also avoid “backward: accumulate param@grad from different devices to the first device”. The ProgramDesc looks like the following:

              -
              # w1, w2 will be allocated on all GPUs
              -start_program
              -{
              -block0 {
              -  parallel_do(block1)
              -}
              -block1 {
              -  parent_block: 0
              -  vars: w1, w2
              -  ops: init(w1), init(w2)
              -}
              -}
              -
              -main_program
              -{
              -block0 {
              -  vars: data, places, w1, w2
              -  ops: data, get_place, parallel_do(block1),
              -       parallel_do_grad(block2),      # append_backward
              -       parallel_do(block3)            # append_optimization
              -       
              -}
              -block1 {
              -  parent_block: 0
              -  vars: data, h1, h2, loss
              -  ops: fc, fc, softmax
              -}
              -block2 {
              -  parent_block: 1
              -  vars: data_grad, h1_grad, h2_grad, loss_grad, w1_grad, w2_grad
              -  ops: softmax_grad,
              -       fc_grad, allreduce(places, scopes, w1_grad),
              -       fc_grad, allreduce(places, scopes, w2_grad)
              -}
              -block3 {
              -  parent_block: 0
              -  vars: lr
              -  ops: sgd(w2, w2_grad),
              -       sgd(w1, w1_grad)
              -}
              -}
              -
              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/parameter_average.html b/develop/doc/design/parameter_average.html deleted file mode 100644 index 35e5e47ac3b851f2164838843f6eadd0baf7a5d8..0000000000000000000000000000000000000000 --- a/develop/doc/design/parameter_average.html +++ /dev/null @@ -1,331 +0,0 @@ - - - - - - - - - - - - - Averaging Parameter in PaddlePaddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Averaging Parameter in PaddlePaddle

              -
              -

              Why Averaging

              -

              In a large scale machine learning setup where the size of the training data is huge, it could take us a large number of iterations over the training data before we can achieve the optimal values of parameters of our model. Looking at the problem setup, it is desirable if we can obtain the optimal values of parameters by going through the data in as few passes as we can.

              -

              Polyak and Juditsky (1992) showed that the test performance of a simple average of the parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset.

              -

              Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of the parameters obtained by SGD is used as the estimator for the optimal parameter values. The averaging is done as follows:
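As a reference point, and assuming the standard Polyak-Ruppert form of the average (the symbols $\theta_i$ for the SGD iterate after step $i$ and $\bar{\theta}_t$ for the running average are notation chosen here), the average and its equivalent incremental update can be written as:

```latex
\bar{\theta}_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i,
\qquad
\bar{\theta}_t = \bar{\theta}_{t-1} + \frac{1}{t}\left(\theta_t - \bar{\theta}_{t-1}\right),
\quad \bar{\theta}_0 = 0 .
```

The incremental form needs only the previous average and the current iterate, so no history of parameters has to be stored.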

              -


              -

              We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.

              -
              -

              How to perform Parameter Averaging in PaddlePaddle

              -

              Parameter Averaging in PaddlePaddle works in the following way during training:

              -
                -
              1. It will take in an instance of a normal optimizer as an input, e.g. RMSPropOptimizer
              2. -
              3. The optimizer itself is responsible for updating the parameters.
              4. -
              5. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself:
                  -
                1. In concept, the values of this copy are the average of the values of the parameters in the most recent N batches.
                2. -
                3. However, saving all the N instances of the parameters in memory is not feasible.
                4. -
                5. Therefore, an approximation algorithm is used.
                6. -
                -
              6. -
              -

              Hence, overall we have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved.

              -

              During the testing or model-saving phase, we perform the following steps:

              -
                -
              1. Perform the delayed operations.
              2. -
              3. Save current values of the parameters to a temporary variable.
              4. -
              5. Replace the values of the parameters with the averaged values.
              6. -
              7. Perform testing and/or save the parameters.
              8. -
              9. Restore the values of the parameters once done.
              10. -
              -
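The save, replace, test, and restore steps above map naturally onto a context manager. The sketch below is an assumption-laden illustration: parameters are a plain dictionary, `use_averaged` is a hypothetical helper name, and the "delayed operations" of the first step are not modeled.

```python
from contextlib import contextmanager


@contextmanager
def use_averaged(params, averaged):
    """Temporarily replace the values in `params` with the averaged values."""
    backup = dict(params)      # save current values to a temporary variable
    params.update(averaged)    # replace the values with the averaged values
    try:
        yield params           # the caller performs testing and/or saving here
    finally:
        params.clear()
        params.update(backup)  # restore the values once done


params = {"w": 3.0}
with use_averaged(params, {"w": 2.5}) as p:
    seen_during_test = p["w"]
```

Using `try`/`finally` means the training values are restored even if testing or saving raises an exception.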
              -
              -

              How to implement Averaging of Parameter in PaddlePaddle

              -

              We can add the ParameterAverageOptimizer op to the graph through Python API. Using this approach, we manually add this op to the graph and direct the output of the optimizer op to this op during training.

              -
              **Advantages**:
              -- Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op.
              -- Makes it easy for the users to customize and extend the framework.
              -
              -**Disadvantages**:
              -- Implementation requires re-writing the averaging methodology in Python.  
              -
              -
              -
              -
              -

              Low-Level implementation

              -

              In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input:

              -
                -
              • the optimizer
              • -
              • the window_size to keep the updates
              • -
              -

              The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for Operators. We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.

              -

              The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the layer functions in Python API.

              -
              -
              -

              Python API implementation for ParameterAverageOptimizer

              -

              Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:

              -
                -
              • Any optimizer (RMSProp, AdaGrad, etc.)
              • -
              • A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. Move the averaged value to a buffer when window is full to avoid loss of precision.
              • -
              -
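The window-based accumulation described above can be sketched as follows. The class name `WindowAverager`, the scalar parameter, and the rule for combining the buffered average with a partially filled window are assumptions made for illustration; the design does not fix these details.

```python
class WindowAverager:
    """Accumulate values over a window and keep the last full-window average."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window_sum = 0.0   # running sum of the current, partial window
        self.count = 0          # number of values in the current window
        self.buffer = None      # average of the most recent full window

    def update(self, value):
        self.window_sum += value
        self.count += 1
        if self.count == self.window_size:
            # Window is full: move its average to the buffer and start over,
            # so the running sum never grows without bound.
            self.buffer = self.window_sum / self.window_size
            self.window_sum = 0.0
            self.count = 0

    def average(self):
        if self.buffer is None:
            return self.window_sum / self.count if self.count else None
        # Assumption: weight the buffered average by the window size.
        total = self.buffer * self.window_size + self.window_sum
        return total / (self.window_size + self.count)


averager = WindowAverager(window_size=2)
for value in [1.0, 3.0, 5.0]:
    averager.update(value)
current_average = averager.average()
```

Only one buffer and one partial sum are stored per parameter, which is what makes the approach feasible compared with keeping all N parameter snapshots.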

              Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code, so we should design Python APIs that support averaging. As per the PaddlePaddle Python API design, the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions. We will have a wrapper written in Python that supports the functionality, and implement the actual core computation in the C++ core, as we have done for other Optimizers.

              -
              -

              Creation of the ParameterAverageOptimizer operator

              -

              There are two ways for creating the ParameterAverageOptimizer op:

              -
                -
              1. We create the op immediately while building the computation graph.
              2. -
              3. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added.
              4. -
              -

              The proposal is to add the op immediately while building the computation graph.

              -
              -
              -

              High-level API

              -

              In PaddlePaddle Python API, users will primarily rely on layer functions to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.

              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/parameters_in_cpp.html b/develop/doc/design/parameters_in_cpp.html deleted file mode 100644 index ca1f3e481c8fbb2802dee392417b0995c1411670..0000000000000000000000000000000000000000 --- a/develop/doc/design/parameters_in_cpp.html +++ /dev/null @@ -1,290 +0,0 @@ - - - - - - - - - - - - - Design Doc: The C++ Class Parameters — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: The C++ Class Parameters

              -

              Parameters is a concept we designed in the PaddlePaddle V2 API. Parameters is a container of parameters, which makes PaddlePaddle capable of sharing parameters between topologies. We described the usage of Parameters in api.md.

              -

              We previously used Python to implement Parameters when designing the V2 API. The current implementation has several defects:

              -
                -
              • We just use memcpy to share Parameters between topologies, but this is very inefficient.
              • -
              • We do not support sharing Parameters while training. We just trigger memcpy when training starts.
              • -
              -

              It is necessary that we implement Parameters on the C++ side. However, this could require refactoring PaddlePaddle's code, because PaddlePaddle was previously designed for training only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current PaddlePaddle implementation, there are three concepts associated with Parameters:

              -
                -
              1. paddle::Parameter. A Parameters is a container for paddle::Parameter. It is evident that we should use paddle::Parameter when developing Parameters. However, the Parameter class contains many functions and does not have a clear interface. It covers creating/storing a Parameter, serialization/deserialization, optimization (i.e., SGD), and randomizing/zeroing. When developing Parameters, we only use the create/store functionality. We should extract the functionalities of Parameter into several classes to clean up the PaddlePaddle C++ implementation.
              2. -
              3. paddle::GradientMachine and its sub-classes, e.g., paddle::MultiGradientMachine, paddle::NeuralNetwork. We should pass Parameters to paddle::GradientMachine during forward/backward to avoid memcpy between topologies. Also, we should handle multi-GPU/CPU training, because forward and backward would be performed on multiple GPUs and multiple CPUs. Parameters should dispatch the parameter value to each device, and gather the parameter gradient from each device.
              4. -
              5. paddle::ParameterUpdater. The ParameterUpdater is used to update parameters in Paddle. So Parameters should be used by paddle::ParameterUpdater, and paddle::ParameterUpdater should optimize Parameters (by SGD).
              6. -
              -

              The step-by-step approach for implementing Parameters in the PaddlePaddle C++ core is listed below. Each step should be a separate PR that can be merged into PaddlePaddle one by one.

              1. Clean the paddle::Parameter interface. Extract the functionalities of paddle::Parameter to prepare for the implementation of Parameters.
              2. Implement a Parameters class. It just stores the paddle::Parameter objects inside. Make GradientMachine use Parameters as a class member.
              3. Make Parameters support multi-CPU and multi-GPU training to prepare for sharing Parameters between topologies. Because we need to share Parameters between topologies, it is the responsibility of Parameters to exchange Parameters between GPUs. GradientMachine should not handle how to exchange Parameters, because a GradientMachine is only used to train one topology and we need to support training many topologies in Paddle, i.e., many GradientMachines could use one Parameters.
                • We should use a global function to exchange Parameters between GPUs, not a member function of Parameters. The MultiGradientMachine invokes this function and passes Parameters as its input.
                • The MultiGradientMachine contains many functionalities. Extracting the Parameters exchange logic could make MultiGradientMachine clearer and simpler.
              4. Make Parameters an argument of the forward/backward functions, not a data member of GradientMachine. For example, forward could be forward(const Parameters& params, ...) and backward could be backward(Parameters* params, ...). After this step, Paddle could share Parameters between topologies.
              5. ParameterUpdater is invoked by GradientMachine and Trainer, but it updates Parameters. At the end of this refactoring, we could change ParameterUpdater to use Parameters directly, which makes the implementation of ParameterUpdater clearer.
              -
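The end state of these steps can be illustrated with a small Python sketch. Every name in it (`Parameters`, `GradientMachine`, `sgd_update`) is a hypothetical stand-in rather than the PaddlePaddle C++ API, and the "topology" is reduced to a single multiplication so that the sharing is easy to follow: two machines read the same parameter in forward, accumulate into the same gradient in backward, and no copy is made.

```python
# Hypothetical sketch; these classes are illustrative stand-ins, not PaddlePaddle APIs.
class Parameters:
    """Container that owns parameter values and their gradients."""

    def __init__(self):
        self.values = {}
        self.grads = {}

    def create(self, name, value):
        self.values[name] = value
        self.grads[name] = 0.0


class GradientMachine:
    """One topology computing y = w * x. It holds no parameters itself."""

    def __init__(self, param_name):
        self.param_name = param_name

    def forward(self, params, x):
        # Parameters is an argument, so several machines can read the same values.
        return params.values[self.param_name] * x

    def backward(self, params, x, out_grad):
        # d(w * x)/dw = x, accumulated into the shared gradient.
        params.grads[self.param_name] += x * out_grad


def sgd_update(params, lr):
    """Stand-in for a ParameterUpdater that works on Parameters directly."""
    for name in params.values:
        params.values[name] -= lr * params.grads[name]
        params.grads[name] = 0.0


params = Parameters()
params.create("w", 2.0)
gm_a = GradientMachine("w")
gm_b = GradientMachine("w")

out_a = gm_a.forward(params, 3.0)
out_b = gm_b.forward(params, 5.0)
gm_a.backward(params, 3.0, 1.0)
gm_b.backward(params, 5.0, 1.0)
shared_grad = params.grads["w"]  # contributions from both topologies
sgd_update(params, 0.5)
```

Device dispatch and gradient gathering (step 3) are left out of this sketch; it only shows why passing Parameters as an argument removes the need for memcpy between topologies.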

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              \ No newline at end of file
              diff --git a/develop/doc/design/profiler.html b/develop/doc/design/profiler.html
              deleted file mode 100644
              index 2c75b03fc2e37f301a6e71cdfd862250236cc45a..0000000000000000000000000000000000000000
              --- a/develop/doc/design/profiler.html
              +++ /dev/null
              @@ -1,343 +0,0 @@

              Introduction

              -

              There are many performance analysis tools for different programming languages and software frameworks. Most popular deep learning frameworks use several programming languages and target heterogeneous platforms. Like most of them, PaddlePaddle uses C++, CUDA and Python as its basic programming languages so that it can run on CPU and GPU devices. The nvprof tool is usually used to analyse CUDA programs. We also have a document on profiling CPU and Python programs with yep and Google's perftools. For PaddlePaddle fluid, however, the operator is the basic computing unit, and developers usually want to collect the time of each operator and locate bottlenecks. nvprof collects the timeline of CUDA-related activities on both CPU and GPU, including kernel execution, memory transfers, memory sets, CUDA API calls, and events or metrics for CUDA kernels, while yep and Google's perftools cannot collect the timeline of CUDA programs. None of these tools can collect time at the operator level, so we design this profiling tool.

              -
              -
              -

              Architecture

              -

              The workflow for most tasks is as follows. Each operator runs many times over all iterations, so the profiler must collect the total time of each operator across the iterations. Developers may also want to collect more detailed time spans inside an operator, or record time spans elsewhere, which requires the profiler to support nested time spans. To speed up training, all deep learning frameworks support parallel computing, including multiple threads on CPU and multiple GPUs, so the profiler must be able to collect the timeline of each thread. In addition, the profiler itself consumes resources, so developers must be able to enable or disable it easily. Finally, the profiler should present a human-readable report.

              -
              for i in range(M):  # M is the number of iterations
              -  for op in operator_lists:  # `operator_lists` contains all the operators in the network.
              -    op.run()
              -
              -
              -

              In summary, the profiler should have the following features:

              • records time spans in loops.
              • supports nested time spans.
              • supports multiple threads/multiple GPUs.
              • can be enabled and disabled by users.

              But how do we record time in a mixed C++ and CUDA program? There are many C++ APIs to get the current calendar time in the host program. On the GPU, however, CUDA kernels may execute concurrently if they are in different streams, and they are asynchronous with respect to the host program unless a synchronization follows the kernel launch. CUDA provides events to monitor the device and perform accurate timing. Inspired by PyTorch and CUDA events, we also design and apply events to record the timeline, and then summarize and present statistics based on these events.

              -

              The overall flow is shown in the following figure.

              -


              -
              -

              Event

              -

              In the above workflow, a pair of events is needed before and after the piece of code whose time is collected, so each event has a flag marking whether it is a starting or an ending event. Besides these two kinds of events, sometimes only a marker with a text message is needed, for example to mark where profiling starts or ends. There are three kinds of events:

              -
              enum EventKind {
              -  kMark,
              -  kPushRange,
              -  kPopRange};
              -
              -
              -
              • kMark: only a marker without time range.
              • kPushRange: mark the starting event for time range.
              • kPopRange: mark the ending event for time range.

              For the CPU code, the events only need to record the current time. For the CUDA code, the event management functions of CUDA are used. To cover many pieces of code, an event list records the events of each piece.

              -
              class Event {
              - public:
              -  // The DeviceContext is used to get current  CUDA stream.
              -  Event(EventKind kind, std::string name, uint32_t thread_id,
              -        const platform::DeviceContext* dev_ctx = nullptr);
              -  double CpuElapsedUs(const Event& e) const;
              -  double CudaElapsedUs(const Event& e) const;
              -
              - private:
              -  EventKind kind_;
              -  std::string name_;
              -  uint32_t thread_id_;
              -  int64_t cpu_ns_;
              -#ifdef PADDLE_WITH_CUDA
              -  cudaEvent_t event_ = nullptr;
              -  int device_ = -1;
              -#endif
              -};
              -
              -struct EventList {
              -  std::forward_list<std::vector<Event>> event_blocks;
              -};
              -
              -
              -

              As mentioned above, there is no need to record the timeline when the profiler is disabled, so a global state enables or disables the profiler.

              -
              enum ProfilerState {
              -  kDisabled, 
              -  kCPU,
              -  kCUDA
              -};
              -ProfilerState g_state;
              -
              -
              -
              • kDisabled: the disabled state.
              • kCPU: CPU profiling state.
              • kCUDA: GPU profiling state.

              A pair of starting and ending events is pushed to the event lists in the constructor and destructor of RecordEvent, so the timeline is recorded for the code executed during the lifetime of a RecordEvent object.

              -
              struct RecordEvent {
              -  explicit RecordEvent(const std::string& name,
              -                       platform::DeviceContext* dev_ctx = nullptr) {
              -    // g_state is the global ProfilerState declared above.
              -    if (g_state == ProfilerState::kDisabled) return;
              -    // push the starting event to the event lists.
              -  }
              -  ~RecordEvent() {
              -    if (g_state == ProfilerState::kDisabled) return;
              -    // push the ending event to the event lists.
              -  }
              -};
              -
              -
              -
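The event mechanism above can be tried out in a CPU-only Python sketch. All names here (`enable_profiler`, `disable_profiler`, `RecordEvent` as a context manager, `summarize`) are assumptions made for illustration, not the PaddlePaddle profiler API, and CUDA events are left out. The sketch covers the four required features: push/pop events are recorded around a block, ranges may nest, each thread has its own event list, and nothing is recorded while the profiler is disabled.

```python
import threading
import time
from collections import defaultdict

# Hypothetical Python analogue of the C++ design; not the PaddlePaddle API.
K_MARK, K_PUSH_RANGE, K_POP_RANGE = range(3)

_enabled = False
_lock = threading.Lock()
_event_lists = defaultdict(list)  # thread id -> [(kind, name, time_ns), ...]


def enable_profiler():
    global _enabled
    with _lock:
        _event_lists.clear()
    _enabled = True


def disable_profiler():
    global _enabled
    _enabled = False


def _push(kind, name):
    event = (kind, name, time.perf_counter_ns())
    with _lock:
        _event_lists[threading.get_ident()].append(event)


class RecordEvent:
    """Pushes a start event on entry and an end event on exit."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def __enter__(self):
        self.active = _enabled  # decide once, so push and pop stay paired
        if self.active:
            _push(K_PUSH_RANGE, self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.active:
            _push(K_POP_RANGE, self.name)
        return False


def summarize():
    """Return {name: (calls, total_ns)} by pairing push/pop events per thread."""
    totals = defaultdict(lambda: [0, 0])
    with _lock:
        snapshot = [list(events) for events in _event_lists.values()]
    for events in snapshot:
        stack = []
        for kind, name, t in events:
            if kind == K_PUSH_RANGE:
                stack.append((name, t))
            elif kind == K_POP_RANGE and stack:
                start_name, start_t = stack.pop()
                totals[start_name][0] += 1
                totals[start_name][1] += t - start_t
    return {name: tuple(v) for name, v in totals.items()}


enable_profiler()
for _ in range(3):
    with RecordEvent("fc"):
        with RecordEvent("fc/mul"):  # nested range inside the operator
            sum(range(1000))
disable_profiler()
with RecordEvent("ignored"):  # not recorded: the profiler is disabled
    pass
report = summarize()
```

A context manager plays the role that the constructor/destructor pair plays in C++: the range covers exactly the lifetime of the `with` block.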
              \ No newline at end of file
              diff --git a/develop/doc/design/program.html b/develop/doc/design/program.html
              deleted file mode 100644
              index 932ee717b11e0bf9df90549fb402df48dce1799e..0000000000000000000000000000000000000000
              --- a/develop/doc/design/program.html
              +++ /dev/null
              @@ -1,378 +0,0 @@

              Design Doc: PaddlePaddle Programs

              -
              -

              Compile and Execution

              -

              A PaddlePaddle program consists of two parts – the first generates a ProgramDesc protobuf message that describes the program, and the second runs this message using a C++ class Executor.

              -

              A simple example PaddlePaddle program can be found in graph.md:

              -
              x = layer.data("images")
              -l = layer.data("label")
              -y = layer.fc(x)
              -cost = layer.mse(y, l)
              -optimize(cost)
              -train(cost, reader=mnist.train())
              -
              -
              -

              The first five lines of the above PaddlePaddle program generate, or compile, the ProgramDesc message. The last line runs it.

              -
              -
              -

              Programs and Blocks

              -

              The basic structure of a PaddlePaddle program is a set of nested blocks, as in a C++ or Java program.

              • program: some nested blocks
              • block:
                • some local variable definitions, and
                • a sequence of operators

              The concept of a block comes from conventional programs. For example, the following C++ program has three blocks:

              -
              int main() { // block 0
              -  int i = 0;
              -  if (i < 10) { // block 1
              -    for (int j = 0; j < 10; j++) { // block 2
              -    }
              -  }
              -  return 0;
              -}
              -
              -
              -

              The following PaddlePaddle program has three blocks:

              -
              import paddle as pd  # block 0
              -
              -x = minibatch([10, 20, 30]) # shape=[None, 1]
              -y = var(1) # shape=[1], value=1
              -z = minibatch([10, 20, 30]) # shape=[None, 1]
              -cond = larger_than(x, 15) # [false, true, true]
              -
              -ie = pd.ifelse()
              -with ie.true_block():  # block 1
              -    d = pd.layer.add_scalar(x, y)
              -    ie.output(d, pd.layer.softmax(d))
              -with ie.false_block():  # block 2
              -    d = pd.layer.fc(z)
              -    ie.output(d, d+1)
              -o1, o2 = ie(cond)
              -
              -
              -
              -
              -

              BlockDesc and ProgramDesc

              -

              All protobuf messages are defined in framework.proto.

              -

              BlockDesc is straightforward: it includes local variable definitions, vars, and a sequence of operators, ops.

              -
              message BlockDesc {
              -  required int32 parent = 1;
              -  repeated VarDesc vars = 2;
              -  repeated OpDesc ops = 3;
              -}
              -
              -
              -

              The parent ID indicates the parent block so that operators in a block can refer to variables defined locally and also those defined in their ancestor blocks.

              -

              All hierarchical blocks in a program are flattened and stored in an array. The block ID is the index of the block in this array.

              -
              message ProgramDesc {
              -  repeated BlockDesc blocks = 1;
              -}
              -
              -
              -
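How the parent index and the flattened array work together can be sketched in a few lines of Python. The `BlockDesc` class here is a simplified, hypothetical stand-in for the protobuf message (it keeps only variable names), and `find_var_block` is an illustrative helper, not a PaddlePaddle function. The sketch assumes that the global block uses -1 as its parent, as the Python API design doc does.

```python
# Hypothetical, simplified stand-in for the BlockDesc protobuf message.
class BlockDesc:
    def __init__(self, parent, var_names):
        self.parent = parent        # index of the parent block; -1 for the global block (assumed)
        self.vars = set(var_names)  # names of the variables defined locally


def find_var_block(blocks, block_id, name):
    """Return the index of the nearest enclosing block defining `name`, or -1."""
    while block_id != -1:
        if name in blocks[block_id].vars:
            return block_id
        block_id = blocks[block_id].parent  # walk up to the ancestor block
    return -1


# The flattened array for the IfElse example: block ID == index in the list.
blocks = [
    BlockDesc(-1, ["x", "y", "z", "cond"]),  # block 0: global block
    BlockDesc(0, ["d"]),                     # block 1: true branch
    BlockDesc(0, ["d"]),                     # block 2: false branch
]

x_owner = find_var_block(blocks, 1, "x")        # defined in an ancestor block
d_owner = find_var_block(blocks, 2, "d")        # defined locally
missing = find_var_block(blocks, 1, "unknown")  # defined nowhere
```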
              -

              Global Block

              -

              The global block is the first one in the above array.

              -
              -
              -
              -

              Operators that Use Blocks

              -

              In the above example, the operator IfElseOp has two blocks – the true branch and the false branch.

              -

              The definition of OpDesc shows that an operator could have some attributes:

              -
              message OpDesc {
              -  AttrDesc attrs = 1;
              -  ...
              -}
              -
              -
              -

              and an attribute could be of type block, which is, in fact, a block ID as described above:

              -
              message AttrDesc {
              -  required string name = 1;
              -
              -  enum AttrType {
              -    INT = 1;
              -    STRING = 2;
              -    ...
              -    BLOCK = ...
              -  }
              -  required AttrType type = 2;
              -
              -  optional int32 block = 10; // when type == BLOCK
              -  ...
              -}
              -
              -
              -
              -
              -

              InferShape

              -

              With this design, the InferShape function should take the following parameters:

              -
              void InferShape(int current_block,
              -                int current_operator,
              -                ProgramDesc* program // might change VarDesc values.
              -                ) {
              -  ...
              -}
              -
              -
              -

              where

              -
                -
              • current_block is an index into ProgramDesc::blocks,
              • current_operator is an index into BlockDesc::ops.
              • -
              -
              \ No newline at end of file
              diff --git a/develop/doc/design/prune.html b/develop/doc/design/prune.html
              deleted file mode 100644
              index 01e876d86d8e71cb3a9422b056ad2bbe23245b4e..0000000000000000000000000000000000000000
              --- a/develop/doc/design/prune.html
              +++ /dev/null
              @@ -1,312 +0,0 @@

              Prune

              -
              -

              Motivation

              -

              We want to support running inference, training and checkpointing in one ProgramDesc. We implement a function void Prune(const ProgramDesc* input, ProgramDesc* output), which takes a ProgramDesc and generates a pruned ProgramDesc.

              -
              -
              -

              Challenge

              -

              Pruning needs to support both variables and operators as evaluation targets. Consider the following situations.

              -
              # Case 1: run the forward pass.
              -cost_np = session.run(target=cost)
              -# Case 2: run the backward pass.
              -opts_np, _ = session.run(target=[cost, opt])
              -# Case 3: run checkpointing.
              -_ = session.run(target=checkpoint)
              -
              -
              -
              -
              -

              Solution

              -

              To support evaluation of operators, we add an is_target field to OpDesc.

              -
              message OpDesc {
              -  required string type = 3;
              -  repeated Var inputs = 1;
              -  repeated Var outputs = 2;
              -  repeated Attr attrs = 4;
              -  optional bool is_target = 5 [ default = false ];
              -};
              -
              -
              -

              To support evaluation of variables, we add fetch_op. For each variable in the target, we insert a fetch_op into the ProgramDesc with the variable as the fetch_op's input. We then mark the fetch_op as a target.

              -
              -

              Algorithm

              -

              If an operator needs to be run, it must fall into one of the following cases:

              -
                -
              1. It is the target.
              2. Some other op depends on it, meaning its output is another op's input.
              -

              The first case can be checked by op_desc.is_target(). The second case can be implemented as

              -
              bool HasDependentVar(const OpDesc& op_desc, const std::set<std::string>& dependent_vars) {
              -  for (auto& var : op_desc.outputs()) {
              -    for (auto& argu : var.arguments()) {
              -      if (dependent_vars.count(argu) != 0) {
              -        return true;
              -      }
              -    }
              -  }
              -  return false;
              -}
              -
              -
              -

              Then the whole algorithm can be implemented as the following code.
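A minimal Python sketch of such a backward traversal follows. It assumes a single block, and it uses a hypothetical `Op` record in place of the OpDesc protobuf message; `prune`, `has_dependent_var` and `make_ops` are illustrative names, not PaddlePaddle functions. Operators are visited in reverse order: an operator is kept if it is a target or if one of its outputs is already needed, and the inputs of every kept operator become needed in turn.

```python
# Hypothetical, simplified op record; the real OpDesc is a protobuf message.
class Op:
    def __init__(self, type, inputs, outputs, is_target=False):
        self.type = type
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.is_target = is_target


def has_dependent_var(op, dependent_vars):
    """True if any output of `op` is consumed by an operator that is kept."""
    return any(name in dependent_vars for name in op.outputs)


def prune(ops):
    """Return the operators that must run, preserving their original order."""
    dependent_vars = set()
    kept = []
    for op in reversed(ops):
        if op.is_target or has_dependent_var(op, dependent_vars):
            dependent_vars.update(op.inputs)  # the inputs are now needed too
            kept.append(op)
    kept.reverse()
    return kept


def make_ops(target_type):
    """Build a toy program and mark the op of the given type as the target."""
    specs = [
        ("fc", ["x", "w"], ["y"]),
        ("mse", ["y", "label"], ["cost"]),
        ("save", ["w"], []),      # checkpointing op
        ("fetch", ["cost"], []),  # fetch_op inserted for the variable `cost`
    ]
    return [Op(t, i, o, is_target=(t == target_type)) for t, i, o in specs]


forward_ops = [op.type for op in prune(make_ops("fetch"))]
checkpoint_ops = [op.type for op in prune(make_ops("save"))]
```

Fetching `cost` keeps the forward operators and drops the checkpointing operator, while targeting `save` keeps only that operator, which mirrors cases 1 and 3 above.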

              \ No newline at end of file
              diff --git a/develop/doc/design/python_api.html b/develop/doc/design/python_api.html
              deleted file mode 100644
              index 5a364d04b9e946f6264c4c3d3c9f25caf45f1750..0000000000000000000000000000000000000000
              --- a/develop/doc/design/python_api.html
              +++ /dev/null
              @@ -1,526 +0,0 @@

              Design Doc: Python API

              -

              Due to the refactoring of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program.

              -

              | Python classes | Protobuf messages |
              | --- | --- |
              | Program | ProgramDesc |
              | Block | BlockDesc |
              | Operator | OpDesc |
              | Variable | VarDesc |

              -

              Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages.

              -
              -

              Core Concepts

              -
              -

              Program

              -

              A ProgramDesc describes a DL program, which is composed of an array of BlockDescs. The BlockDescs in a ProgramDesc can have a tree-like hierarchical structure. However, the ProgramDesc only stores a flattened array of BlockDescs. A BlockDesc refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.

              -

              Whenever we create a block, we need to set its parent block to the current block, hence the Python class Program needs to maintain a data member, current_block_idx, that records the index of the current block.

              -
              class Program(object):
              -    def __init__(self):
              -        self.desc = core.NewProgram()  # a C++ ProgramDesc pointer.
              -        self.blocks = []               # list of Block
              -        self.blocks.append(Block(self, -1))  # the global block
              -        # Named current_block_idx so that it does not shadow the method current_block().
              -        self.current_block_idx = 0     # initialized to the global block
              -
              -    def global_block(self):
              -        return self.blocks[0]
              -
              -    def current_block(self):
              -        return self.blocks[self.current_block_idx]
              -
              -    def rollback(self):
              -        self.current_block_idx = self.current_block().parent_idx
              -
              -    def create_block(self):
              -        new_block_idx = len(self.blocks)
              -        self.blocks.append(Block(self, self.current_block_idx))
              -        self.current_block_idx = new_block_idx
              -        return self.current_block()
              -
              -
              -

              Program is an accessor to the protobuf message ProgramDesc, which is created in C++ space. This is because the InferShape function is in C++ and manipulates VarDesc messages, which are members of BlockDesc, which in turn is a member of ProgramDesc.

              -

              Program creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block.
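The block bookkeeping can be exercised without the C++ core. The sketch below is a pure-Python approximation under stated assumptions: the `desc` members and all `core.*` calls are dropped, `Block` additionally records its own index for readability, and the current block index is kept in an attribute named `current_block_idx` (an illustrative choice) so that it cannot clash with the `current_block()` method.

```python
# Pure-Python approximation; the C++ `core` module and `desc` members are omitted.
class Block:
    def __init__(self, program, idx, parent_idx):
        self.program = program
        self.idx = idx                # position in program.blocks (added for readability)
        self.parent_idx = parent_idx  # -1 for the global block


class Program:
    def __init__(self):
        self.blocks = [Block(self, 0, -1)]  # the global block
        self.current_block_idx = 0

    def global_block(self):
        return self.blocks[0]

    def current_block(self):
        return self.blocks[self.current_block_idx]

    def create_block(self):
        new_idx = len(self.blocks)
        # The new block's parent is whatever block is current right now.
        self.blocks.append(Block(self, new_idx, self.current_block_idx))
        self.current_block_idx = new_idx
        return self.current_block()

    def rollback(self):
        # Leave the current block and return to its parent.
        self.current_block_idx = self.current_block().parent_idx


prog = Program()
b1 = prog.create_block()  # child of the global block
b2 = prog.create_block()  # child of b1
prog.rollback()           # back in b1
b3 = prog.create_block()  # second child of b1
parents = [b.parent_idx for b in prog.blocks]
prog.rollback()
prog.rollback()
back_at_global = prog.current_block() is prog.global_block()
```

The `blocks` list stays flat while the `parent_idx` values encode the tree, which is the same layout the ProgramDesc uses.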

              -
              -
              -

              Block

              -

              A Block includes

              -
                -
              1. a map from variable names to an instance of the Python Variable class, and
              2. a list of Operator instances.
              -
              class Block(object):
              -    def __init__(self, program, parent_idx):
              -        self.desc = core.NewBlock(program.desc)
              -        self.program = program
              -        self.vars = {}   # map from variable name to Variable
              -        self.ops = []    # list of Operator
              -        self.parent_idx = parent_idx
              -
              -    def create_var(self, ...):
              -        return Variable(self, ...)
              -
              -    def _create_global_var(self, ...):
              -        return self.program.global_block().create_var(...)
              -
              -    def create_parameter(self, name, ...):
              -        # Parameter is a subclass of variable. See Parameter section for details.
              -        self.vars[name] = Parameter(self._create_global_var(...), ...)
              -        return self.vars[name]
              -
              -    def append_operator(self, ...):
              -        self.ops.append(Operator(self, ...))
              -
              -    def prepend_operator(self, ...):  # Parameter's ctor prepends initialize operators.
              -        self.ops.insert(0, Operator(self, ...))
              -
              -
              -

              create_parameter is necessary because parameters are global variables, defined in the global block, but can be created in some sub-blocks. For example, an FC layer in the step block of an RNN operator.

              -

              prepend_operator is necessary because the constructor of Parameter needs to create the initialize (or load) operator of the parameter, and would like to put it in the preamble of the global block.
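The interplay of create_parameter and prepend_operator can be sketched as follows. This is a deliberately reduced model, not the real classes: operators are represented only by their type names, variables by a placeholder string, and the parameter is recorded only in the global block. It shows a parameter created from an RNN step block ending up in the global block, with its initializer placed ahead of the operators already there.

```python
# Reduced, hypothetical model: ops are type names, variables are placeholder strings.
class Block:
    def __init__(self, program, parent_idx):
        self.program = program
        self.parent_idx = parent_idx
        self.vars = {}
        self.ops = []  # operator type names, in execution order

    def append_operator(self, op_type):
        self.ops.append(op_type)

    def prepend_operator(self, op_type):
        self.ops.insert(0, op_type)

    def create_parameter(self, name, init_type):
        # Parameters always live in the global block, whichever block creates them.
        global_block = self.program.blocks[0]
        global_block.vars[name] = "parameter"
        # The initializer goes into the preamble of the global block.
        global_block.prepend_operator(init_type)
        return name


class Program:
    def __init__(self):
        self.blocks = [Block(self, -1)]  # the global block


program = Program()
global_block = program.blocks[0]
global_block.append_operator("rnn")

step_block = Block(program, 0)  # step block of the RNN operator
program.blocks.append(step_block)
step_block.create_parameter("fc.w", "uniform_random")
step_block.append_operator("mul")
```

Because the initializer is prepended, it runs before the `rnn` operator even though the parameter was created while the step block was being built.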

              -
              -
              -

              Operator

              -

              The Operator class fills in the OpDesc message and calls the C++ function InferShape to infer the output shapes from the input shapes.

              -
              class Operator(object):
              -    def __init__(self,
              -                 block,  # Block
              -                 type,   # string
              -                 inputs, # dict<string, Variable>
              -                 outputs,# dict<string, Variable>
              -                 attrs   # dict<string, Any>
              -                 ):
              -        self.desc = core.NewOpDesc(block.desc, type, inputs, outputs, attrs)
              -        core.infer_shape(self.desc, inputs, outputs)
              -
              -    def type(self):
              -        return self.desc.type()
              -
              -
              -

              Operator creates the OpDesc message in C++ space, so that it can call the InferShape function, which is in C++.

              -
              -
              -

              Variable

              -

              Operators take Variables as their inputs and outputs.

              -
              class Variable(object):
              -    def __init__(self,
              -                 block=None,      # Block
              -                 name=None,       # string
              -                 shape,           # tuple
              -                 dtype="float32", # string
              -                 lod_level=None   # int
              -                 ):
              -        if name is None:
              -            name = unique_name_generator()
              -        self.name = name
              -        self.block = block
              -        self.desc = core.NewVarDesc(block.desc, name, shape, lod_level)
              -        self.writer = None
              -
              -
              -

              Please be aware of self.writer, which tracks the operator that creates the variable. It is possible that more than one operator writes a variable, but in Python space each write to a variable is represented by its own Variable instance. This is guaranteed by the fact that core.NewVarDesc must NOT create a new VarDesc message if its name already exists in the specified block.
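The get-or-create behaviour required of core.NewVarDesc can be illustrated with a small sketch. `VarDescTable` is a hypothetical stand-in for the per-block VarDesc storage in C++, and the descriptor is a plain dict; only the rule itself comes from the text: a second Variable with the same name reuses the existing descriptor while keeping its own writer.

```python
# Hypothetical stand-in for the per-block VarDesc storage kept in C++.
class VarDescTable:
    def __init__(self):
        self._descs = {}

    def new_var_desc(self, name, shape):
        # Must NOT create a second descriptor if the name already exists.
        if name not in self._descs:
            self._descs[name] = {"name": name, "shape": shape}
        return self._descs[name]

    def __len__(self):
        return len(self._descs)


class Variable:
    def __init__(self, table, name, shape, writer=None):
        self.name = name
        self.desc = table.new_var_desc(name, shape)
        self.writer = writer  # the operator that performs this write


table = VarDescTable()
first_write = Variable(table, "h", (1,), writer="fill_zero")
second_write = Variable(table, "h", (1,), writer="sum")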

              -
              -
              -

              Parameter

              -

              A parameter is a global variable with an initializer (or load) operator.

              -
              class Parameter(Variable):
              -    def __init__(self,
              -                 block=None,      # Block
              -                 name=None,       # string
              -                 shape,           # tuple
              -                 dtype="float32", # string
              -                 lod_level=None,  # int
              -                 trainable,       # bool
              -                 initialize_op_attrs,
              -                 optimize_op_attrs):
              -        super(Parameter, self).__init__(block, name, shape, dtype, lod_level)
              -        self.trainable = trainable
              -        self.optimize_op_attrs = optimize_op_attrs
              -        block.prepend(Operator(block,  # Block
              -                               initialize_op_attrs['type'],   # string
              -                               None,   # no inputs
              -                               self,   # output is the parameter
              -                               initialize_op_attrs))
              -
              -
              -

              When users create a parameter, they can call

              -
              program.create_parameter(
              -  ...,
              -  init_attr={
              -    "type": "uniform_random",
              -    "min": -1.0,
              -    "max": 1.0,
              -  }
              -)
              -
              -
              -

              In the above example, init_attr.type names an initialize operator. It can also name the load operator:

              -
              init_attr={
              - "type": "load",
              - "filename": "something.numpy",
              -}
              -
              -
              -

              optimize_op_attrs is not in the VarDesc message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator’s OpDesc, and will be in the OpDesc message.

              -
              -
              -
              -

              Layer Function

              -

              A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.

              -

              Layer functions take Variables and configuration parameters as their inputs and return the output variable(s).

              -

              For example, FullyConnected takes one or more variables as its input. The input could be input data or another layer's output. There are many configuration options for a FullyConnected layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The FullyConnected layer will return an output variable.

              -
              -

              Necessity for reusing code between layer functions

              -

              A lot of code can be reused between layer functions, such as:

              -
                -
              • Give the default values of the configuration, e.g., the default initialization strategy for parameters is uniform random with min = -1.0 and max = 1.0, and the default initialization strategy for the bias is to fill it with zeros.
              • -
              • Append the activation operator.
              • -
              • Create a temporary variable.
              • -
              • Create parameter.
              • -
              • Generate a unique name.
              • -
              • Add a bias.
              • -
              • ...
              • -
              -

              A mechanism to reuse code between layer functions is necessary. It will be around 150 lines of code if we write a FullyConnected layer without any helper functions.

              -
              -
              -

              Comparison between global functions and helper class

              -

              The FullyConnected layer looks as follows when we provide global functions:

              -
              def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
              -  if name is None:
              -    name = unique_name("fc")
              -  input = multiple_input(input)
              -  param_attr = default_param_attr(param_attr)
              -  param_attr = multiple_param_attr(param_attr, len(input))
              -
              -  # mul
              -  mul_results = []
              -  for ipt, attr in zip(input, param_attr):
              -    shape = ipt.shape[1:] + [size]
              -    w = g_program.global_block().create_parameter(shape, ipt.dtype, name, attr)
              -    tmp = create_tmp_var(name)
              -    g_program.current_block().append_op("mul", {ipt, w}, {tmp})
              -    mul_results.append(tmp)
              -
              -  # add sum
              -  ...
              -  # add bias
              -  ...
              -  # add activation
              -  ...
              -  return out
              -
              -
              -

              We can provide many helper functions for layer developers. However, global helper functions have several disadvantages:

              -
                -
              1. We need a namespace for these functions, so that layer developers can quickly figure out which functions they can use.
              2. -
              3. Global functions force layer developers to pass the same parameters again and again.
              4. -
              -

              So we provide a helper class, LayerHelper, to share code between layer functions. The FullyConnected layer then looks as follows:

              -
              def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
              -  helper = LayerHelper(**locals())  # pass all parameters to LayerHelper as keyword arguments
              -
              -  mul_results = []
              -  for ipt, param in helper.iter_multiple_input_and_param():
              -    w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype = ipt.dtype)
              -    tmp = helper.create_tmp_variable()
              -    helper.append_op('mul', {ipt, w}, {tmp})
              -    mul_results.append(tmp)
              -
              -  pre_bias = helper.add_sum(mul_results)
              -  pre_activation = helper.add_bias(pre_bias)
              -  return helper.add_activation(pre_activation)
              -
              -
              -

              We not only write fc_layer in fewer lines of code but also make the code easier to understand. At the same time, layer developers can figure out which functions they can invoke by typing helper. in a Python editor.

              -
              -
              -

              Implementation of layer helper

              -

              We keep all parameters of a layer function in a dictionary, stored as a private data member of the layer helper. Every method of the layer helper looks up the dictionary when it is invoked. In this way, we can implement one layer helper for all layer functions, even if some layers do not use some operators. For example, the activation is used by the FullyConnected layer and convolution layers, but a cross-entropy layer does not use it. The example code of add_activation is:

              -
              class LayerHelper(object):
              -  def __init__(self, **kwargs):  # kwargs is short for `keyword arguments`
              -    self.kwargs = kwargs
              -
              -  def add_activation(self, input_var):
              -    act = self.kwargs.get("act", None)  # default value is None
              -    if act is None:  # do nothing if no act
              -      return input_var
              -
              -    tmp = self.create_tmp_var()
              -    self.append_op(type=act, input=input_var, output=tmp)
              -    return tmp
              -
              -
              -
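The lookup-on-demand idea can be tried out in isolation. What follows is a minimal sketch under stated assumptions, not the real LayerHelper: the `ops` list, the temporary-name counter, and the way `append_op` records a dictionary are all hypothetical stand-ins so that the example runs on its own.

```python
class LayerHelper(object):
    """Toy helper: stores a layer function's arguments and reads them lazily."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.ops = []          # hypothetical: recorded operators, for illustration only
        self._tmp_count = 0    # hypothetical: counter used to name temporaries

    def create_tmp_var(self):
        self._tmp_count += 1
        return "tmp_%d" % self._tmp_count

    def append_op(self, type, input, output):
        self.ops.append({"type": type, "input": input, "output": output})

    def add_activation(self, input_var):
        act = self.kwargs.get("act", None)
        if act is None:        # layers without an activation pass through unchanged
            return input_var
        tmp = self.create_tmp_var()
        self.append_op(type=act, input=input_var, output=tmp)
        return tmp


# A layer that configures an activation gets an extra operator ...
fc_helper = LayerHelper(act="relu", size=10)
fc_out = fc_helper.add_activation("pre_activation")

# ... while a layer that never mentions "act" is left untouched.
ce_helper = LayerHelper(size=10)
ce_out = ce_helper.add_activation("pre_activation")
```

Because every method consults the dictionary only when called, the same helper class serves layers with very different argument lists.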
              -
              -

              Return value of layer functions

              -

              A layer returns a Variable, which is also the output of an operator. However, the output of a layer function carries more attributes than the output of an operator: the parameter variables and their gradient variables also need to be returned. Returning them is useful. For example,

              -
                -
              1. Users can debug the network by printing parameter gradients.
              2. -
              3. Users can set attributes on a parameter. For example, param.stop_gradient=True stops the gradient from being generated for that parameter. We can use this attribute to keep the parameter value fixed during training.
              4. -
              -

              However, it is still good for layers to return a Variable, since all layers and operators take Variables as their arguments. Because Python is dynamically typed, we can simply attach a param field and a grad field to the Variable returned by a layer function.

              -

              The sample usage is

              -
              data = fluid.layers.data(...)
              -hidden = fluid.layers.fc(data, ...)
              -...
              -
              -executor.run(fetch_list=[hidden.param, hidden.param.grad], ...)
              -
              -
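Attaching extra fields to the returned Variable relies only on Python allowing new attributes on ordinary instances. The sketch below illustrates that mechanism with a hypothetical `Variable` class and a hypothetical `toy_fc` layer; the variable names (including the `@GRAD` suffix) are made up for the example and are not taken from the framework.

```python
class Variable(object):
    """Hypothetical stand-in for a framework variable."""

    def __init__(self, name):
        self.name = name
        self.stop_gradient = False


def toy_fc(input_var):
    """Hypothetical layer: returns its output with the parameter attached."""
    weight = Variable("fc.w")
    weight.grad = Variable("fc.w@GRAD")  # attribute added dynamically
    out = Variable("fc.out")
    out.param = weight                   # attribute added dynamically
    return out


hidden = toy_fc(Variable("data"))
hidden.param.stop_gradient = True        # e.g. freeze the parameter
fetch_names = [hidden.param.name, hidden.param.grad.name]
```

The caller still receives a plain Variable, so it can be passed to the next layer, while the parameter and its gradient remain reachable for debugging.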
              -
              -
              -
              -

              Optimizer

              -

              Optimizer Design Doc

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/reader/README.html b/develop/doc/design/reader/README.html deleted file mode 100644 index d97481c00d8a244d6ad0dad2521ab49c2da74800..0000000000000000000000000000000000000000 --- a/develop/doc/design/reader/README.html +++ /dev/null @@ -1,438 +0,0 @@ - - - - - - - - - - - - - Python Data Reader Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Python Data Reader Design Doc

              -

              During the training and testing phases, PaddlePaddle programs need to read data. To help users write code that reads input data, we define the following:

              -
                -
              • A reader: A function that reads data (from file, network, random number generator, etc) and yields the data items.
              • -
              • A reader creator: A function that returns a reader function.
              • -
              • A reader decorator: A function, which takes in one or more readers, and returns a reader.
              • -
              • A batch reader: A function that reads data (from reader, file, network, random number generator, etc) and yields a batch of data items.
              • -
              -

              We also provide a function that converts a reader to a batch reader, as well as frequently used reader creators and reader decorators.

              -
              -

              Data Reader Interface

              -

              A data reader doesn’t have to be a function that reads and yields data items. It can be any function without parameters that creates an iterable (anything that can be used in for x in iterable), as follows:

              -
              iterable = data_reader()
              -
              -
              -

              Each item produced by the iterable should be a single entry of data, not a mini batch. The entry of data can be a single item or a tuple of items. Each item should be of one of the supported types (e.g., numpy 1d array of float32, int, list of int, etc.).

              -

              An example implementation for single item data reader creator is as follows:

              -
              def reader_creator_random_image(width, height):
              -    def reader():
              -        while True:
              -            yield numpy.random.uniform(-1, 1, size=width*height)
              -    return reader
              -
              -
              -

              An example implementation for multiple item data reader creator is as follows:

              -
              def reader_creator_random_image_and_label(width, height, label):
              -    def reader():
              -        while True:
              -            yield numpy.random.uniform(-1, 1, size=width*height), label
              -    return reader
              -
              -
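Since these readers loop forever, a caller has to bound the iteration itself. The following sketch shows one way to do that with `itertools.islice`; the creator `reader_creator_random_vector` is a hypothetical variant that uses the standard `random` module instead of numpy so that it runs without extra dependencies.

```python
import itertools
import random


def reader_creator_random_vector(length, label):
    """Hypothetical reader creator: yields (vector, label) entries forever."""
    def reader():
        while True:
            vector = [random.uniform(-1, 1) for _ in range(length)]
            yield vector, label
    return reader


reader = reader_creator_random_vector(4, 7)

# Calling the reader creates a fresh iterable; islice takes only the first 3 entries.
entries = list(itertools.islice(reader(), 3))
```

Each element of `entries` is one data entry (a tuple of two items), never a mini batch.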
              -
              -
              -

              Batch Reader Interface

              -

              A batch reader can be any function without parameters that creates an iterable (anything that can be used in for x in iterable). The output of the iterable should be a batch (list) of data items. Each item inside the list should be a tuple.

              -

              Here are some valid outputs:

              -
              # a mini batch of three data items. Each data item consists of three columns of data, each of which is a single number.
              -[(1, 1, 1),
              -(2, 2, 2),
              -(3, 3, 3)]
              -
              -# a mini batch of three data items, each data item is a list (single column).
              -[([1,1,1],),
              -([2,2,2],),
              -([3,3,3],)]
              -
              -
              -

              Please note that each item inside the list must be a tuple. Below is an invalid output:

              -
               # wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
              - # Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
              - # or three columns of data, each of which is 1.
              -[[1,1,1],
              -[2,2,2],
              -[3,3,3]]
              -
              -
              -

              It is easy to convert from a reader to a batch reader:

              -
              mnist_train = paddle.dataset.mnist.train()
              -mnist_train_batch_reader = paddle.batch(mnist_train, 128)
              -
              -
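To make the conversion concrete, here is a minimal sketch of how such a batching function could work. `toy_batch` is a hypothetical name and this is not the actual paddle.batch implementation; wrapping non-tuple entries into 1-tuples and emitting a final partial batch are assumptions made for the example.

```python
def toy_batch(reader, batch_size):
    """Hypothetical sketch: turn a single-entry reader into a batch reader."""
    def batch_reader():
        batch = []
        for entry in reader():
            # Assumption: single items are wrapped so every batch element is a tuple.
            batch.append(entry if isinstance(entry, tuple) else (entry,))
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # assumption: the trailing partial batch is emitted as well
            yield batch
    return batch_reader


def counting_reader():
    for i in range(5):
        yield i, i * 10


batches = list(toy_batch(counting_reader, 2)())
```

With five entries and a batch size of 2, the sketch produces two full batches followed by one batch holding the last entry.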
              -

              It is also straightforward to create a custom batch reader:

              -
              def custom_batch_reader():
              -    while True:
              -        batch = []
              -        for i in range(128):
              -            batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended.
              -        yield batch
              -
              -mnist_random_image_batch_reader = custom_batch_reader
              -
              -
              -
              -
              -

              Usage

              -

              The following shows how to use a reader with PaddlePaddle.
              -The batch reader, a mapping from item(s) to data layers, the batch size, and the total number of passes are passed into paddle.train as follows:

              -
              # two data layers are created:
              -image_layer = paddle.layer.data("image", ...)
              -label_layer = paddle.layer.data("label", ...)
              -
              -# ...
              -batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128)
              -paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
              -
              -
              -
              -
              -

              Data Reader Decorator

              -

              The Data reader decorator takes in a single reader or multiple data readers and returns a new data reader. It is similar to a python decorator, but it does not use @ in the syntax.

              -

              Since we have a strict interface for data readers (no parameters and return a single data item), a data reader can be used in a flexible way using data reader decorators. Following are a few examples:

              -
              -

              Prefetch Data

              -

              Since reading data may take some time and training cannot proceed without data, it is generally a good idea to prefetch the data.

              -

              Use paddle.reader.buffered to prefetch data:

              -
              buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
              -
              -
              -

              buffered_reader will try to buffer (prefetch) 100 data entries.
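One way such prefetching could work is a background thread that fills a bounded queue while the consumer drains it. The sketch below illustrates that idea only; `toy_buffered` is a hypothetical name, this is not the actual paddle.reader.buffered implementation, and it assumes Python 3 (the `queue` module).

```python
import queue
import threading


def toy_buffered(reader, size):
    """Hypothetical sketch: prefetch up to `size` entries in a background thread."""
    end = object()  # sentinel marking the end of the underlying reader

    def buffered_reader():
        q = queue.Queue(maxsize=size)

        def fill():
            for entry in reader():
                q.put(entry)       # blocks while the buffer is full
            q.put(end)

        worker = threading.Thread(target=fill)
        worker.daemon = True       # do not keep the process alive for the producer
        worker.start()

        while True:
            entry = q.get()
            if entry is end:
                break
            yield entry

    return buffered_reader


def small_reader():
    return iter(range(10))


prefetched = list(toy_buffered(small_reader, 3)())
```

The decorated reader keeps the same interface (no parameters, returns an iterable), so it can be passed anywhere a plain reader is accepted.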

              -
              -
              -

              Compose Multiple Data Readers

              -

              For example, suppose we want to use a source of real images (say, by reusing the MNIST dataset) and a source of random images as input for Generative Adversarial Networks.

              -

              We can do the following:

              -
              def reader_creator_random_image(width, height):
              -    def reader():
              -        while True:
              -            yield numpy.random.uniform(-1, 1, size=width*height)
              -    return reader
              -
              -def reader_creator_bool(t):
              -    def reader():
              -        while True:
              -            yield t
              -    return reader
              -
              -true_reader = reader_creator_bool(True)
              -false_reader = reader_creator_bool(False)
              -
              -reader = paddle.reader.compose(paddle.dataset.mnist.train(), reader_creator_random_image(20, 20), true_reader, false_reader)
              -# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
              -# And we don't care about the second item at this time.
              -paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
              -
              -
              -
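The index mapping in the example makes sense once the composed entry is viewed as one flat tuple. The following sketch shows how such flattening could be done; `toy_compose` is a hypothetical name, not the actual paddle.reader.compose, and stopping at the shortest reader is an assumption of this example.

```python
def toy_compose(*readers):
    """Hypothetical sketch: zip several readers and flatten their entries."""
    def as_tuple(entry):
        return entry if isinstance(entry, tuple) else (entry,)

    def composed_reader():
        # zip stops as soon as the shortest reader is exhausted (an assumption here)
        for entries in zip(*[r() for r in readers]):
            yield sum((as_tuple(e) for e in entries), ())

    return composed_reader


def image_and_label_reader():      # two items per entry, like an (image, label) dataset
    yield "img0", 0
    yield "img1", 1


def constant_reader_creator(value):
    def reader():
        while True:
            yield value
    return reader


composed = toy_compose(image_and_label_reader,
                       constant_reader_creator("fake"),
                       constant_reader_creator(True),
                       constant_reader_creator(False))
composed_entries = list(composed())
```

In each flattened entry, index 0 is the real image, index 1 the label that the mapping skips, index 2 the fake image, and indices 3 and 4 the two boolean flags.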
              -
              -

              Shuffle

              -

              Given the shuffle buffer size n, paddle.reader.shuffle returns a data reader that buffers n data entries and shuffles them before a data entry is read.

              -

              Example:

              -
              reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
              -
              -
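Buffer-based shuffling can be sketched as follows. `toy_shuffle` is a hypothetical name and not the actual paddle.reader.shuffle; the optional `seed` parameter is an addition for reproducibility in this example.

```python
import random


def toy_shuffle(reader, buf_size, seed=None):
    """Hypothetical sketch: shuffle entries within buffers of buf_size entries."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        if buf:                    # shuffle and emit the remaining entries
            rng.shuffle(buf)
            for item in buf:
                yield item

    return shuffled_reader


def ordered_reader():
    return iter(range(10))


shuffled = list(toy_shuffle(ordered_reader, 4, seed=0)())
```

Note the trade-off this design implies: entries are only mixed within one buffer, so a larger buffer size gives better randomization at the cost of more memory.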
              -
              -
              -
              -

              Q & A

              -
              -

              Why does a reader return only a single entry, and not a mini batch?

              -

              Returning a single entry makes reusing existing data readers much easier (for example, if an existing reader returned 3 entries instead of a single entry, the training code would be more complicated because it would need to handle cases like a batch size of 2).

              -

              We provide a function: paddle.batch to turn (a single entry) reader into a batch reader.

              -
              -
              -

              Why do we need a batch reader? Isn’t it sufficient to give the reader and batch_size as arguments during training?

              -

              In most of the cases, it would be sufficient to give the reader and batch_size as arguments to the train method. However sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. For these cases using a batch reader is very efficient and helpful.

              -
              -
              -

              Why use a dictionary instead of a list to provide mapping?

              -

              Using a dictionary ({"image":0, "label":1}) instead of a list (["image", "label"]) gives the advantage that the user can easily reuse the items (e.g., using {"image_a":0, "image_b":0, "label":1}) or even skip an item (e.g., using {"image_a":0, "label":2}).
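The reuse and skip cases can be illustrated with a small lookup. `apply_mapping` is a hypothetical helper written for this example; it is not part of the PaddlePaddle API.

```python
def apply_mapping(entry, mapping):
    """Hypothetical helper: pick items of a data entry by index for each layer name."""
    return {name: entry[index] for name, index in mapping.items()}


# Reuse item 0 for two data layers.
reused = apply_mapping(("img", 3), {"image_a": 0, "image_b": 0, "label": 1})

# Skip item 1 entirely.
skipped = apply_mapping(("img", "unused", 3), {"image_a": 0, "label": 2})
```

A plain list of names could express neither case, because it ties each name to exactly one position.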

              -
              -
              -

              How to create a custom data reader creator ?

              -
              def image_reader_creator(image_path, label_path, n):
              -    def reader():
              -        # open in binary mode and close the files as soon as the data is loaded
              -        with open(image_path, 'rb') as f, open(label_path, 'rb') as l:
              -            images = numpy.fromfile(
              -                f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32')
              -            labels = numpy.fromfile(l, 'ubyte', count=n).astype("int")
              -        images = images / 255.0 * 2.0 - 1.0
              -        for i in range(n):
              -            yield images[i, :], labels[i] # a single entry of data is created each time
              -    return reader
              -
              -# image_reader_creator creates a reader
              -reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024)
              -paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
              -
              -
              -
              -
              -

              How is paddle.train implemented?

              -

              An example implementation of paddle.train is:

              -
              def train(batch_reader, mapping, batch_size, total_pass):
              -    for pass_idx in range(total_pass):
              -        for mini_batch in batch_reader(): # this loop will never end in online learning.
              -            do_forward_backward(mini_batch, mapping)
              -
              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/refactorization.html b/develop/doc/design/refactorization.html deleted file mode 100644 index e2d59e486e154d08876adf1591d4070ee7c4bfbf..0000000000000000000000000000000000000000 --- a/develop/doc/design/refactorization.html +++ /dev/null @@ -1,583 +0,0 @@ - - - - - - - - - - - - - Design Doc: Refactorization Overview — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Refactorization Overview

              -

              The goals of refactoring include:

              -
                -
              1. Making it easy for external contributors to write new elementary computation operations.
              2. -
              3. Making the codebase clean and readable.
              4. -
              5. Designing a new computation representation – a computation graph of operators and variables.
              6. -
              7. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.
              8. -
              -
              -

              Computation Graphs

              -
                -
              1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.
              2. -
              3. Please refer to computation graphs for a concrete example.
              4. -
              5. Users write Python programs to describe the graphs and run them (locally or remotely).
              6. -
              7. A graph is composed of variables and operators.
              8. -
              9. The description of graphs must be serializable/deserializable, so that:
                  -
                1. It can be sent to the cloud for distributed execution, and
                2. -
                3. It can be sent to clients for mobile or enterprise deployment.
                4. -
                -
              10. -
              11. The Python program does two things:
                  -
                1. Compilation runs a Python program to generate a protobuf message representation of the graph and send it to
                    -
                  1. the C++ library libpaddle.so for local execution,
                  2. -
                  3. the master process of a distributed training job for training, or
                  4. -
                  5. the server process of a Kubernetes serving job for distributed serving.
                  6. -
                  -
                2. -
                3. Execution executes the graph by constructing instances of class Variable and OperatorBase, according to the protobuf message.
                4. -
                -
              12. -
              -
              -
              -

              Description and Realization of Computation Graph

              -

              At compile time, the Python program generates a protobuf message representation of the graph, or a description of the graph.

              -

              At runtime, the C++ program realizes the graph and runs it.

              -

              | | Representation (protobuf messages) | Realization (C++ class objects) |
              -|---|---|---|
              -|Data|VarDesc|Variable|
              -|Operation|OpDesc|Operator|
              -|Block|BlockDesc|Block|

              -

              The word graph is interchangeable with block in this document. A graph consists of computation steps and local variables, similar to a C++/Java program block, that is, the code between a pair of curly braces ({ and }).

              -
              -
              -

              Compilation and Execution

              -
                -
              1. Run a Python program to describe the graph. In particular, the Python application program does the following:
                  -
                1. Create VarDesc to represent local/intermediate variables,
                2. -
                3. Create operators and set attributes,
                4. -
                5. Validate attribute values,
                6. -
                7. Infer the type and the shape of variables,
                8. -
                9. Plan memory-reuse for variables,
                10. -
                11. Generate the backward graph
                12. -
                13. Add optimization operators to the computation graph.
                14. -
                15. Optionally, split the graph for distributed training.
                16. -
                -
              2. -
              3. The invocation of train or infer methods in the Python program does the following:
                  -
                1. Create a new Scope instance in the scope hierarchy for each run of a block,
                    -
                  1. realize local variables defined in the BlockDesc message in the new scope,
                  2. -
                  3. a scope is similar to the stack frame in programming languages,
                  4. -
                  -
                2. -
                3. Create an instance of class Block, in which,
                    -
                  1. realize operators in the BlockDesc message,
                  2. -
                  -
                4. -
                5. Run the Block by calling
                    -
                  1. Block::Eval(vector<Variable>* targets) for forward and backward computations, or
                  2. -
                  3. Block::Eval(vector<Operator>* targets) for optimization.
                  4. -
                  -
                6. -
                -
              4. -
              -
              -
              -

              Intermediate Representation (IR)

              -
              Compile Time -> IR -> Runtime
              -
              -
              -
              -

              Benefits of IR

              -
                -
              • Optimization

                -
                Compile Time -> IR -> Optimized IR -> Runtime
                -
                -
                -
              • -
              • Automatically send partitioned IR to different nodes.

                -
                  -
                • Automatic Data Parallelism

                  -
                  Compile Time
                  -|-> Single GPU IR
                  -    |-> [trainer-IR-0, trainer-IR-1, pserver-IR]
                  -        |-> Node-0 (runs trainer-IR-0)
                  -        |-> Node-1 (runs trainer-IR-1)
                  -        |-> Node-2 (runs pserver-IR)
                  -
                  -
                  -
                • -
                • Automatic Model Parallelism (planned for future)

                  -
                • -
                -
              • -
              -
              -
              -
              -
              -
              -

              Operator/OpWithKernel/OpKernel

              -

              class_diagram

              -
              -
              -
              -

              Operator

              -

              class_diagram

              -
                -
              • Operator is the fundamental building block of the user interface.
                  -
                • Operator stores input/output variable names and attributes.
                • -
                • The InferShape interface is used to infer the shape of the output variables based on the shapes of the input variables.
                • -
                • Use Run to compute the output variables from the input variables.
                • -
                -
              • -
              -
              -
              -
              -

              OpWithKernel/Kernel

              -

              class_diagram

              -
                -
              • OpWithKernel inherits Operator.
              • -
              • OpWithKernel contains a Kernel map.
                  -
                • OpWithKernel::Run gets the kernel for the device and invokes OpKernel::Compute.
                • -
                • OpKernelKey is the map key. It currently contains only the device place, but may include the data type later.
                • -
                -
              • -
              -
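The kernel-map dispatch described above can be illustrated with a short analogy. The real classes are C++; the Python sketch below only mirrors the idea, and every name in it (`register_kernel`, the `"CPU"` place string, `cpu_square`) is hypothetical.

```python
class OpWithKernel(object):
    """Hypothetical Python analogy of an operator that dispatches to kernels."""

    def __init__(self):
        self.kernels = {}  # kernel key (here just the device place) -> compute function

    def register_kernel(self, place, compute):
        self.kernels[place] = compute

    def run(self, place, inputs):
        # Look up the kernel for the device, then invoke its compute function.
        if place not in self.kernels:
            raise KeyError("no kernel registered for place %r" % place)
        return self.kernels[place](inputs)


def cpu_square(inputs):
    return [v * v for v in inputs]


square_op = OpWithKernel()
square_op.register_kernel("CPU", cpu_square)
cpu_result = square_op.run("CPU", [1, 2, 3])
```

Keeping the key a separate object rather than a bare device name is what would allow further fields, such as a data type, to be added to the lookup later.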
              -
              -
              -

              Why separate Kernel and Operator

              -
                -
              • Separate GPU and CPU code.
                  -
                • Make Paddle capable of running without GPU.
                • -
                -
              • -
              • Make one operator (which is a user interface) and create many implementations.
                  -
                • For example, the same multiplication op can have different kernel implementations, such as an FP16 kernel, an FP32 kernel, an MKL kernel, or an Eigen kernel.
                • -
                -
              • -
              -
              -
              -
              -

              Libraries for Kernel development

              -
                -
              • Eigen::Tensor contains basic math and element-wise functions.
                  -
                • Note that Eigen::Tensor has broadcast implementation.
                • -
                • Limit the number of tensor.device(dev) = in your code.
                • -
                -
              • -
              • thrust::transform and std::transform.
                  -
                • thrust has the same API as C++ standard library. Using transform, one can quickly implement customized element-wise kernels.
                • -
                • thrust, in addition, supports more complex APIs, like scan, reduce, reduce_by_key.
                • -
                -
              • -
              • Hand-writing GPUKernel and CPU code
                  -
                • Do not write in header (.h) files. CPU Kernel should be in cpp source (.cc) and GPU kernels should be in cuda (.cu) files. (GCC cannot compile GPU code.)
                • -
                -
              • -
              -
              -
              -
              -

              Operator Registration

              -
              -

              Why is registration necessary?

              -

              We need a method to build mappings between Op type names and Op classes.

              -
              -
              -

              How is registration implemented?

              -

              By maintaining a map whose key is the type name and whose value is the corresponding Op constructor.

              -
              -
              -
              -
              -

              The Registry Map

              -
              -

              OpInfoMap

              -

              op_type(string) -> OpInfo

              -

              OpInfo:

              -
                -
              • creator: The Op constructor.
              • -
              • grad_op_type: The type of the gradient Op.
              • -
              • proto: The Op’s Protobuf, including inputs, outputs and required attributes.
              • -
              • checker: Used to check attributes.
              • -
              -
              -
              -
              - -
              -
              -

              Registration Process

              -
                -
              1. Write an Op class and its gradient Op class, if required.
              2. -
              3. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
              4. -
              5. Invoke the macro REGISTER_OP. This macro will
                  -
                1. Call maker class to complete proto and checker
                2. -
                3. Using the completed proto and checker, it will add a new key-value pair to the OpInfoMap
                4. -
                -
              6. -
              -
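The registry and the registration steps above can be mirrored in a few lines. The real mechanism is a C++ macro; the following Python analogy is only a sketch, and `register_op`, `create_op`, `ScaleOp`, and the dictionary used as the proto are hypothetical names chosen for illustration.

```python
class OpInfo(object):
    """Hypothetical record holding what the registry knows about one Op type."""

    def __init__(self, creator, grad_op_type, proto, checker):
        self.creator = creator            # the Op constructor
        self.grad_op_type = grad_op_type  # type name of the gradient Op
        self.proto = proto                # description of inputs, outputs, attributes
        self.checker = checker            # validates attribute values


OP_INFO_MAP = {}  # op_type (string) -> OpInfo


def register_op(op_type, creator, grad_op_type, proto, checker):
    if op_type in OP_INFO_MAP:
        raise ValueError("operator %r is already registered" % op_type)
    OP_INFO_MAP[op_type] = OpInfo(creator, grad_op_type, proto, checker)


def create_op(op_type, attrs):
    info = OP_INFO_MAP[op_type]
    info.checker(attrs)          # check attributes before constructing the Op
    return info.creator(attrs)


class ScaleOp(object):
    def __init__(self, attrs):
        self.scale = attrs["scale"]

    def run(self, x):
        return [v * self.scale for v in x]


def check_scale_attrs(attrs):
    if "scale" not in attrs:
        raise ValueError("missing required attribute 'scale'")


register_op("scale", ScaleOp, "scale_grad",
            {"inputs": ["X"], "outputs": ["Out"], "attrs": ["scale"]},
            check_scale_attrs)

op = create_op("scale", {"scale": 2.0})
result = op.run([1.0, 2.0])
```

Looking up the constructor by type name is what lets a serialized graph description, which contains only strings, be turned back into operator instances.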
              -
              -
              -

              Backward Module (1/2)

              -
              -

              Create Backward Operator

              -
                -
              • Mapping from forward Op to backward Op -backward
              • -
              -
              -
              -
              -
              -

              Backward Module (2/2)

              -
              -

              Build Backward Network

              -
                -
              • Input: a graph of forward operators
              • -
              • Output: a graph of backward operators
              • -
              • Corner cases in construction
                  -
                • Shared Variables => insert an Add operator to combine gradients
                • -
                • No Gradient => insert a fill_zero_grad operator
                • -
                • Recursive NetOp => call Backward recursively
                • -
                • RNN Op => recursively call Backward on stepnet
                • -
                -
              • -
              -
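              The shared-variable corner case can be sketched as follows. This is a simplified illustration, not the framework's Backward implementation: the OpDesc struct, the "@GRAD" and "@RENAME@" suffixes and the rule "forward type plus _grad" are assumptions, and real gradient operators also read forward inputs and outputs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified operator description: a type plus variable names.
struct OpDesc {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

inline std::string GradName(const std::string& var) { return var + "@GRAD"; }

// Emits one "<type>_grad" op per forward op, in reverse order. When a forward
// variable is consumed more than once, each gradient contribution is written
// to a renamed variable and an "add" op sums the contributions.
std::vector<OpDesc> BuildBackward(const std::vector<OpDesc>& forward) {
  // Count how many times each variable is used as an input.
  std::map<std::string, int> consumers;
  for (const auto& op : forward)
    for (const auto& in : op.inputs) ++consumers[in];

  std::vector<OpDesc> backward;
  std::map<std::string, std::vector<std::string>> partials;
  for (auto it = forward.rbegin(); it != forward.rend(); ++it) {
    OpDesc grad{it->type + "_grad", {}, {}};
    for (const auto& out : it->outputs) grad.inputs.push_back(GradName(out));
    for (const auto& in : it->inputs) {
      if (consumers[in] > 1) {
        auto& list = partials[in];
        list.push_back(GradName(in) + "@RENAME@" + std::to_string(list.size()));
        grad.outputs.push_back(list.back());
      } else {
        grad.outputs.push_back(GradName(in));
      }
    }
    backward.push_back(grad);
    // Once every consumer has contributed, combine the partial gradients.
    for (const auto& in : it->inputs) {
      auto found = partials.find(in);
      if (found != partials.end() &&
          static_cast<int>(found->second.size()) == consumers[in]) {
        backward.push_back({"add", found->second, {GradName(in)}});
        partials.erase(found);
      }
    }
  }
  return backward;
}
```

              For a forward graph in which x feeds both a mul and a sub operator, the result is sub_grad, mul_grad, and an add that writes x@GRAD from the two renamed partial gradients.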
              -
              -
              -
              -

              Scope, Variable, Tensor

              -
                -
              • Tensor is a typed n-dimensional array.
                  -
                • Only dims and data pointers are stored in Tensor.
                • -
                • All operations on Tensor are written in Operator or global functions.
                • -
                • LoDTensor is the design for variable-length Tensors.
                • -
                -
              • -
              • The inputs and outputs of an operator are Variable instances, not only Tensors.
                  -
                • step_scopes in RNN is a variable and not a tensor.
                • -
                -
              • -
              • Scope is where variables are stored.
                  -
                • map<string var name, Variable>
                • -
                • Scope has a hierarchical structure. The local scope can get variables from its parent scope.
                • -
                -
              • -
              -
              -
              -
              -

              Block (in design)

              -
              -

              The difference between the original RNNOp and Block

              -
                -
              • Block as an operator is more intuitive than RNNOp,
              • -
              • Offers a new interface Eval(targets) to deduce the minimal block to Run,
              • -
              • Fits the compile-time/runtime separation design paradigm.
                  -
                • During compilation, SymbolTable stores VarDescs and OpDescs and serializes them to a BlockDesc
                • -
                • When the graph executes, a Block with a BlockDesc is passed. It creates Op and Var instances and then invokes Run.
                • -
                -
              • -
              -
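              One way to picture what Eval(targets) has to deduce is a backward walk over the operators of a block. The following is a sketch under assumptions, not the Block design itself: OpDesc and MinimalOps are hypothetical names, and the ops are assumed to be stored in execution order.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified operator description.
struct OpDesc {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Returns the indices of the ops needed to compute `targets`, in execution
// order. Ops are visited from last to first; an op is kept when it produces
// a variable that is still needed, and its inputs then become needed too.
std::vector<int> MinimalOps(const std::vector<OpDesc>& block,
                            const std::vector<std::string>& targets) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<int> keep;
  for (int i = static_cast<int>(block.size()) - 1; i >= 0; --i) {
    bool produces_needed = false;
    for (const auto& out : block[i].outputs) {
      if (needed.count(out) > 0) produces_needed = true;
    }
    if (!produces_needed) continue;
    for (const auto& in : block[i].inputs) needed.insert(in);
    keep.push_back(i);
  }
  std::reverse(keep.begin(), keep.end());
  return keep;
}
```

              The walk is conservative: a produced variable stays in the needed set, so earlier ops that also write it are kept as well.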
              -
              -
              -
              -

              Milestone

              -
                -
              • Use Paddle/books as the main thread: the requirements of its models motivate the framework refactoring,
              • -
              • Model migration
                  -
                • Framework development gives priority support to model migration, for example,
                    -
                  • the MNIST demo needs a Python interface,
                  • -
                  • the RNN models require the framework to support LoDTensor.
                  • -
                  -
                • -
                • Determine some timelines,
                • -
                • Frequently used Ops need to be migrated first,
                • -
                • Different models can be migrated in parallel.
                • -
                -
              • -
              • Improve the framework at the same time
              • -
              • Accept imperfection; concentrate on solving specific problems at a reasonable cost.
              • -
              -
              -
              -
              -

              Control the migration quality

              -
                -
              • Compare the performance of migrated models with old ones.
              • -
              • Follow the Google C++ Style Guide.
              • -
              • Build an automatic workflow for generating Python/C++ documentation.
                  -
                • The documentation of layers and ops should be written inside the code.
                • -
                • Take the documentation quality into account when submitting pull requests.
                • -
                • Preview the documentation, then read and improve it from a user’s perspective.
                • -
                -
              • -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/register_grad_op.html b/develop/doc/design/register_grad_op.html deleted file mode 100644 index ef87753f2f0e4bcd1341a1091671b0afb1ae13fa..0000000000000000000000000000000000000000 --- a/develop/doc/design/register_grad_op.html +++ /dev/null @@ -1,328 +0,0 @@ - - - - - - - - - - - - - Design Doc: Gradient Operators Registration — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Gradient Operators Registration

              -
              -

              The Problem Posed

              -

              Currently, for each C++ operator class definition, a gradient operator creator function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.

              -

              However, we noticed two problems with the current design:

              -
                -
              1. Because we decided to separate the compilation and execution phases, the creator must change: it should take an OpDesc protobuf message in a ProgramDesc and insert the corresponding gradient OpDesc messages into that ProgramDesc message.
              2. -
              3. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of the minus operator consists of two operators: an identity operator followed by a scale operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
              4. -
              -
              -
              -

              The Current Implementation

              -

              Instances of the C++ class OpInfo are stored in an associative map whose key is the operator type. The grad_op_type_ field indicates the associated gradient operator type. An operator can create its gradient operator by invoking the OpInfo::creator_ of the gradient operator type. The pseudocode is as follows:

              -
              struct OpInfo {
              -  std::function<OperatorBase*(...)> creator_;
              -  std::string grad_op_type_;
              -  ...
              -};
              -
              -map<string, OpInfo> OpInfoMap;
              -
              -OperatorBase* CreateGradientOperator(const OperatorBase& op) {
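              -  // Look up the gradient operator type, then invoke that type's creator.
              -  const std::string& grad_type = OpInfoMap.at(op.Type()).grad_op_type_;
              -  return OpInfoMap.at(grad_type).creator_(...);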
              -}
              -
              -
              -
              -
              -

              Proposed Solution

              -

              The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:

              -
              // (OpDesc) --> vector<OpDesc>
              -std::function<std::vector<OpDescBind>(const OpDescBind&)>;
              -
              -
              -

              The function takes the OpDescBind of the forward operator and returns one or more gradient operator descriptions. OpDescBind is a C++ wrapper around the protobuf message OpDesc that makes OpDesc faster to manipulate.

              -

              The GradOpDescMaker will be registered in OpInfo and will replace the grad_op_type_ field. The OpInfo should look like

              -
              struct OpInfo {
              -  std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>  grad_op_maker_;
              -  ...
              -};
              -
              -
              -

              grad_op_maker_ is empty (nullptr) if the operator does not have any associated gradient operators.

              -

              We propose a base class called GradOpDescMakerBase to let operator developers generate Gradient Operators easily. The public interface of that class is

              -
              class GradOpDescMakerBase {
              -public:
              -  GradOpDescMakerBase(const OpDescBind& );
              -  virtual std::vector<std::unique_ptr<OpDescBind>> operator()()const = 0;
              -};
              -
              -
              -

              We can convert GradOpDescMakerBase to std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> by

              -
              using GradOpMaker = ...;
              -std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> func;
              -func = [] (const OpDescBind& fwd_op) {
              -  GradOpMaker maker(fwd_op);
              -  return maker();
              -};
              -
              -
              -

              Because GradOpDescMakerBase is a class, we can give it many helper functions. The basic helpers return the variables of Input, Output, InputGradient and OutputGradient of the forward operator.
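              The maker class and its conversion to a function can be sketched for the minus operator. This is an illustration under assumptions rather than the proposed implementation: OpDescBind is reduced to a plain struct, and the helper names, the "@GRAD" suffix, the parameter names X, Y, Out and the "scale" attribute are hypothetical.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for OpDescBind (the real class wraps a protobuf OpDesc).
struct OpDescBind {
  std::string type;
  std::map<std::string, std::string> inputs;   // parameter name -> variable
  std::map<std::string, std::string> outputs;  // parameter name -> variable
  std::map<std::string, float> attrs;
};

using GradOpDescs = std::vector<std::unique_ptr<OpDescBind>>;

class GradOpDescMakerBase {
 public:
  explicit GradOpDescMakerBase(const OpDescBind& fwd_op) : fwd_op_(fwd_op) {}
  virtual ~GradOpDescMakerBase() = default;
  virtual GradOpDescs operator()() const = 0;

 protected:
  // Hypothetical helpers: gradient variable names of forward inputs/outputs.
  std::string InputGrad(const std::string& name) const {
    return fwd_op_.inputs.at(name) + "@GRAD";
  }
  std::string OutputGrad(const std::string& name) const {
    return fwd_op_.outputs.at(name) + "@GRAD";
  }

 private:
  const OpDescBind& fwd_op_;
};

// Out = X - Y, so dX = dOut (identity) and dY = -dOut (scale by -1).
class MinusGradOpDescMaker : public GradOpDescMakerBase {
 public:
  using GradOpDescMakerBase::GradOpDescMakerBase;
  GradOpDescs operator()() const override {
    GradOpDescs ops;
    ops.push_back(std::unique_ptr<OpDescBind>(new OpDescBind{
        "identity", {{"X", OutputGrad("Out")}}, {{"Out", InputGrad("X")}}, {}}));
    ops.push_back(std::unique_ptr<OpDescBind>(new OpDescBind{
        "scale", {{"X", OutputGrad("Out")}}, {{"Out", InputGrad("Y")}},
        {{"scale", -1.0f}}}));
    return ops;
  }
};

// Wraps a maker class into the std::function that would be stored in OpInfo.
template <typename Maker>
std::function<GradOpDescs(const OpDescBind&)> MakeGradOpMaker() {
  return [](const OpDescBind& fwd_op) { return Maker(fwd_op)(); };
}
```

              Calling MakeGradOpMaker<MinusGradOpDescMaker>() on a forward minus description yields the two gradient descriptions, identity followed by scale.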

              -

              The registration macros should change at the same time. In the current solution, there is no difference between forward operators and backward operators, so REGISTER_OP just registers one operator. If REGISTER_OPERATOR is given an OpProtoAndCheckerMaker and a GradOpDescMaker, we simply list them in the same macro invocation. This can be done with a variadic macro that uses __VA_ARGS__.
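              A variadic macro of this kind can be sketched with a variadic class template that inspects each listed class. Everything here is an assumption made for illustration: SKETCH_REGISTER_OPERATOR, OperatorRegistrar, FillOpInfo, the empty tag base classes and the boolean OpInfo fields are hypothetical, and a real OpInfo would store creators and makers instead of flags.

```cpp
#include <map>
#include <string>
#include <type_traits>

// Simplified OpInfo that only records which parts were registered.
struct OpInfo {
  bool has_proto_maker = false;
  bool has_grad_maker = false;
};

inline std::map<std::string, OpInfo>& OpInfoMap() {
  static std::map<std::string, OpInfo> map;
  return map;
}

// Hypothetical tag base classes used to recognise each kind of argument.
struct OperatorBase {};
struct OpProtoAndCheckerMaker {};
struct GradOpDescMakerBase {};

// Fills the OpInfo fields that correspond to the kind of class T.
template <typename T>
void FillOpInfo(OpInfo* info) {
  if (std::is_base_of<OpProtoAndCheckerMaker, T>::value) info->has_proto_maker = true;
  if (std::is_base_of<GradOpDescMakerBase, T>::value) info->has_grad_maker = true;
}

template <typename... Parts>
struct OperatorRegistrar {
  explicit OperatorRegistrar(const char* type) {
    OpInfo info;
    // Pack expansion: FillOpInfo is called once for every listed class.
    int expand[] = {0, (FillOpInfo<Parts>(&info), 0)...};
    (void)expand;
    OpInfoMap()[type] = info;
  }
};

// All classes after the type name are forwarded through __VA_ARGS__.
#define SKETCH_REGISTER_OPERATOR(type, ...) \
  static OperatorRegistrar<__VA_ARGS__> registrar_##type(#type)

struct MinusOp : OperatorBase {};
struct MinusOpProtoAndCheckerMaker : OpProtoAndCheckerMaker {};
struct MinusOpGradMaker : GradOpDescMakerBase {};
struct MinusGradOp : OperatorBase {};

SKETCH_REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker);
SKETCH_REGISTER_OPERATOR(minus_grad, MinusGradOp);
```

              With this shape, a forward operator lists its makers in one invocation, while a hand-written gradient operator is registered with the operator class alone.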

              -

              The user interface should be

              -
              vector<OpDesc> MinusOpGradMaker(OpDesc) {...}
              -REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker);
              -// Developers can still manually implement gradient operator.
              -REGISTER_OPERATOR(minus_grad, MinusGradOp);
              -
              -
              -

              The interface of the current REGISTER_OP macro should not change. Internally, REGISTER_OP invokes REGISTER_OPERATOR twice and generates the GradOpDescMaker.

              -
              REGISTER_OP(minus, MinusOp, MinusOpProtoAndCheckerMaker, minus_grad, MinusGradOp);
              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/regularization.html b/develop/doc/design/regularization.html deleted file mode 100644 index ad59a4acd7bbacee1cad15fbd0061cf9044f261e..0000000000000000000000000000000000000000 --- a/develop/doc/design/regularization.html +++ /dev/null @@ -1,320 +0,0 @@ - - - - - - - - - - - - - Regularization in PaddlePaddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Regularization in PaddlePaddle

              -
              -

              Introduction to Regularization

              -

              A central problem in machine learning is how to design an algorithm that performs well not just on the training data, but also on new data. A frequent problem is overfitting, where the model does not make reliable predictions on new, unseen data. Regularization is the process of introducing additional information in order to prevent overfitting. This is usually done by adding extra penalties to the loss function that restrict the parameter space an optimization algorithm can explore.

              -
              -

              Parameter Norm Penalties

              -

              The most common regularization approaches in deep learning limit the capacity of the model by adding a parameter norm penalty to the objective function J. This is given as follows:

              -


              -

              The parameter alpha is a hyperparameter that weights the contribution of the norm penalty term, omega, relative to the standard objective function J.
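              In the standard formulation that matches these symbols, the regularized objective can be written as below. The notation for the parameters (theta) and for the training inputs and targets (X, y) is an assumption, since only J, alpha and omega are named in the text.

```latex
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta), \qquad \alpha \ge 0
```

              Setting alpha to 0 recovers the unregularized objective; larger values give the penalty more weight.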

              -

              The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows:

              -
              -

              L2 Regularization:

              -


              -
              -
              -

              L1 Regularization

              -


              -

              A much more detailed mathematical background of regularization can be found here.
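              For reference, the two penalties are commonly written as follows. This is the usual textbook form rather than a formula recovered from this page; the symbol w for the weights and the factor 1/2 in the L2 penalty are conventions assumed here.

```latex
\Omega_{L2}(w) = \tfrac{1}{2} \lVert w \rVert_2^2 = \tfrac{1}{2} \sum_i w_i^2,
\qquad
\Omega_{L1}(w) = \lVert w \rVert_1 = \sum_i \lvert w_i \rvert
```

              Under these definitions the penalty adds alpha * w to the gradient for L2 and alpha * sign(w) for L1.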

              -
              -
              -
              -
              -

              Regularization Survey

              -

              A detailed survey of regularization in various deep learning frameworks can be found here.

              -
              -
              -

              Proposal for Regularization in PaddlePaddle

              -
              -

              Low-Level implementation

              -

              In the new design, we propose to create new operations for regularization. For now, we can add 2 ops that correspond to the most frequently used regularizations:

              -
                -
              • L2_regularization_op
              • -
              • L1_regularization_op
              • -
              -

              Like any other op, these ops can have their own CPU/GPU implementations, either using Eigen or as separate CPU and GPU kernels. As the initial implementation, we can implement their kernels using Eigen, following the abstraction pattern implemented for Activation Ops. This abstraction pattern can make it very easy to implement new regularization schemes other than L1 and L2 norm penalties.

              -

              The idea of building ops for regularization is in sync with the refactored Paddle philosophy of using operators to represent any computation unit. How these ops are added to the computation graph will be decided by the layer functions in the Python API.

              -
              -
              -

              Computation Graph

              -

              Below is an example of a really simple feed forward neural network.

              -


              -

              The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows:

              -


              -
              -
              -

              Python API implementation for Regularization

              -

              Using the low level ops, L2_regularization_op and L1_regularization_op, any user can add regularization to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support regularization. An example of such an API can be seen in Keras. As per the PaddlePaddle Python API design, the layer functions are responsible for creating operators, operator parameters and variables. Since regularization is a property of parameters, it makes sense to create these in the layer functions.

              -
              -

              Creation of Regularization ops

              -

              There are two possibilities for creating the regularization ops:

              -
                -
              1. We create these ops immediately while building the computation graph.
              2. -
              3. We add these ops in a lazy manner, just before the backward, similar to the way the optimization ops are added.
              4. -
              -

              The proposal is to add these ops in a lazy manner just before the backward pass.

              -
              -
              -

              Storage of Regularization attributes

              -

              Since we want to create the regularization ops in a lazy manner, the regularization attributes (type of regularization and weight of regularization penalty) can be stored as attributes of the Parameter class. This is because regularization is a property of the parameters and storing regularization properties with Parameters also allows for shared parameters.
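              The lazy creation described above can be sketched as a pass over the parameters that emits one regularization op per regularized parameter. This is a hypothetical illustration: the Parameter and RegularizationOp structs and the function AppendRegularizationOps are assumptions, and only the two op names come from this proposal.

```cpp
#include <string>
#include <vector>

enum class RegularizerType { kNone, kL1, kL2 };

// Hypothetical Parameter: the regularization attributes live on the parameter.
struct Parameter {
  std::string name;
  RegularizerType regularizer;
  float coeff;  // weight of the regularization penalty
};

// Hypothetical description of a regularization op to be added to the graph.
struct RegularizationOp {
  std::string type;
  std::string param;
  float coeff;
};

// Called once, just before the backward pass is built: emits one op for each
// parameter that carries a regularizer.
std::vector<RegularizationOp> AppendRegularizationOps(
    const std::vector<Parameter>& params) {
  std::vector<RegularizationOp> ops;
  for (const auto& p : params) {
    switch (p.regularizer) {
      case RegularizerType::kL1:
        ops.push_back({"L1_regularization_op", p.name, p.coeff});
        break;
      case RegularizerType::kL2:
        ops.push_back({"L2_regularization_op", p.name, p.coeff});
        break;
      case RegularizerType::kNone:
        break;
    }
  }
  return ops;
}
```

              Because the attributes are stored on the parameter and not on a layer, a parameter shared by several layers appears once in the list and receives a single regularization op.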

              -
              -
              -

              High-level API

              -

              In the PaddlePaddle Python API, users will primarily rely on layer functions to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. Good references for these APIs are Keras and tf.contrib.layers in TensorFlow.

              -
              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/releasing_process.html b/develop/doc/design/releasing_process.html deleted file mode 100644 index 285714af17a3d6b4084924a7ffde61e434b53fda..0000000000000000000000000000000000000000 --- a/develop/doc/design/releasing_process.html +++ /dev/null @@ -1,363 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle发行规范 — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              PaddlePaddle Release Process

              -

              PaddlePaddle uses the git-flow branching model for branch management and follows the Semantic Versioning standard for its version numbers.

              -

              Each new PaddlePaddle release follows the process below:

              -
                -
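              1. Create a new branch from the develop branch, named release/[version], for example release/0.10.0.
              2. -
              3. Tag the new branch with [version]rc[patch number]. The first tag is 0.10.0rc1, the second is 0.10.0rc2, and so on.
              4. -
              5. For the commit of this version, do the following: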
              6. -
              -
                -
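              • Use the Regression Test List as a checklist to verify the correctness of this release.

                -
                  -
                • If any test fails, record all failing cases, fix every bug on this release/[version] branch, increment the patch number, and return to the tagging step (step 2).

                  -
                • -
                • Update the version information in python/setup.py.in and set the istaged field to True.

                  -
                • -
                • Build the Python wheel package for this version and publish it to PyPI.

                  -
                    -
                  • Because pypi.python.org currently enforces the strict naming convention of PEP 513, rename the platform-related suffix of the wheel package before uploading it with twine, for example from linux_x86_64 to manylinux1_x86_64.

                    -
                  • -
                  • The package names on PyPI are paddlepaddle and paddlepaddle_gpu. To upload the GPU package, set name: “paddlepaddle_gpu” in build/python/setup.py and rebuild the wheel package: python setup.py bdist_wheel

                    -
                  • -
                  • To upload: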

                    -
                    cd build/python
                    -pip install twine
                    -twine upload dist/[package to upload]
                    -
                    -
                    -
                  • -
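                  • Build the Docker release image for this version and publish it to Docker Hub. If this fails, fix the Docker image build problem, increment the patch number, and return to the tagging step (step 2).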

                    -
                  • -
                  -
                • -
                -
              • -
              -
                -
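              1. After step 3 is complete, merge the release/[version] branch into the master branch. Tag the merge commit on master with the version number, then merge master into the develop branch. Finally, delete the release/[version] branch.
              2. -
              3. Write the Release Note collaboratively.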
              4. -
              -

              Please note:

              -
                -
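              • Once the release/[version] branch has been created, merging from the develop branch into it is generally not allowed. This keeps the feature set of the release/[version] branch frozen, which makes it easier for testers to verify PaddlePaddle's behavior.
              • -
              • While the release/[version] branch exists, any bugfix branch must be merged into all three branches: master, develop, and release/[version].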
              • -
              -
              -

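              Publishing wheel packages to PyPI

              -

              Use PaddlePaddle CI to build the binaries automatically. As shown in the figure below, select the builds to publish (usually one CPU build and one GPU build) and click the ”...” button to the right of ”run” to open the dialog shown below. In the second tab (Changes), select the branch to publish (0.11.0 in this example), then click the ”Run Build” button. When the build finishes, the ”Artifacts” drop-down on this page lists the three generated binaries, which correspond to the CAPI, cp27m, and cp27mu builds. Then upload them with twine as described above.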

              -

              -
                -
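              • Note: the CI environment uses the Docker images from https://github.com/PaddlePaddle/buildtools as its build environment in order to support more Linux distributions. These images can also be used for manual builds, and they can be downloaded from https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/ .
              • -
              • PyPI does not allow overwriting an upload, so a wheel package cannot be changed once its version has been published. The next wheel package must carry a new version number before it can be uploaded.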
              • -
              -
              -
              -

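              Publishing Docker images

              -

              After the PaddlePaddle CI wheel build described above finishes, it automatically pushes the Docker image to Docker Hub. Publishing a Docker image therefore only requires tagging the automatically pushed image with the corresponding version number: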

              -
                -
              1. Open https://hub.docker.com/r/paddlepaddle/paddle/tags/ and check that the latest tag was updated after the wheel build above finished.
              2. -
              3. Run docker pull paddlepaddle/paddle:[latest tag], where [latest tag] can be latest, latest-gpu, and so on.
              4. -
              5. Run docker tag paddlepaddle/paddle:[latest tag] paddlepaddle/paddle:[version]
              6. -
              7. Run docker push paddlepaddle/paddle:[version]
              8. -
              -
              -
              -

              PaddlePaddle Branching Conventions

              -

              PaddlePaddle development follows the git-flow branching model, with some adjustments to suit GitHub.

              -
                -
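              • The main PaddlePaddle repository follows the git-flow branching model, where:
                  -
                • The master branch is the stable branch. Every version on master has passed both unit tests and regression tests.
                • -
                • The develop branch is the development branch. Every version on develop has passed unit tests but not regression tests.
                • -
                • A release/[version] branch is a temporary branch created for each release. Code on this branch is undergoing regression testing.
                • -
                -
              • -
              • Forks owned by other users do not need to follow the git-flow branching model strictly, but every branch of every fork is treated as a feature branch.
                  -
                • Recommendation: keep the develop branch of a developer's fork in sync with the develop branch of the main repository.
                • -
                • Recommendation: in the fork, create feature branches from the develop branch.
                • -
                • When a feature branch is complete, submit a Pull Request to the main PaddlePaddle repository for code review.
                    -
                  • During review, the developer can keep committing changes to the same feature branch.
                  • -
                  -
                • -
                -
              • -
              • BugFix branches are also maintained in the developer's own fork. Unlike feature branches, a BugFix branch needs separate Pull Requests to the main repository's master and develop branches, and to the release/[version] branch if one exists.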
              • -
              -
              -
              -

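              PaddlePaddle Regression Test List

              -

              This list describes the features that must be tested before each PaddlePaddle release.

              -
              -

              All chapters of PaddlePaddle Book

              -

              Every PaddlePaddle release must first ensure that all chapters of PaddlePaddle Book work correctly. This includes verifying that models train correctly both with the current paddle_trainer and with the pure Python API.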

              -

              | | Getting Started | Recognize Digits | Image Classification | Word Vectors | Sentiment Analysis | Semantic Role Labeling | Machine Translation | Personalized Recommendation |
              | --- | --- | --- | --- | --- | --- | --- | --- | --- |
              | API.V2 + Docker + GPU | | | | | | | | |
              | API.V2 + Docker + CPU | | | | | | | | |
              | paddle_trainer + Docker + GPU | | | | | | | | |
              | paddle_trainer + Docker + CPU | | | | | | | | |
              | API.V2 + Ubuntu + GPU | | | | | | | | |
              | API.V2 + Ubuntu + CPU | | | | | | | | |
              | paddle_trainer + Ubuntu + GPU | | | | | | | | |
              | paddle_trainer + Ubuntu + CPU | | | | | | | | |

              -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/scope.html b/develop/doc/design/scope.html deleted file mode 100644 index 25ed677b2852dec27d25f5ba0ee4a49072c88c5c..0000000000000000000000000000000000000000 --- a/develop/doc/design/scope.html +++ /dev/null @@ -1,373 +0,0 @@ - - - - - - - - - - - - - Design of Scope in Paddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design of Scope in Paddle

              -
              -

              Overview

              -

              Scope is an important concept in programming languages: it defines a program region in which a set of bindings between names and entities applies. In a specific scope, a valid name is uniquely associated with an entity, such as a variable. In another scope, the same name may refer to another entity or to nothing at all. Scope thus restricts the visibility and validity of names in a program. PaddlePaddle introduces Scope to manage variables in context. Unlike the original abstract concept, however, Scope here is an object with two important attributes:

              -
                -
              • Scope is an association of a name to variable.
              • -
              • Variables in a parent scope can be retrieved from local scope.
              • -
              -

              A detailed explanation of these two attributes follows.

              -
              -
              -

              Scope is an association of a name to variable.

              -

              Scope is an association of names to variables. All variables belong to a Scope. You need to specify a scope to run a Net, i.e., net.Run(&scope). One net can run in different scopes and update different variables in each scope.

              -
                -
              1. Scope only contains a map of a name to variable.

                -

                All parameters, data, and states in a Net should be variables stored inside a scope. Each op should get the inputs and outputs it computes on, such as data buffers and state (momentum), from a scope.

                -
              2. -
              3. A Variable can only be created by a Scope and can only be obtained from a Scope. Users cannot create or get a variable outside a scope. This is a constraint of our framework that keeps it simple and clear.

                -
              4. -
              5. Scope only contains methods that are used to Create and Get Variables. Scope does not contain Operators and has no information about how to run them. Net is designed to drive the computation, and Scope only contains a map of variables. There is no computation logic inside a Scope; it just manages the lifetime of variables.

                -
                  -
                • Create is used to create a Variable by its name and add the mapping relation.
                • -
                • Get is used to find a Variable by name.
                • -
                -
              6. -
              7. Every variable belongs to exactly one Scope.

                -

                A Variable cannot belong to multiple scopes. To use variables from a parent scope, look them up through the parent scope.

                -
              8. -
              9. A Scope should destruct all Variables inside it when it is destructed. Users should never store a Variable pointer anywhere else.

                -

                Because a Variable can only be obtained from a Scope, destroying the Scope must also destroy all Variables in it. If a user stores a Variable pointer in a private data member or a global variable, the pointer becomes invalid when the associated Scope is destroyed.

                -
              10. -
              -
              class Scope {
              - public:
              -  Variable* Var(const std::string& name);
              -  const Variable* FindVar(const std::string& name) const;
              -
              - private:
              -    std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
              -};
              -
              -
              -
              -
              -

              Parent scope and local scope

              -

              Just like scope in programming languages, a Scope in a neural network can also be a local scope. A local scope has two attributes.

              -
                -
              1. We can create local variables in a local scope. When that local scope is destroyed, all local variables should also be destroyed.
              2. -
              3. Variables in a parent scope can be retrieved from the local scopes of that parent. That is, when a user gets a variable from a scope, the variable is first searched for in the current scope. If it is not found there, the search continues in the parent scope, and so on until the variable is found or there is no parent left.
              4. -
              -
              class Scope {
              - public:
              -  Scope(const std::shared_ptr<Scope>& scope): parent_(scope) {}
              -
              -  Variable* FindVar(const std::string& name) const {
              -    auto it = vars_.find(name);
              -    if (it != vars_.end()) {
              -      return it->second.get();
              -    } else if (parent_ != nullptr) {
              -      return parent_->FindVar(name);
              -    } else {
              -      return nullptr;
              -    }
              -  }
              -
              - private:
              -  std::shared_ptr<Scope> parent_ {nullptr};
              -};
              -
              -
              -

              The Scope class has a private data member called parent_, a smart pointer to the parent scope. When a user gets a variable by name, the name is first searched for in the current scope. If the variable cannot be found locally and the parent scope is not nullptr, the search continues in the parent scope. The default value of parent_ is nullptr, which means that the scope is a global scope.

              -

              A local scope is very useful when implementing a Recurrent Neural Network. Each timestep of an RNN should be a Net, and the Net of each timestep (StepNet for short) should use an independent local scope, just as the variables in a while loop live in a local scope in programming languages. By using a single StepNet and switching local scopes, we can implement an RNN easily.
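              The parent lookup and the per-timestep local scopes can be tried out with the following sketch. It is a simplified illustration, not the framework's Scope: Variable is reduced to a single public float, its constructor is left public for brevity, and RunSteps with its "h = h_prev + w" update is a hypothetical stand-in for a StepNet.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified Variable holding one float.
struct Variable {
  float value = 0.0f;
};

class Scope {
 public:
  explicit Scope(std::shared_ptr<Scope> parent = nullptr)
      : parent_(std::move(parent)) {}

  // Returns the local variable with this name, creating it if necessary.
  Variable* Var(const std::string& name) {
    std::unique_ptr<Variable>& slot = vars_[name];
    if (!slot) slot.reset(new Variable);
    return slot.get();
  }

  // Searches the local scope first, then the chain of parents.
  Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  std::shared_ptr<Scope> parent_;
  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
};

// Runs "h = h_prev + w" once per timestep. Every step gets a fresh local
// scope; the weight w lives in the shared parent scope.
float RunSteps(int steps) {
  auto global = std::make_shared<Scope>();
  global->Var("w")->value = 2.0f;
  float h_prev = 0.0f;
  for (int t = 0; t < steps; ++t) {
    Scope step(global);  // local scope of this timestep
    Variable* h = step.Var("h");
    h->value = h_prev + step.FindVar("w")->value;
    h_prev = h->value;
  }  // step and its variable h are destroyed at the end of each iteration
  return h_prev;
}
```

              Each step scope finds w through its parent, while h exists only for the duration of its own timestep and is never visible from the parent scope.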

              -
              -
              -
              -

              Interface Design

              -
              class Variable {
              - private:
              -  Variable() = default;
              -  friend class Scope;
              -};
              -
              -class Scope {
              - private:
              -  Scope(const std::shared_ptr<Scope>& parent = nullptr);
              -
              - public:
              -  static std::shared_ptr<Scope> Create(const std::shared_ptr<Scope>& parent = nullptr);
              -
              -  // return nullptr if not found.
              -  Variable* FindVar(const std::string& name) const;
              -
              -  // return if already contains same name variable.
              -  Variable* Var(const std::string& name);
              -
              - private:
              -  std::shared_ptr<Scope> parent_;
              -  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
              -};
              -
              -
              -
              -

              Only scope can create a variable

              -

              To ensure that only a scope can create a variable, we mark Variable’s constructor as private and make Scope a friend class of Variable. Then only Var can construct a Variable.

              -
              -
              -

              When scope destroyed, all variables inside this scope should be destroyed together

              -

              The scope holds unique pointers to all of its variables. Users can call FindVar on a scope, but they should not store the returned pointer as a member variable, because when the scope is destroyed, all variables inside it are destroyed together.

              -
              -
              -

              Sharing a parent scope

              -

              A local scope contains a parent_ pointer, so scopes form a linked list. parent_ is a shared_ptr because the parents of a local scope must not be destroyed while that local scope is in use.

              -

              Also, because the parent scope is held as a shared_ptr, a scope can only be obtained as a shared pointer through Create(). We cannot construct a Scope object directly, because such an object could not be passed to another scope as its parent pointer.

              -
              -
              -

              Orthogonal interface

              -

              FindVar returns nullptr when the name is not found, so it can be used as a Contains method. Var returns an Error when there is a name conflict locally. By combining FindVar and Var, we can implement Var easily.

              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - © Copyright 2016, PaddlePaddle developers. - -

              -
              - Built with Sphinx using a theme provided by Read the Docs. - -
              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/selected_rows.html b/develop/doc/design/selected_rows.html deleted file mode 100644 index 82a5f897bb53277ac23aeee8c0174030026ac5c8..0000000000000000000000000000000000000000 --- a/develop/doc/design/selected_rows.html +++ /dev/null @@ -1,319 +0,0 @@ - - - - - - - - - - - - - Design Doc: Selected Rows — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Selected Rows

              -

              SelectedRows is a sparse tensor data type designed to support embedding operators. The gradient of an embedding table is a sparse tensor: only a few of its rows have non-zero values. It is straightforward to represent such a tensor with the following data structure:

              -
              class SelectedRows {
              - private:
              -  vector<int> rows_;
              -  Tensor value_;
              -  int height_;
              -};
              -
              -
              -

              The field height_ is the first dimension of SelectedRows. The rows_ field holds the indices of the non-zero rows of SelectedRows. The value_ field is an N-dim tensor of shape [rows.size() /* NUM_ROWS */, ...], which supplies the values for each row. The shape of a SelectedRows is [height_] + value_.shape[1:].

              -

              Suppose that a SelectedRows-typed variable x has many rows, but only two of them have values: row 73 is [1, 2] and row 84 is [3, 4]. The SelectedRows representation would be:

              -
              x = SelectedRows {
              -  rows = [73, 84],
              -  value = [[1, 2], [3, 4]]
              -}
              -
              -
              -
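
As an illustration of how the three fields relate, the following Python sketch rebuilds the dense equivalent of the example above. It is not the framework's class: the `to_dense` and `shape` helpers are hypothetical, `height=100` is an assumed value (the text only says x has many rows), and summing duplicated row indices is an assumption as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectedRows:
    rows: List[int]           # indices of the non-zero rows
    value: List[List[float]]  # one entry per index in `rows`
    height: int               # first dimension of the dense equivalent

    def shape(self):
        # [height_] + value_.shape[1:], for a 2-D value tensor
        return [self.height, len(self.value[0])]

    def to_dense(self):
        width = len(self.value[0])
        dense = [[0.0] * width for _ in range(self.height)]
        for row_index, row_value in zip(self.rows, self.value):
            # Assumption: a row index that appears twice is accumulated.
            for col, v in enumerate(row_value):
                dense[row_index][col] += v
        return dense


# height=100 is assumed for this example.
x = SelectedRows(rows=[73, 84], value=[[1, 2], [3, 4]], height=100)
dense = x.to_dense()
```

Only rows 73 and 84 of `dense` are non-zero; storing `rows` and `value` avoids materializing the other 98 rows.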
              -

              SelectedRows in Protobuf

              -

              SelectedRows is a type of Variable, so VarDesc in protobuf should describe the SelectedRows information. Only the tensor dimension of a SelectedRows is described at compile time, because rows_ and value_ depend on the training data. So we use TensorDesc to unify data_type and dims. A LodTensorDesc contains a TensorDesc and a lod_level. The description of a SelectedRows is a TensorDesc.

              -
              message TensorDesc {
              -  required DataType data_type = 1;
              -  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
              -}
              -
              -message LodTensorDesc {
              -  required TensorDesc tensor = 1;
              -  optional int32 lod_level = 2;
              -}
              -
              -message VarDesc {
              -  required string name = 1;
              -  enum VarType { 
              -    LOD_TENSOR = 0;
              -    SELECTED_ROWS = 1;
              -  }
              -  required VarType type = 2;
              -  optional LodTensorDesc lod_desc = 3;
              -  optional TensorDesc selected_rows_desc = 4;
              -  optional bool persistable = 5 [ default = false ];
              -}
              -
              -
              -
              -
              -

              InferShape for Selected Rows

              -

              Just like LoD information, the InferShape method infers the output tensor type as well. The operator should decide whether its output is a SelectedRows or a dense tensor.

              -

              For example, the gradient operator of TableLookup always generates SelectedRows. Its InferShape method should look like the following:

              -
              void TableLookupGrad::InferShape(context) {
              -  ...
              -  context.SetDataType("Embedding.Grad", kSelectedRows);
              -}
              -
              -
              -
              -
              -

              Sparse Operators

              -

              There are several operators that need to be written to support SelectedRows. These are:

              -
                -
              1. Operators which generate a SelectedRows gradient, e.g. the gradient of TableLookupOp.
              2. Optimizer operators which support a SelectedRows gradient, e.g. SGD or AdaGrad for SelectedRows. However, there should be only one SGD operator; OpWithKernel::Run should select a suitable kernel for either a dense tensor or SelectedRows.
              -
              -
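
The "one SGD operator, two kernels" idea in the list above can be sketched as follows. This is a simplified Python illustration, not the OpWithKernel::Run mechanism itself: the function names and the type-based dispatch are assumptions made for the example.

```python
class SelectedRows:
    """Sparse gradient: only `rows` carry values (illustrative stand-in)."""

    def __init__(self, rows, value, height):
        self.rows, self.value, self.height = rows, value, height


def sgd_dense_kernel(param, grad, lr):
    # Dense gradient: every element of the parameter is updated.
    for i, grad_row in enumerate(grad):
        for j, g in enumerate(grad_row):
            param[i][j] -= lr * g


def sgd_selected_rows_kernel(param, grad, lr):
    # Sparse gradient: touch only the rows listed in `grad.rows`.
    for row_index, grad_row in zip(grad.rows, grad.value):
        for j, g in enumerate(grad_row):
            param[row_index][j] -= lr * g


def sgd(param, grad, lr):
    # A single SGD entry point that picks the kernel from the gradient type.
    if isinstance(grad, SelectedRows):
        sgd_selected_rows_kernel(param, grad, lr)
    else:
        sgd_dense_kernel(param, grad, lr)


table = [[1.0, 1.0] for _ in range(4)]
sparse_grad = SelectedRows(rows=[1, 3], value=[[2.0, 4.0], [6.0, 8.0]], height=4)
sgd(table, sparse_grad, lr=0.5)
```

After the call, only rows 1 and 3 of `table` have changed, which is the cost saving that motivates a SelectedRows kernel for embedding tables.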
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/simple_op_design.html b/develop/doc/design/simple_op_design.html deleted file mode 100644 index ab16e08a0b7a29fce00e145904dfe47f55bd04cb..0000000000000000000000000000000000000000 --- a/develop/doc/design/simple_op_design.html +++ /dev/null @@ -1,441 +0,0 @@ - - - - - - - - - - - - - Interaction between C++ and Python — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Interaction between C++ and Python

              -

              Users employ the Python API to describe their own network; however, the network construction actually happens in C++. So Protobuf is introduced to pass messages between Python and C++.

              -

              The interaction between Python and C++ can be simplified into two steps:

              -
                -
              1. C++ tells Python how many Ops there are, and which parameters users need to provide to initialize a new Op. Python then builds an API for each Op at compile time.
              2. Users invoke the APIs built by Python and provide the necessary parameters. These parameters are sent to C++ to finish the Op construction.
              -
              -

              Message from C++ to Python

              -

              We define a Protobuf message class OpProto to hold the message needed in the first step. What should an OpProto contain? This question is equivalent to “What message do we need to offer in order to build a Python API that is legal, user-oriented, and able to describe a whole Op?”

              -

              The following messages are necessary:

              -
                -
              1. The Op’s name and a brief comment.
              2. The number of input and output variables; each variable’s name, type, and comment.
              3. The Op’s attributes; each attribute includes a name, type, comment, default value, and value range.
              -

              So OpProto can be defined as follows:

              -
              enum AttrType {
              -    INT = 1;
              -    FLOAT = 2;
              -    STRING = 3;
              -    INTS = 4;
              -    FLOATS = 5;
              -    STRINGS = 6;
              -};
              -
              -message AttrValue {
              -    required AttrType type = 1;
              -    optional int32 iv = 2;
              -    optional float fv = 3;
              -    optional string sv = 4;
              -    repeated int32 ivs = 5;
              -    repeated float fvs = 6;
              -    repeated string svs = 7;
              -};
              -
              -message AttrProto {
              -    required string name = 1;
              -    required string comment = 2;
              -    required AttrType type = 3;
              -};
              -
              -message VarProto {
              -    required string name = 1;
              -    required string comment = 2;
              -    required bool is_tensor = 3;
              -};
              -
              -message OpProto {
              -    repeated VarProto inputs = 1;
              -    repeated VarProto outputs = 2;
              -    repeated AttrProto attrs = 3;
              -    required string type = 4;
              -    required string comment = 5;
              -};
              -
              -
              -

              To generate Python code automatically:

              -
              def make_op_creator(type_name, op_proto):
              -    # A factory binds type_name and op_proto for each Op. Defining __impl__
              -    # directly inside the loop would make every function see the last Op.
              -    def __impl__(**kwargs):  # Users must use keyword args in the Paddle API
              -        inputs = [kwargs.get(ipt.name, "") for ipt in op_proto.inputs]
              -        outputs = [kwargs.get(opt.name, "") for opt in op_proto.outputs]
              -        attrs = [cast_to_op_attr(attr, kwargs.get(attr.name, None)) for attr in op_proto.attrs]
              -        opdesc = (inputs, outputs, type_name, attrs)
              -        return paddle.framework.OpRegistry.CreateOp(opdesc)
              -    __impl__.__doc__ = create_doc_string(op_proto)
              -    return __impl__
              -
              -def create_python_ops_creation_functions():
              -    op_protos = paddle.framework.OpRegistry.get_all_op_proto()
              -    for type_name in op_protos:
              -        globals()[type_name] = make_op_creator(type_name, op_protos[type_name])
              -
              -create_python_ops_creation_functions()
              -
              -
              -
              -
              -

              Message from Python to C++

              -

              To hold the message needed in the second step above, we define the Protobuf message class OpDesc. It holds the user-specified parameters that describe an Op.

              -
              message OpDesc {
              -    required string type = 1;   
              -    repeated string inputs = 2;
              -    repeated string outputs = 3;
              -    map<string, AttrValue> attrs = 4;
              -};
              -
              -
              -
              -
              -
              -

              OpProto Register

              -

              Every Op has its own OpProto. For convenience, we need to register them and record all their messages. For each Op class, we define a corresponding OpMaker class, whose constructor implements the building of the OpProto. OpMaker’s constructor is invoked by another function, OpRegistry::RegisterOp().

              -
              class OpProtoMaker {
              -public:
              -    OpProtoMaker(OpProto* proto): proto_(proto) {}
              -protected:
              -    OpProto* proto_;
              -    void AddInput(const std::string& name, const std::string& desc) {...}
              -    void AddAttr(const std::string& name, const std::string& desc, TypeId type) {...}
              -    void AddComment(const std::string& comment) { ... }
              -};
              -
              -class OpRegistry {
              -public:
              -    using OpCreator = std::function<OperatorBase* (const OpDesc& desc)>;
              -    
              -    template <typename OpType, typename OpMaker>
              -    static void RegisterOp(const std::string& name) {
              -        gCreators_[name] = [](const OpDesc& desc) {
              -            return new OpType(desc);
              -        };
              -        OpProto& opProto = gProtos_[name];
              -        OpMaker maker(&opProto);  // the OpMaker constructor fills in opProto
              -    }
              -
              -    static map<string, OpCreator> gCreators_;
              -    static map<string, OpProto> gProtos_;
              -};
              -
              -template <typename OpType, typename OpMaker>
              -class OpRegister {
              -  public:
              -    OpRegister(std::string type) {
              -        OpRegistry::RegisterOp<OpType, OpMaker>(type);
              -    }
              -};
              -
              -#define REGISTER_OP(op_class, op_maker_class, type_name)           \
              -    class op_class##Register {                                     \
              -      private:                                                     \
              -        const static OpRegister<op_class, op_maker_class> reg;     \
              -    };                                                             \
              -    const OpRegister<op_class, op_maker_class> op_class##Register::reg(#type_name);
              -    
              -class CosineOp {
              -// ...
              -};
              -
              -struct CosineOpProtoMaker : public OpProtoMaker {
              -    CosineOpProtoMaker(OpProto* proto) : OpProtoMaker(proto) {
              -        AddInput("input", "input of cosine op");
              -        AddAttr("scale", "scale of cosine op", float).Default(1.0).GreaterThan(0.0);
              -        AddType("cos");
              -        AddComment("This is cos op");
              -    }
              -};
              -
              -REGISTER_OP(CosineOp, CosineOpProtoMaker, cos);
              -
              -
              -

              In REGISTER_OP(CosineOp, CosineOpProtoMaker, cos), we register not only CosineOp but also CosineOpProto. As fields of CosineOpProto, the default value and value range of scale are also registered here.
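
To show what registering a default value and a value range for scale could mean in practice, here is a small Python sketch. It mirrors the chained `.Default(1.0).GreaterThan(0.0)` call in the C++ example, but the class `AttrChecker` and its methods are hypothetical names chosen for illustration, not part of the framework.

```python
class AttrChecker:
    """Holds a default value and range constraints for one attribute."""

    def __init__(self, name, attr_type):
        self.name = name
        self.attr_type = attr_type
        self.default = None
        self.checks = []

    def set_default(self, value):
        self.default = value
        return self  # return self so that calls can be chained

    def greater_than(self, lower):
        self.checks.append((lambda v: v > lower, "must be > %r" % (lower,)))
        return self

    def resolve(self, user_attrs):
        # Use the user's value if given, otherwise fall back to the default.
        value = user_attrs.get(self.name, self.default)
        if value is None:
            raise ValueError("attribute %r is required" % self.name)
        value = self.attr_type(value)
        for predicate, message in self.checks:
            if not predicate(value):
                raise ValueError("attribute %r %s" % (self.name, message))
        return value


# Counterpart of AddAttr("scale", ...).Default(1.0).GreaterThan(0.0)
scale = AttrChecker("scale", float).set_default(1.0).greater_than(0.0)
```

With this checker, `scale.resolve({})` yields the default `1.0`, and a non-positive value such as `-1.0` is rejected with a `ValueError` before any Op is constructed.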

              -
              -
              -

              Python API

              -

              Python APIs are divided into two types, high-level API and low-level API.

              -
              -

              High-Level API

              -

              High-level API is called by users directly, so it should keep its style consistent with existing V2 APIs.

              -

              Here is a sample of how to define an fc layer:

              -
              hd = fc_layer(input=data, size=56, with_bias=True, activation="sigmoid")
              -
              -
              -

              hd is the output of fc_layer and it’s a variable. It can be further sent into other layers as input.

              -

              The definition of fc_layer():

              -
              def fc_layer(input, size, with_bias, activation):
              -    attr_map = {"size": size}
              -    check_attrs(attr_map)
              -    w = make_variable('w')
              -    if with_bias:
              -        b = make_variable('b')
              -    else:
              -        b = None
              -    fc_output = make_variable('fc_output')
              -    fc_op(input, w, b, fc_output, attr_map)
              -    act_output = make_variable('sigmoid_output')
              -    if activation == "sigmoid":
              -        sigmoid_op(fc_output, act_output)
              -    else:
              -        # ...
              -        pass
              -    return act_output
              -
              -
              -
              -
              -

              Low-Level API

              -

              In the sample above, fc_op and sigmoid_op are low-level APIs. They build an OpDesc and invoke the corresponding C++ code.

              -

              TODO

              -
              -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/speech/deep_speech_2.html b/develop/doc/design/speech/deep_speech_2.html deleted file mode 100644 index 9bad78759bb714c1bdb7ffb79f9d985a932a372d..0000000000000000000000000000000000000000 --- a/develop/doc/design/speech/deep_speech_2.html +++ /dev/null @@ -1,460 +0,0 @@ - - - - - - - - - - - - - DeepSpeech2 on PaddlePaddle: Design Doc — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              DeepSpeech2 on PaddlePaddle: Design Doc

              -

              We are planning to build Deep Speech 2 (DS2) [1], a powerful Automatic Speech Recognition (ASR) engine, on PaddlePaddle. For the first-stage plan, we have the following short-term goals:

              -
                -
              • Release a basic distributed implementation of DS2 on PaddlePaddle.
              • -
              • Contribute a chapter of Deep Speech to PaddlePaddle Book.
              • -
              -

              Intensive system optimization and low-latency inference library (details in [1]) are not yet covered in this first-stage plan.

              - -
              -

              Tasks

              -

              We roughly break down the project into 14 tasks:

              -
                -
              1. Develop an audio data provider:
                • Json filelist generator.
                • Audio file format transformer.
                • Spectrogram feature extraction, power normalization etc.
                • Batch data reader with SortaGrad.
                • Data augmentation (optional).
                • Prepare (one or more) public English data sets & baseline.
              2. Create a simplified DS2 model configuration:
                • With only fixed-length (by padding) audio sequences (otherwise need Task 3).
                • With only bidirectional-GRU (otherwise need Task 4).
                • With only greedy decoder (otherwise need Task 5, 6).
              3. Develop support for variable-shaped dense-vector (image) batches of input data:
                • Update DenseScanner in dataprovider_converter.py, etc.
              4. Develop a new lookahead-row-convolution layer (see [1] for details):
                • Lookahead convolution windows.
                • Within-row convolution, without kernels shared across rows.
              5. Build KenLM language model (5-gram) for beam search decoder:
                • Use KenLM toolkit.
                • Prepare the corpus & train the model.
                • Create inference interfaces (for Task 6).
              6. Develop a beam search decoder with CTC + LM + WORDCOUNT:
                • Beam search with CTC.
                • Beam search with external custom scorer (e.g. LM).
                • Try to design a more general beam search interface.
              7. Develop a Word Error Rate evaluator:
                • Update ctc_error_evaluator (CER) to support WER.
              8. Prepare internal dataset for Mandarin (optional):
                • Dataset, baseline, evaluation details.
                • Particular data preprocessing for Mandarin.
                • Might need cooperating with the Speech Department.
              9. Create standard DS2 model configuration:
                • With variable-length audio sequences (need Task 3).
                • With unidirectional-GRU + row-convolution (need Task 4).
                • With CTC-LM beam search decoder (need Task 5, 6).
              10. Make it run perfectly on clusters.
              11. Experiments and benchmarking (for accuracy, not efficiency):
                • With public English dataset.
                • With internal (Baidu) Mandarin dataset (optional).
              12. Time profiling and optimization.
              13. Prepare docs.
              14. Prepare PaddlePaddle Book chapter with a simplified version.
              -
              -
              -
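
For the Word Error Rate evaluator task in the list above, the metric itself is easy to state: the edit distance between the reference and hypothesis word sequences, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the planned ctc_error_evaluator change; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

The same `edit_distance` serves both metrics; only the tokenization differs (words for WER, characters for CER), which is why extending a CER evaluator to WER is mostly a matter of splitting on white space.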

              Task Dependency

              -

              Tasks parallelizable within phases:

              -

              Roadmap | Description | Parallelizable Tasks
              ----------- | :------------------------------------ | :--------------------
              Phase I | Simplified model & components | Task 1 ~ Task 8
              Phase II | Standard model & benchmarking & profiling | Task 9 ~ Task 12
              Phase III | Documentation | Task 13 ~ Task 14

              -

              An issue for each task will be created later. Contributions, discussions and comments are all highly appreciated and welcome!

              -
              -
              -

              Design Details

              -
              -

              Overview

              -

              Traditional ASR (Automatic Speech Recognition) pipelines require great human effort devoted to elaborately tuning multiple hand-engineered components (e.g. audio feature design, acoustic model, pronunciation model and language model etc.). Deep Speech 2 (DS2) [1], however, trains such ASR models in an end-to-end manner, replacing most intermediate modules with a single deep network architecture. By scaling up both the data and model sizes, DS2 achieves a very significant performance boost.

              -

              Please read the Deep Speech 2 papers [1,2] for more background knowledge.

              -

              The classical DS2 network contains 15 layers (from bottom to top):

              -
                -
              • Two data layers (audio spectrogram, transcription text)
              • Three 2D convolution layers
              • Seven uni-directional simple-RNN layers
              • One lookahead row convolution layer
              • One fully-connected layer
              • One CTC-loss layer
              -
              -
              Figure 1. Architecture of Deep Speech 2 Network.

              We don’t have to stick to this 2-3-7-1-1-1 depth [2]. Similar networks with different depths might also work well. In [1], for example, the authors use a different depth (e.g. 2-2-3-1-1-1) for the final experiments.

              -

              Key ingredients about the layers:

              -
                -
              • Data Layers:
                • Frame sequence data of the audio spectrogram (with FFT).
                • Token sequence data of the transcription text (labels).
                • These two types of sequences do not have the same length, thus a CTC-loss layer is required.
              • 2D Convolution Layers:
                • Not only temporal convolution, but also frequency convolution. Like a 2D image convolution, but with a variable dimension (i.e. the temporal dimension).
                • With striding for only the first convolution layer.
                • No pooling for any convolution layer.
              • Uni-directional RNNs:
                • Uni-directional + row convolution: for low-latency inference.
                • Bi-directional + without row convolution: if we don’t care about the inference latency.
              • Row convolution:
                • For looking only a few steps ahead into the future, instead of looking into the whole sequence as bi-directional RNNs do.
                • Not necessary with bi-directional RNNs.
                • “Row” means convolutions are done within each frequency dimension (row), and no convolution kernels are shared across rows.
              • Batch Normalization Layers:
                • Added to all of the above layers (except for the data and loss layers).
                • Sequence-wise normalization for RNNs: BatchNorm is only performed on the input-state projection and not on the state-state projection, for efficiency.
              -
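
The row convolution item above can be illustrated with a small sketch. Assuming a lookahead of tau future steps and one weight per (offset, feature row), each output feature at time t is a weighted sum of the same feature over steps t to t + tau. The function below is a plain-Python illustration under those assumptions (including zero padding past the end of the sequence), not the layer planned in Task 4.

```python
def row_convolution(frames, weights):
    """Lookahead row convolution.

    frames:  T x D list of feature vectors, one per time step.
    weights: (tau + 1) x D list; weights[j][d] scales feature d taken
             j steps in the future.
    """
    num_steps = len(frames)
    dim = len(frames[0])
    output = []
    for t in range(num_steps):
        out_t = [0.0] * dim
        for j, w_j in enumerate(weights):
            if t + j >= num_steps:
                break  # assumption: frames past the end count as zeros
            for d in range(dim):
                # Feature row d only mixes with itself across time.
                out_t[d] += w_j[d] * frames[t + j][d]
        output.append(out_t)
    return output


frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
weights = [[1.0, 0.5], [2.0, 0.0]]  # tau = 1: current step plus one step ahead
result = row_convolution(frames, weights)
```

Because the weights are indexed by feature row, no kernel is shared across rows, and the output at step t needs only tau future frames rather than the whole sequence.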

              Required Components | PaddlePaddle Support | Need to Develop
              :------------------------------------- | :-------------------------------------- | :-----------------------
              Data Layer I (Spectrogram) | Not supported yet. | TBD (Task 3)
              Data Layer II (Transcription) | paddle.data_type.integer_value_sequence | -
              2D Convolution Layer | paddle.layer.image_conv_layer | -
              DataType Converter (vec2seq) | paddle.layer.block_expand | -
              Bi-/Uni-directional RNNs | paddle.layer.recurrent_group | -
              Row Convolution Layer | Not supported yet. | TBD (Task 4)
              CTC-loss Layer | paddle.layer.warp_ctc | -
              Batch Normalization Layer | paddle.layer.batch_norm | -
              CTC-Beam search | Not supported yet. | TBD (Task 6)

              -
              -
              -

              Row Convolution

              -

              TODO by Assignees

              -
              -
              -

              Beam Search with CTC and LM

              -
              -
              Figure 2. Algorithm for CTC Beam Search Decoder.
                -
              • The Beam Search Decoder for the DS2 CTC-trained network follows an approach similar to the one in [3], as shown in Figure 2, with two important modifications for the ambiguous parts:
                1. in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths;
                2. the if condition "if l^+ not in A_prev then" after the computation of probabilities is dropped, because it is hard to understand and seems unnecessary.
              • An external scorer is passed into the decoder to evaluate a candidate prefix during decoding, whenever a white space is appended in English decoding or any character is appended in Mandarin decoding.
              • Such an external scorer consists of a language model, word count or any other custom scorers.
              • The language model is built in Task 5, with parameters that should be carefully tuned to achieve minimum WER/CER (c.f. Task 7).
              • This decoder needs to be highly efficient, for the convenience of parameter tuning and of real-world speech recognition.
              -
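
The decoding procedure described above can be sketched as a CTC prefix beam search in Python. This is a simplified illustration, not the algorithm of Figure 2 line by line: it works with plain probabilities rather than log probabilities, and the `scorer` hook (a callable returning a positive weight, applied when a space is appended) is a hypothetical interface standing in for the external scorer.

```python
from collections import defaultdict


def ctc_beam_search(probs, alphabet, beam_size=8, blank=0, scorer=None):
    """CTC prefix beam search.

    probs:    T x V list of per-frame probabilities.
    alphabet: V strings; alphabet[blank] is never emitted.
    scorer:   optional callable(prefix_string) -> positive weight
              (hypothetical hook for an LM or word-count score).
    """
    # prefix -> [prob of paths ending in blank, prob of paths ending in non-blank]
    beams = {(): [1.0, 0.0]}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            total = p_b + p_nb
            # Emitting blank leaves the prefix unchanged.
            next_beams[prefix][0] += total * frame[blank]
            for c, p_c in enumerate(frame):
                if c == blank:
                    continue
                if prefix and prefix[-1] == c:
                    # A repeat without a blank in between collapses.
                    next_beams[prefix][1] += p_nb * p_c
                    # A genuinely new copy is only possible after a blank.
                    gain = p_b * p_c
                else:
                    gain = total * p_c
                new_prefix = prefix + (c,)
                if scorer is not None and alphabet[c] == " ":
                    gain *= scorer("".join(alphabet[i] for i in new_prefix))
                # Accumulate: several paths may lead to the same prefix.
                next_beams[new_prefix][1] += gain
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[i] for i in best)
```

The `+=` updates are where accumulation replaces assignment: with two frames of probabilities `[0.6, 0.4]` over blank and "a", the single best path is all blanks, yet the prefix "a" collects the mass of three different paths and wins.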
              -
              -
              -

              Future Work

              -
                -
              • Efficiency Improvement
              • -
              • Accuracy Improvement
              • -
              • Low-latency Inference Library
              • -
              • Large-scale benchmarking
              • -
              -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/support_new_device.html b/develop/doc/design/support_new_device.html deleted file mode 100644 index 76d9ea85ee958ed64350c040e6d0a2ace9f49a53..0000000000000000000000000000000000000000 --- a/develop/doc/design/support_new_device.html +++ /dev/null @@ -1,453 +0,0 @@ - - - - - - - - - - - - - Design Doc: Supporting new Device/Library — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Supporting new Device/Library

              -
              -

              Background

              -

              Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries in a flexible and efficient manner.

              -

              On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example, Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.

              -

              On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, Layer is exposed in Python, and Operator is exposed in C++. Both Layer and Operator are hardware independent.

              -

              So, how to support a new Device/Library in Fluid becomes a challenge.

              -
              -
              -

              Basic: Integrate A New Device/Library

              -

              For a general overview of fluid, please refer to the overview doc.

              -

              There are mainly three parts that we have to consider while integrating a new device/library:

              -
                -
              • Place and DeviceContext: indicate the device id and manage hardware resources
              • -
              • Memory and Tensor: malloc/free data on a certain device
              • -
              • Math Functor and OpKernel: implement computing units on certain devices/libraries
              • -
              -
              -

              Place and DeviceContext

              -

              Please note that devices and computing libraries do not correspond one-to-one. A device can have several computing libraries, and a computing library can also support several devices.

              -
              -

              Place

              -

              Fluid uses class Place to represent the device memory where data is located. If we add another device, we have to add the corresponding DevicePlace.

              -
                      |   CPUPlace
              -Place --|   CUDAPlace
              -        |   FPGAPlace
              -
              -
              -

              And Place is defined as follows:

              -
              typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place;
              -
              -
              -
              -
              -

              DeviceContext

              -

              Fluid uses class DeviceContext to manage the resources in different libraries, such as the CUDA stream in CUDADeviceContext. There are also inheritance relationships between different kinds of DeviceContext.

              -
                              /->  CPUDeviceContext   
              -DeviceContext ---->  CUDADeviceContext  
              -                \->  FPGADeviceContext
              -
              -
              -

              An example of Nvidia GPU is as follows:

              -
                -
              • DeviceContext
              • -
              -
              class DeviceContext {
              - public:
              -  virtual ~DeviceContext() {}
              -  // Each concrete context reports the place it manages.
              -  virtual Place GetPlace() const = 0;
              -};
              -
              -
              -
                -
              • CUDADeviceContext
              • -
              -
              class CUDADeviceContext : public DeviceContext {
              - public:
              -  Place GetPlace() const override { return place_; }
              - private:
              -  CUDAPlace place_;
              -  cudaStream_t stream_; 
              -  cublasHandle_t cublas_handle_;
              -  std::unique_ptr<Eigen::GpuDevice> eigen_device_;  // binds with stream_
              -};
              -
              -
              -
              -
              -
              -

              Memory and Tensor

              -
              -

              memory module

              -

              Fluid provides the following memory interfaces:

              -
              template <typename Place>
              -void* Alloc(Place place, size_t size);
              -
              -template <typename Place>
              -void Free(Place place, void* ptr);
              -
              -template <typename Place>
              -size_t Used(Place place);
              -
              -
              -

              To implement these interfaces, we have to implement MemoryAllocator for different Devices.

              -
              -
              -

              Tensor

              -

              Tensor holds data with some shape in a specific Place.

              -
              class Tensor {
              - public:
              -  /*! Return a pointer to mutable memory block. */
              -  template <typename T>
              -  inline T* data();
              -
              -  /**
              -   * @brief   Return a pointer to mutable memory block.
              -   * @note    Allocates the memory block if it does not exist yet.
              -   */
              -  template <typename T>
              -  inline T* mutable_data(platform::Place place);
              -
              -  /**
              -   * @brief     Return a pointer to mutable memory block.
              -   *
              -   * @param[in] dims    The dimensions of the memory block.
              -   * @param[in] place   The place of the memory block.
              -   *
              -   * @note      Allocates the memory block if it does not exist yet.
              -   */
              -  template <typename T>
              -  inline T* mutable_data(DDim dims, platform::Place place);
              -
              -  /*! Resize the dimensions of the memory block. */
              -  inline Tensor& Resize(const DDim& dims);
              -
              -  /*! Return the dimensions of the memory block. */
              -  inline const DDim& dims() const;
              -
              - private:
              -  /*! holds the memory block if allocated. */
              -  std::shared_ptr<Placeholder> holder_;
              -
              -  /*! points to dimensions of memory block. */
              -  DDim dim_;
              -};
              -
              -
              -

              Placeholder is used to delay memory allocation; that is, we can first define a tensor, use Resize to configure its shape, and then call mutable_data to allocate the actual memory.

              -
              paddle::framework::Tensor t;
              -paddle::platform::CPUPlace place;
              -// set size first
              -t.Resize({2, 3});
              -// allocate memory on CPU later
              -t.mutable_data<float>(place);
              -
              -
              -
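The delayed allocation described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Fluid's C++ implementation: the class name `LazyTensor`, its fields, and the `item_size` parameter are hypothetical, and a `bytearray` stands in for device memory.

```python
from functools import reduce
from operator import mul


class LazyTensor:
    """Toy model of a tensor whose memory is allocated on first use."""

    def __init__(self):
        self.dims = ()        # shape; set by resize()
        self.holder = None    # stands in for the Placeholder; None until allocated
        self.place = None     # where the buffer lives once allocated

    def resize(self, dims):
        # Only records the shape; no memory is touched here.
        self.dims = tuple(dims)
        return self

    def numel(self):
        # Number of elements implied by the current shape.
        return reduce(mul, self.dims, 1)

    def mutable_data(self, place, item_size=4):
        # Allocate only if there is no buffer yet, the buffer is too small,
        # or it lives in a different place.
        needed = self.numel() * item_size
        if self.holder is None or len(self.holder) < needed or self.place != place:
            self.holder = bytearray(needed)
            self.place = place
        return self.holder


t = LazyTensor()
t.resize([2, 3])
assert t.holder is None            # nothing allocated yet
buf = t.mutable_data("CPUPlace")   # allocation happens here
```

Calling `mutable_data` a second time with the same shape and place returns the existing buffer, which is the point of keeping the holder separate from the shape.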
              -
              -
              -

              Math Functor and OpKernel

              -

              Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. These shared parts are placed in the operators/math directory as basic Functors.

              -

              Let’s take MaxOutFunctor as an example:

              -

              The interface is defined in the header file.

              -
              template <typename DeviceContext, typename T>
              -class MaxOutFunctor {
              - public:
              -  void operator()(const DeviceContext& context, const framework::Tensor& input,
              -                  framework::Tensor* output, int groups);
              -};
              -
              -
              -

              The CPU implementation is in the .cc file:

              -
              template <typename T>
              -class MaxOutFunctor<platform::CPUDeviceContext, T> {
              -  public:
              -  void operator()(const platform::CPUDeviceContext& context,
              -                  const framework::Tensor& input, framework::Tensor* output,
              -                  int groups) {
              -                  ...
              -                  }
              -};
              -
              -
              -

              The CUDA implementation is in the .cu file:

              -
              template <typename T>
              -class MaxOutFunctor<platform::CUDADeviceContext, T> {
              - public:
              -  void operator()(const platform::CUDADeviceContext& context,
              -                  const framework::Tensor& input, framework::Tensor* output,
              -                  int groups) {
              -                  ...
              -                  }
              -};                  
              -
              -
              -

              We first obtain the computing handle from a concrete DeviceContext and then compute on tensors.

              -

              The implementation of OpKernel is similar to that of math functors. The only extra step is to register the OpKernel in a global map.

              -

              Fluid provides different registration interfaces in op_registry.h.

              -

              Let’s take Crop operator as an example:

              -

              In .cc file:

              -
              REGISTER_OP_CPU_KERNEL(crop, ops::CropKernel<float>);
              -REGISTER_OP_CPU_KERNEL(
              -    crop_grad, ops::CropGradKernel<paddle::platform::CPUDeviceContext, float>);
              -
              -
              -

              In .cu file:

              -
              REGISTER_OP_CUDA_KERNEL(crop, ops::CropKernel<float>);
              -REGISTER_OP_CUDA_KERNEL(
              -    crop_grad, ops::CropGradKernel<paddle::platform::CUDADeviceContext, float>);
              -
              -
              -
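The idea of registering kernels in a global map can be illustrated with a small Python model. This is a conceptual sketch only and does not reflect how the `REGISTER_OP_*_KERNEL` macros are implemented: the names `KERNELS`, `register_kernel`, and `find_kernel` are hypothetical, places are represented by plain strings, and the "CUDA" kernel is a stand-in that runs ordinary Python.

```python
# Conceptual model only: kernels are looked up by (operator type, place).
KERNELS = {}


def register_kernel(op_type, place):
    """Decorator that stores a kernel function in the global map."""
    def decorator(fn):
        KERNELS[(op_type, place)] = fn
        return fn
    return decorator


@register_kernel("crop", "CPUPlace")
def crop_cpu(data, offset, size):
    return data[offset:offset + size]


@register_kernel("crop", "CUDAPlace")
def crop_cuda(data, offset, size):
    # A real CUDA kernel would launch device code; this stand-in reuses the
    # same Python logic so the sketch stays runnable.
    return data[offset:offset + size]


def find_kernel(op_type, place):
    """Return the kernel registered for this operator on this place."""
    try:
        return KERNELS[(op_type, place)]
    except KeyError:
        raise LookupError(f"no kernel registered for {op_type} on {place}")


result = find_kernel("crop", "CPUPlace")([1, 2, 3, 4, 5], 1, 3)
```

Keying the map by both operator type and place is what lets one operator have several kernels, one per device or library.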
              -
              -
              -

              Advanced topics: How to switch between different Device/Library

              -

              Generally, we implement an OpKernel for every Device/Library that an Operator supports, so we can easily train a Convolutional Neural Network on a GPU. However, some OpKernels are not suitable for a specific Device. For example, the crf operator can only run on CPU, whereas most other operators can run on GPU. To achieve high performance in such circumstances, we have to switch between different Devices/Libraries.

              -

              For more details, please refer to the following docs:

              -
                -
              • operator kernel type doc
              • -
              • switch kernel doc
              • -
              -
              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/switch.html b/develop/doc/design/switch.html deleted file mode 100644 index 3ab75f9038886947e8570d2fd03ef1aa01f04628..0000000000000000000000000000000000000000 --- a/develop/doc/design/switch.html +++ /dev/null @@ -1,283 +0,0 @@ - - - - - - - - - - - - - Design Doc: Switch — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design Doc: Switch

              -
              -
              -

              Background

              -

              Many programming languages provide switch as a generalization of if-elif-else. We want to add it to Fluid.

              -

              The following example shows the usage of fluid.switch.

              -
              a = fluid.Var(10)
              -b = fluid.Var(0)
              -
              -with fluid.switch() as switch:
              -    with switch.case(fluid.less_equal(a, 10)):
              -        fluid.print("Case 1")
              -    with switch.case(fluid.larger(a, 0)):
              -        fluid.print("Case 2")
              -    with switch.default():
              -        fluid.print("Case 3")
              -
              -
              -
              -
              -

              The Semantics

              -
                -
              1. A switch control-flow checks cases one-by-one.
              2. The condition of each case is a scalar boolean value. This differs from the fluid.if_else control-flow, whose condition can be a vector of boolean values.
              3. It runs the first matched case, or the default case if there is one and no case matches.
              4. Once it matches a case, it runs the corresponding branch and only that branch, as if there were a C break keyword at the end of each case.
              -

              The above program should print and print only “Case 1”.
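These semantics can be checked with a small pure-Python model. It is a sketch under stated assumptions rather than the Fluid API: `run_switch` is a hypothetical helper, conditions are ordinary Python booleans that are evaluated eagerly when the list is built, and branches are callables.

```python
def run_switch(cases, default=None):
    """Run the first branch whose condition is true, else the default.

    `cases` is a list of (condition, branch) pairs, where each condition is
    a plain Python bool (the scalar condition from the semantics above).
    """
    for condition, branch in cases:
        if condition:
            return branch()      # implicit "break": later cases are skipped
    if default is not None:
        return default()
    return None


a = 10
printed = []
run_switch(
    cases=[
        (a <= 10, lambda: printed.append("Case 1")),
        (a > 0, lambda: printed.append("Case 2")),
    ],
    default=lambda: printed.append("Case 3"),
)
print(printed)
```

Both conditions are true for `a = 10`, but only the first branch runs, which matches the expected behavior of the example program.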

              -

              The implementation of the backward pass of the switch control-flow is easier than that of if_else, because switch runs at most one branch, whereas if_else could run more than one branch.

              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/tensor_array.html b/develop/doc/design/tensor_array.html deleted file mode 100644 index 54efa977bc140f4dbebd286ec3c0ba8efdd74947..0000000000000000000000000000000000000000 --- a/develop/doc/design/tensor_array.html +++ /dev/null @@ -1,503 +0,0 @@ - - - - - - - - - - - - - Design for TensorArray — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Design for TensorArray

              -

              This design doc presents the necessity of a new C++ class TensorArray. In addition to the very simple C++ implementation

              -
              class TensorArray {
              - public:
              -  explicit TensorArray(const LoDTensor&);
              -  explicit TensorArray(size_t size);
              -
              - private:
              -  std::vector<LoDTensor> values_;
              -};
              -
              -
              -

              We also need to expose it to PaddlePaddle’s Python API, because users will want to use it with our very flexible WhileLoop operator. An example of an RNN based on dynamic operators is:

              -
              input = pd.data(...)
              -num_steps = Var(12)
              -
              -states = TensorArray(size=num_steps)
              -step_inputs = TensorArray(unstack_from=input)
              -step_outputs = TensorArray(size=num_steps)
              -
              -W = Tensor(...)
              -U = Tensor(...)
              -default_state = some_op()
              -
              -step = Var(1)
              -
              -wloop = paddle.create_whileloop(loop_vars=[step])
              -with wloop.frame():
              -    wloop.break_if(pd.equal(step, num_steps))
              -    pre_state = states.read(step-1, default_state)
              -    step_input = step_inputs.read(step)
              -    state = pd.sigmoid(pd.matmul(U, pre_state) + pd.matmul(W, step_input))
              -    states.write(step, state)
              -    step_outputs.write(step, state) # output state
              -    step.update(step+1)
              -
              -output = step_outputs.stack()
              -
              -
              -
              -

              Background

              -

              Steps are one of the core concepts of an RNN. Each time step of an RNN has several input segments, states, and output segments. All these components act like arrays; for example, states[step_id] returns the state at time step step_id.

              -

              An RNN can be implemented with the following pseudocode:

              -
              Array states;
              -Array input_segments;
              -Array output_segments;
              -Parameter W, U;
              -
              -step = 1
              -seq_len = 12
              -while_loop {
              -    if (step == seq_len) break;
              -    states[step] = sigmoid(W * states[step-1] + U * input_segments[step]);
              -    output_segments[step] = states[step];  // take state as output
              -    step++;
              -}
              -
              -
              -
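The pseudocode above can be turned into a runnable Python sketch. To keep it free of dependencies, the state, the inputs, and the parameters are all scalars, which is a simplification and not part of the original design; the function name `run_rnn` and the `initial_state` argument are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_rnn(input_segments, w, u, initial_state=0.0):
    """Scalar RNN: state[t] = sigmoid(w * state[t-1] + u * input[t])."""
    states = [initial_state]           # states[0] is the initial state
    output_segments = []
    for x in input_segments:
        state = sigmoid(w * states[-1] + u * x)
        states.append(state)
        output_segments.append(state)  # take state as output
    return states, output_segments


states, outputs = run_rnn([0.0, 1.0, -1.0], w=0.5, u=2.0)
```

Here `states` holds one more entry than `outputs` because it also stores the initial state, which plays the role of `states[step-1]` for the first step.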

              According to the RNN roadmap, there are several different RNNs that PaddlePaddle will eventually support.

              -

              Currently, the basic RNN implementation supported by PaddlePaddle is the recurrent_op which takes tensors as input and splits them into input_segments.

              -

              Since a tensor cannot store variable-length sequences directly, PaddlePaddle implements the tensor with level of details (LoDTensor for short). Segmenting the LoDTensor is much more complicated than splitting a tensor, which makes it necessary to refactor the recurrent_op with LoDTensor segmenting support.

              -

              As the next step in RNN support, dynamic_recurrent_op should be introduced to handle inputs with variable-length sequences.

              -

              The implementation is similar to recurrent_op. The key difference is the way the original input LoDTensors and outputs are split to get the input_segments and the output_segments.

              -

              Though it can’t be built over recurrent_op or dynamic_recurrent_op directly, the logic behind splitting a tensor or a LoD tensor into input_segments remains the same.

              -
              -
              -

              Why TensorArray

              -

              The logic behind splitting the inputs into segments, states and outputs is similar and can be shared in a separate module.

              -

              The arrays of states, input_segments and output_segments would be exposed to users when writing a dynamic RNN model similar to the above pseudocode.

              -

              So there should be an array-like container, which can store the segments of a tensor or LoD tensor.

              -

              This container can store an array of tensors and provides several methods to split a tensor or a LoD tensor. This is where the notion of TensorArray comes from.
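A minimal model of such a container is sketched below. It is not the proposed C++ class: `ToyTensorArray` is a hypothetical name, a "tensor" is just a nested Python list, and splitting happens along the first dimension only.

```python
class ToyTensorArray:
    """List-backed stand-in for TensorArray; a 'tensor' is a nested list."""

    def __init__(self, size=0):
        self.values = [None] * size

    @classmethod
    def unstack(cls, tensor):
        # Split along the first dimension: one entry per time step.
        ta = cls()
        ta.values = [step for step in tensor]
        return ta

    def stack(self):
        # Pack the entries back into one tensor of rank one higher.
        return list(self.values)

    def read(self, index):
        return self.values[index]

    def write(self, index, value):
        # Grow the array on demand so writes past the end are allowed.
        if index >= len(self.values):
            self.values.extend([None] * (index + 1 - len(self.values)))
        self.values[index] = value

    def size(self):
        return len(self.values)


sequence = [[1, 2], [3, 4], [5, 6]]          # 3 time steps, 2 features each
step_inputs = ToyTensorArray.unstack(sequence)
step_outputs = ToyTensorArray(size=step_inputs.size())
for i in range(step_inputs.size()):
    step_outputs.write(i, [2 * v for v in step_inputs.read(i)])
packed = step_outputs.stack()
```

The loop mirrors how a dynamic RNN would read one input segment per step and write one output segment per step before stacking the results.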

              -
              -
              -

              Introduce TensorArray to unify all three RNNs

              -

              TensorArray as a new concept is borrowed from TensorFlow, where it is meant to be used with dynamic iteration primitives such as while_loop and map_fn.

              -

              This concept can be used to support our new design of dynamic operations, and to help refactor some existing layers that handle variable-length sentences, such as recurrent_op and RecurrentGradientMachine.

              -

              In our design for dynamic RNN, TensorArray is used to segment inputs and store states in all time steps. By providing some methods similar to a C++ array, the definition of some state-based dynamic models such as RNN can be more natural and highly flexible.

              -
              -
              -

              Dynamic-operations on TensorArray

              -

              TensorArray will be used directly when defining dynamic models, so the operators listed below should be implemented:

              -
              # several helper operators for TensorArray
              -def tensor_array_stack(ta, tensor):
              -    '''
              -    get a tensor array `ta`, return a packed `tensor`.
              -    '''
              -    pass
              -
              -def tensor_array_unstack(tensor, ta):
              -    '''
              -    get a `tensor`, unstack it and get a tensor array `ta`.
              -    '''
              -    pass
              -
              -def tensor_array_write(ta, index, tensor, data_shared):
              -    '''
              -    get a `tensor` and a scalar tensor `index`, write `tensor` into index-th
              -    value of the tensor array `ta`.
              -    `data_shared` is an attribute that specifies whether to copy or reference the tensors.
              -    '''
              -    pass
              -
              -def tensor_array_read(ta, index, tensor):
              -    '''
              -    get a tensor array `ta`, a scalar tensor `index`, read the index-th value of
              -    `ta` and return as the `tensor`.
              -    '''
              -    pass
              -
              -def tensor_array_size(ta, tensor):
              -    '''
              -    get a tensor array `ta`, return the size of `ta` and return as the scalar `tensor`.
              -    '''
              -    pass
              -
              -
              -

              It is tedious for users to call so many low-level operators directly, so some helper methods should be provided in a Python wrapper to make TensorArray easier to use, for example:

              -
              class TensorArray:
              -    def __init__(self, name):
              -        self.name = name
              -        self.desc = TensorArrayDesc()
              -
              -    def stack(self, name=None):
              -        '''
              -        Pack the values in a `TensorArray` into a tensor with rank one higher
              -        than each tensor in `values`.
              -        `stack` can be used to concatenate all the time steps for RNN or whileloop.
              -
              -        @name: str
              -            the name of the variable to output.
              -        '''
              -        tensor = Var(name)
              -        tensor_array_stack(self.name, tensor)
              -        return tensor
              -
              -    def unstack(self, input):
              -        '''
              -        Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
              -        `unstack` can be used to split tensor into time steps for RNN or whileloop.
              -
              -        @input: str
              -            the name of input tensor
              -        '''
              -        tensor_array_unstack(input, self.name)
              -
              -    def write(self, index, value, data_shared=True):
              -        '''
              -        Write value into index of the TensorArray.
              -        If `data_shared` is set to True, then the index-th value in TensorArray will
              -        be shared with the tensor passed in.
              -
              -        @index: str
              -            name of a scalar tensor
              -        @value: str
              -            name of a tensor
              -        @data_shared: bool
              -        '''
              -        tensor_array_write(self.name, index, value, data_shared)
              -
              -    def read(self, index, output):
              -        '''
              -        Read the value at location `index` in the `TensorArray`.
              -
              -        @index: str
              -            name of a scalar tensor
              -        @output:
              -            name of a output variable
              -        '''
              -        tensor_array_read(self.name, index, output)
              -
              -
              -    def size(self, output):
              -        '''
              -        Return the number of values.
              -
              -        @output: str
              -            name of a scalar tensor
              -        '''
              -        tensor_array_size(self.name, output)
              -
              -
              -
              - -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/design/var_desc.html b/develop/doc/design/var_desc.html deleted file mode 100644 index 54b60cbff437c7bb05531d90e71299b66e3baef0..0000000000000000000000000000000000000000 --- a/develop/doc/design/var_desc.html +++ /dev/null @@ -1,327 +0,0 @@ - - - - - - - - - - - - - Background — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
              - - - - -
              - - - - - - -
              -
              - - - - - - -
              - -
              -
              -
              -
              - -
              -

              Background

              -

              PaddlePaddle divides the description of neural network computation into two stages: compile time and runtime. At compile time, the neural network computation is described as a ProgramDesc whereas at runtime an Executor interprets the ProgramDesc to compute the operations.

              -

              PaddlePaddle uses proto messages to describe the compile-time program because:

              -
                -
              1. The computation program description must be serializable and saved in a file.
              2. During distributed training, the serialized program will be sent to multiple workers. It should also be possible to break the program into different components, each of which can be executed on a different worker.
              -

              The computation Program consists of nested Blocks. Each Block consists of data (i.e. Variables) and Operations. The concepts used to represent them are listed in the table below.

              -

              | |compile time|runtime|
              |---|---|---|
              |Data|VarDesc(proto)|Variable(cpp)|
              |Operation|OpDesc(proto)|Operator(cpp)|

              -
              -
              -

              Definition of VarType

              -

              A VarDesc should have a name, a type, and a flag indicating whether or not it is persistable. Apart from the POD types, PaddlePaddle supports several other kinds of variable types: LOD_TENSOR, SELECTED_ROWS, FEED_MINIBATCH, FETCH_LIST, STEP_SCOPES, LOD_RANK_TABLE, LOD_TENSOR_ARRAY, PLACE_LIST, READER and CHANNEL. These are declared inside VarType. A VarDesc then looks as follows:

              -
              message VarDesc {
              -  required string name = 1;
              -  required VarType type = 2;
              -  optional bool persistable = 3 [ default = false ];
              -}
              -
              -
              -
              -
              -

              Definition of TensorDesc

              -
              message TensorDesc {
              -  // Should only be PODType. Is enforced in C++
              -  required Type data_type = 1;
              -  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
              -}
              -
              -
              -
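The comment on `dims` says that an unknown dimension is saved as -1. A small Python sketch of that convention follows. The helper names `encode_dims` and `is_compatible` and the `UNKNOWN` marker are hypothetical, and the compatibility check is an assumption about how such a stored shape might be used, not something the design doc specifies.

```python
UNKNOWN = None  # hypothetical marker for a dimension not known at compile time


def encode_dims(dims):
    """Replace unknown dimensions with -1, as in the TensorDesc comment."""
    return [-1 if d is UNKNOWN else int(d) for d in dims]


def is_compatible(desc_dims, runtime_dims):
    """Check a concrete runtime shape against the stored dims.

    A stored -1 matches any runtime extent; other entries must be equal.
    """
    if len(desc_dims) != len(runtime_dims):
        return False
    return all(d == -1 or d == r for d, r in zip(desc_dims, runtime_dims))


stored = encode_dims([UNKNOWN, 640, 480])
```

With this encoding, a batch of any size fits the description as long as the remaining dimensions agree.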

              The Type here comes from the enum defined inside VarType:

              -
              enum Type {
              -  // Pod Types
              -  BOOL = 0;
              -  INT16 = 1;
              -  INT32 = 2;
              -  INT64 = 3;
              -  FP16 = 4;
              -  FP32 = 5;
              -  FP64 = 6;
              -
              -  // Other types that may need additional descriptions
              -  LOD_TENSOR = 7;
              -  SELECTED_ROWS = 8;
              -  FEED_MINIBATCH = 9;
              -  FETCH_LIST = 10;
              -  STEP_SCOPES = 11;
              -  LOD_RANK_TABLE = 12;
              -  LOD_TENSOR_ARRAY = 13;
              -  PLACE_LIST = 14;
              -  READER = 15;
              -  CHANNEL = 16;
              -}
              -
              -
              -

              A TensorDesc describes SelectedRows and LoDTensor. For details of SelectedRows, please refer to SelectedRows.

              -
              -
              -

              Definition of LoDTensorDesc

              -
              message LoDTensorDesc {
              -  required TensorDesc tensor = 1;
              -  optional int32 lod_level = 2 [ default = 0 ];
              -}
              -
              -
              -

              A LoDTensorDesc contains a tensor and a lod_level.

              -
              -
              -

              Definition of Variable in Python

              -

              For Variable in Python, please refer to the Python API.

              -
              - - -
              -
              -
              - - -
              - -
              -

              - -
              -
              - -
              - -
              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/dev/contribute_to_paddle_en.html b/develop/doc/dev/contribute_to_paddle_en.html index eb34f2236a3082e10ddda941b4a79dc4d6ebe2a1..c57f7cb7c4e24e3b9c2eb8c5baed894b39d0fefa 100644 --- a/develop/doc/dev/contribute_to_paddle_en.html +++ b/develop/doc/dev/contribute_to_paddle_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
          • +
          • C-API Prediction Library +
          • RNN Models
            • RNN Configuration
            • Recurrent Group Tutorial
            • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
            • Development
            • FAQ
                diff --git a/develop/doc/dev/index_en.html b/develop/doc/dev/index_en.html index 1d451b4c23ba0d670e9aa7e50412f21024e71769..569ec5c5ca1d9addaf7508c0712cad7ba7f50cda 100644 --- a/develop/doc/dev/index_en.html +++ b/develop/doc/dev/index_en.html @@ -123,6 +123,12 @@ var _hmt = _hmt || [];
            • +
            • C-API Prediction Library +
            • RNN Models
              • RNN Configuration
              • Recurrent Group Tutorial
              • @@ -136,6 +142,7 @@ var _hmt = _hmt || [];
              • Development
              • FAQ
                  @@ -196,6 +203,7 @@ var _hmt = _hmt || []; diff --git a/develop/doc/dev/new_layer_en.html b/develop/doc/dev/new_layer_en.html index 4a11afd9280c600f24e78a9011dea118abc48ed3..58eeb6ad1096c168a9d951c32511645b849825aa 100644 --- a/develop/doc/dev/new_layer_en.html +++ b/develop/doc/dev/new_layer_en.html @@ -35,7 +35,10 @@ - + + + + - - - - - - - - - -
                  - - - - -
                  - - - - - - -
                  -
                  - - - - - - -
                  - -
                  -
                  -
                  -
                  - -
                  -

                  How to write a new operator

                  - -
                  -

                  Background

                  -

                  Here are the base types needed. For details, please refer to the design docs.

                  -
                    -
                  • class OpProtoAndCheckerMaker: Describes an Operator’s input, output, attributes and description, mainly used to interface with Python API.
                  • -
                  • framework::OperatorBase: Operator (Op)base class.
                  • -
                  • framework::OpKernel: Base class for Op computation kernel.
                  • -
                  • framework::OperatorWithKernel: Inherited from OperatorBase, describing an operator with computation kernels.
                  • -
                  -

                  Operators can be categorized into two groups: operator with kernel(s) and operator without kernel(s). An operator with kernel(s) inherits from OperatorWithKernel while the one without kernel(s) inherits from OperatorBase. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:

                  -

                  Information | Where it is defined
                  :---------- | :------------------
                  OpProtoMaker definition | .cc files. A backward Op does not need an OpProtoMaker.
                  Op definition | .cc files
                  Kernel implementation | Kernel methods shared between CPU and CUDA are defined in .h files. CPU-specific kernels live in .cc files, and CUDA-specific kernels in .cu files.
                  Registering the Op | Ops are registered in .cc files. For kernel registration, .cc files register the CPU implementation and .cu files register the CUDA implementation.

                  -

                  New operator implementations are added to the directory paddle/operators, with file names in the format *_op.h (if applicable), *_op.cc, and *_op.cu (if applicable). The build system relies on this naming scheme to automatically build operators and their corresponding Python extensions.

                  -

                  Let’s take the matrix multiplication operator, MulOp, as an example of how to write an operator with a kernel.

                  -
                  -
                  -

                  Implementing C++ Types

                  -
                  -

                  Defining ProtoMaker

                  -

                  Matrix multiplication can be written as $Out = X * Y$, meaning that the operation has two inputs and one output.

                  -

                  First, define ProtoMaker to describe the Operator’s input, output, and additional comments:

                  -
                  class MulOpMaker : public framework::OpProtoAndCheckerMaker {
                  - public:
                  -  MulOpMaker(OpProto *proto, OpAttrChecker *op_checker)
                  -      : OpProtoAndCheckerMaker(proto, op_checker) {
                  -    AddInput("X", "(Tensor), 2D tensor of size (M x K)");
                  -    AddInput("Y", "(Tensor), 2D tensor of size (K x N)");
                  -    AddOutput("Out", "(Tensor), 2D tensor of size (M x N)");
                  -    AddComment(R"DOC(
                  -Two Element Mul Operator.
                  -The equation is: Out = X * Y
                  -)DOC");
                  -  }
                  -};
                  -
                  -
                  -

                  MulOpMaker inherits from framework::OpProtoAndCheckerMaker. Its constructor takes two parameters:

                  -
                    -
                  • framework::OpProto stores the operator's inputs, outputs, and attributes, and is used for generating the Python API interfaces.
                  • -
                  • framework::OpAttrChecker is used to validate the attributes.
                  • -
                  -

                  The constructor utilizes AddInput, AddOutput, and AddComment, so that the corresponding information will be added to OpProto.

                  -

                  The code above adds two inputs X and Y to MulOp, an output Out, and their corresponding descriptions, in accordance with Paddle’s naming convention.

                  -

                  An additional example ScaleOp is implemented as follows:

                  -
                  template <typename AttrType>
                  -class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
                  - public:
                  -  ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker)
                  -      : OpProtoAndCheckerMaker(proto, op_checker) {
                  -    AddInput("X", "The input tensor of scale operator.").NotInGradient();
                  -    AddOutput("Out", "The output tensor of scale operator.").NotInGradient();
                  -    AddComment(R"DOC(Scale operator
                  -The equation is: Out = scale*X
                  -)DOC");
                  -    AddAttr<AttrType>("scale", "scale of scale operator.").SetDefault(1.0);
                  -  }
                  -};
                  -
                  -
                  -

                  There are two changes in this example:

                  -
                    -
                  • AddInput("X","...").NotInGradient() expresses that input X is not involved in the computation of ScaleOp‘s corresponding gradient operator. If an input to an operator does not participate in back-propagation, please explicitly set .NotInGradient().
                  • -
                  • AddAttr<AttrType>("scale", "...").SetDefault(1.0); adds the scale constant as an attribute, and sets its default value to 1.0.
                  • -
                  -
                  -
                  -

                  Defining Operator

                  -

                  The following code defines the interface for MulOp:

                  -
                  class MulOp : public framework::OperatorWithKernel {
                  - public:
                  -  using framework::OperatorWithKernel::OperatorWithKernel;
                  -
                  - protected:
                  -  void InferShape(const framework::InferShapeContext &ctx) const override {
                  -    auto dim0 = ctx.Input<Tensor>("X")->dims();
                  -    auto dim1 = ctx.Input<Tensor>("Y")->dims();
                  -    PADDLE_ENFORCE_EQ(dim0.size(), 2,
                  -                      "input X(%s) should be a tensor with 2 dims, a matrix",
                  -                      ctx.op_.Input("X"));
                  -    PADDLE_ENFORCE_EQ(dim1.size(), 2,
                  -                      "input Y(%s) should be a tensor with 2 dims, a matrix",
                  -                      ctx.op_.Input("Y"));
                  -    PADDLE_ENFORCE_EQ(
                  -        dim0[1], dim1[0],
                  -        "First matrix's width must be equal with second matrix's height.");
                  -    ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]});
                  -  }
                  -};
                  -
                  -
                  -

                  MulOp is inherited from OperatorWithKernel. Its public member

                  -
                  using framework::OperatorWithKernel::OperatorWithKernel;
                  -
                  -
                  -

                  inherits the constructors of the base class OperatorWithKernel. It is equivalent to writing

                  -
                  MulOp(const std::string &type, const framework::VariableNameMap &inputs,
                  -      const framework::VariableNameMap &outputs,
                  -      const framework::AttributeMap &attrs)
                  -  : OperatorWithKernel(type, inputs, outputs, attrs) {}
                  -
                  -
                  -

                  The InferShape interface needs to be overridden. InferShape is a const method and cannot modify the Op’s member variables. Its parameter const framework::InferShapeContext &ctx can be used to extract inputs, outputs, and attributes. InferShape serves two purposes:

                  -
                    -
                  • 1). validating and erroring out early: it checks input data dimensions and types.
                  • -
                  • 2). configuring the shape of the output tensor.
                  • -
                  -
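
                  The two roles of InferShape can be illustrated outside the framework. The following is a minimal standalone sketch, not Paddle code: Shape and InferMulShape are hypothetical names, and exceptions stand in for PADDLE_ENFORCE_EQ. It validates the input shapes of Out = X * Y first and only then derives the output shape.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tensor shape; this is not Paddle's DDim.
using Shape = std::vector<int64_t>;

// Validates the input shapes of Out = X * Y and returns the output shape.
// Throwing on a mismatch mirrors the "error out early" role of InferShape.
Shape InferMulShape(const Shape& x, const Shape& y) {
  if (x.size() != 2) throw std::invalid_argument("X must be a 2-D matrix");
  if (y.size() != 2) throw std::invalid_argument("Y must be a 2-D matrix");
  if (x[1] != y[0]) {
    throw std::invalid_argument("width of X must equal height of Y");
  }
  // An (M x K) matrix times a (K x N) matrix yields an (M x N) matrix.
  return Shape{x[0], y[1]};
}
```

                  With the shapes used later in the unit test, InferMulShape({32, 84}, {84, 100}) yields {32, 100}, while mismatched inner dimensions are rejected before any computation runs.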

                  Usually OpProtoMaker and Op‘s type definitions are written in .cc files, which also include the registration methods introduced later.

                  -
                  -
                  -

                  Defining OpKernel

                  -

                  MulKernel inherits framework::OpKernel, which includes the following templates:

                  -
                    -
                  • typename DeviceContext denotes device context type. When different devices, namely the CPUDeviceContext and the CUDADeviceContext, share the same kernel, this template needs to be added. If they don’t share kernels, this must not be added. An example of a non-sharing kernel is OnehotCrossEntropyOpKernel.
                  • -
                  • typename T denotes data type, such as float or double.
                  • -
                  -

                  MulKernel must override the Compute interface.

                  -
                    -
                  • Compute takes one input parameter: const framework::ExecutionContext& context.
                  • -
                  • Compared with InferShapeContext, ExecutionContext includes device types, and can similarly extract input, output, and attribute variables.
                  • -
                  • Compute implements the computation logic of an OpKernel.
                  • -
                  -

                  MulKernel‘s implementation of Compute is as follows:

                  -
                  template <typename DeviceContext, typename T>
                  -class MulKernel : public framework::OpKernel<T> {
                  -public:
                  -void Compute(const framework::ExecutionContext& context) const override {
                  -  auto* X = context.Input<Tensor>("X");
                  -  auto* Y = context.Input<Tensor>("Y");
                  -  auto* Z = context.Output<Tensor>("Out");
                  -  Z->mutable_data<T>(context.GetPlace());
                  -  auto& device_context = context.template device_context<DeviceContext>();
                  -  math::matmul<DeviceContext, T>(*X, false, *Y, false, 1, Z, 0, device_context);
                  -}
                  -};
                  -
                  -
                  -

                  Note that different devices (CPU, CUDA) share one Op definition; whether or not they share the same OpKernel depends on whether the functions called by Compute support both devices.

                  -

                  MulOp‘s CPU and CUDA share the same Kernel. A non-sharing OpKernel example can be seen in OnehotCrossEntropyOpKernel.
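
                  How a single kernel can serve several devices is easiest to see in a toy model. The sketch below is an illustration under stated assumptions, not Paddle code: CpuContext, FakeAcceleratorContext, MatMul, and ToyMulKernel are all hypothetical names, and both "devices" run on the host. The point is that the kernel can be shared because the helper it calls is itself templated on the device context.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical device contexts; real ones would carry streams, handles, etc.
struct CpuContext {};
struct FakeAcceleratorContext {};

// Device-agnostic helper: row-major (m x k) times (k x n) into (m x n).
// Because it accepts any context type, a kernel calling it can be shared.
template <typename DeviceContext, typename T>
void MatMul(const DeviceContext& /*ctx*/, const std::vector<T>& x,
            const std::vector<T>& y, std::size_t m, std::size_t k,
            std::size_t n, std::vector<T>* out) {
  out->assign(m * n, T(0));
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      for (std::size_t j = 0; j < n; ++j) {
        (*out)[i * n + j] += x[i * k + p] * y[p * n + j];
      }
    }
  }
}

// One kernel template, parameterized on device context and data type.
template <typename DeviceContext, typename T>
struct ToyMulKernel {
  std::vector<T> Compute(const DeviceContext& ctx, const std::vector<T>& x,
                         const std::vector<T>& y, std::size_t m,
                         std::size_t k, std::size_t n) const {
    std::vector<T> out;
    MatMul<DeviceContext, T>(ctx, x, y, m, k, n, &out);
    return out;
  }
};
```

                  If MatMul could only be written for one context type, the kernel would have to be specialized per device instead, which is the non-sharing case mentioned above.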

                  -

                  To ease the writing of an OpKernel's Compute method, and to reuse code across devices, Eigen's unsupported Tensor module is used to implement the Compute interface. To learn how the Eigen library is used in PaddlePaddle, please see the usage document.

                  -

                  This concludes the forward implementation of an operator. Next, the operator and its kernel need to be registered in a .cc file.

                  -

                  The definition of the corresponding backward operator, if applicable, is similar to that of a forward operator. Note that a backward operator does not include a ProtoMaker.

                  -
                  -
                  -

                  Registering Operator and OpKernel

                  -
                    -
                  • In .cc files, register forward and backward operator classes and the CPU kernel.

                    -
                    namespace ops = paddle::operators;
                    -REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad);
                    -
                    -REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUDeviceContext, float>);
                    -REGISTER_OP_CPU_KERNEL(mul_grad,
                    -              ops::MulGradKernel<paddle::platform::CPUDeviceContext, float>);
                    -
                    -
                    -

                    In that code block,

                    -
                      -
                    • REGISTER_OP registers the ops::MulOp class under the type name mul, with ops::MulOpMaker as its ProtoMaker, and registers ops::MulOpGrad under the type name mul_grad.
                    • -
                    • REGISTER_OP_WITHOUT_GRADIENT registers an operator without gradient.
                    • -
                    • REGISTER_OP_CPU_KERNEL registers the ops::MulKernel class specialized with the template arguments paddle::platform::CPUDeviceContext and float; the second call registers ops::MulGradKernel in the same way.
                    • -
                    -
                  • -
                  -
                    -
                  • Registering CUDA Kernel in .cu files

                    -
                      -
                    • Note that if the CUDA kernel is implemented using the Eigen unsupported module, the macro definition #define EIGEN_USE_GPU is needed at the top of the .cu file, such as
                    • -
                    -
                    // Define EIGEN_USE_GPU before including any headers if the Eigen unsupported module is used
                    -#define EIGEN_USE_GPU
                    -
                    -namespace ops = paddle::operators;
                    -REGISTER_OP_CUDA_KERNEL(mul, ops::MulKernel<paddle::platform::CUDADeviceContext, float>);
                    -REGISTER_OP_CUDA_KERNEL(mul_grad,
                    -                       ops::MulGradKernel<paddle::platform::CUDADeviceContext, float>);
                    -
                    -
                    -
                  • -
                  -
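
                  Registration macros of this kind are commonly built on static initialization. The sketch below shows that general pattern only; it is an assumption for illustration and not the definition of REGISTER_OP. ToyOp, OpRegistry, OpRegistrar, and TOY_REGISTER_OP are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical operator base class.
struct ToyOp {
  virtual ~ToyOp() = default;
  virtual std::string Type() const = 0;
};

using OpCreator = std::function<std::unique_ptr<ToyOp>()>;

// A function-local static sidesteps static-initialization-order problems:
// the map is constructed on first use, whichever registrar runs first.
std::map<std::string, OpCreator>& OpRegistry() {
  static std::map<std::string, OpCreator> registry;
  return registry;
}

// Constructing a registrar stores a creator under the given type name.
struct OpRegistrar {
  OpRegistrar(const std::string& type, OpCreator creator) {
    OpRegistry()[type] = std::move(creator);
  }
};

// Hypothetical macro: defines a static registrar object, so the
// registration runs before main() without any explicit call.
#define TOY_REGISTER_OP(op_type, op_class)  \
  static OpRegistrar registrar_##op_type(   \
      #op_type, [] { return std::unique_ptr<ToyOp>(new op_class); })

struct ToyMulOp : ToyOp {
  std::string Type() const override { return "mul"; }
};

TOY_REGISTER_OP(mul, ToyMulOp);
```

                  In this sketch the type name passed to the macro is the only key used for lookup, which is one reason a registered type that disagrees with the operator's name leads to failures later on.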
                  -
                  -

                  Compilation

                  -

                  Run the following commands to compile.

                  -
                  # maybe you need to rerun cmake
                  -make mul_op
                  -
                  -
                  -
                  -
                  -
                  -

                  Python Binding

                  -

                  The system will automatically bind to Python and link it to a generated library.

                  -
                  -
                  -

                  Unit Tests

                  -

                  Unit tests for an operator include

                  -
                    -
                  1. comparing a forward operator’s implementations on different devices,
                  2. -
                  3. comparing a backward operator’s implementation on different devices, and
                  4. -
                  5. a gradient test for the backward operator.
                  6. -
                  -

                  Here, we introduce the unit tests for MulOp.

                  -
                  -

                  Testing Forward Operators

                  -

                  A forward operator unit test inherits from OpTest, as TestMulOp does in the code below; the concrete checks are performed in OpTest. Testing a forward operator requires the following:

                  -
                    -
                  1. Defining input, output and relevant attributes in setUp method.
                  2. -
                  3. Generating random input data.
                  4. -
                  5. Implementing the same computation logic in a Python script.
                  6. -
                  7. Calling the check-gradient function to check the backward operator.
                  8. -
                  -
                  import unittest
                  -import numpy as np
                  -from op_test import OpTest
                  -
                  -
                  -class TestMulOp(OpTest):
                  -    def setUp(self):
                  -        self.op_type = "mul"
                  -        self.inputs = {
                  -            'X': np.random.random((32, 84)).astype("float32"),
                  -            'Y': np.random.random((84, 100)).astype("float32")
                  -        }
                  -        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
                  -
                  -    def test_check_output(self):
                  -        self.check_output()
                  -        
                  -    def test_check_grad_normal(self):
                  -        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
                  -
                  -    def test_check_grad_ingore_x(self):
                  -        self.check_grad(
                  -            ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
                  -
                  -    def test_check_grad_ingore_y(self):
                  -        self.check_grad(
                  -            ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
                  -
                  -
                  -

                  check_output runs the forward operator and compares its output with the expected output computed in Python.

                  -

                  The code above first loads required packages. In addition, we have

                  -
                    -
                  • self.op_type = "mul" defines the type, which must be identical to the operator’s registered type.
                  • -
                  • self.inputs defines the inputs as numpy arrays and initializes them.
                  • -
                  • self.outputs defines the expected output by performing the same computation as the operator in the Python script.
                  • -
                  -
                  -
                  -

                  Testing Backward Operators

                  -

                  Some key points in checking gradient above include:

                  -
                    -
                  • test_check_grad_normal calls check_grad to validate the gradient's correctness and stability through numeric methods.
                      -
                    • The first argument ["X", "Y"] specifies that the gradients with respect to X and Y are tested.
                    • -
                    • The second argument "Out" names the network’s final output target Out.
                    • -
                    • The third argument max_relative_error specifies the maximum relative error tolerated during gradient tests.
                    • -
                    -
                  • -
                  • test_check_grad_ingore_x and test_check_grad_ingore_y test the cases where the gradient is computed for only one input.
                  • -
                  -
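
                  The idea behind a numeric gradient check can be sketched without the framework. The code below is a standalone illustration and not how check_grad is implemented: all names are hypothetical, and it assumes a scalar loss equal to the sum of all elements of Out = X * Y. It compares the analytic gradient with respect to X against a central-difference estimate.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward pass: loss = sum of all elements of Out = X * Y.
double SumOfProduct(const Matrix& x, const Matrix& y) {
  double loss = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t p = 0; p < y.size(); ++p)
      for (std::size_t j = 0; j < y[p].size(); ++j)
        loss += x[i][p] * y[p][j];
  return loss;
}

// Analytic gradient of the loss w.r.t. X: dX[i][p] = sum_j Y[p][j].
Matrix AnalyticGradX(const Matrix& x, const Matrix& y) {
  Matrix grad(x.size(), std::vector<double>(y.size(), 0.0));
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t p = 0; p < y.size(); ++p)
      for (std::size_t j = 0; j < y[p].size(); ++j)
        grad[i][p] += y[p][j];
  return grad;
}

// Numeric gradient by central differences, perturbing one element at a time.
Matrix NumericGradX(Matrix x, const Matrix& y, double delta) {
  Matrix grad(x.size(), std::vector<double>(y.size(), 0.0));
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t p = 0; p < y.size(); ++p) {
      const double saved = x[i][p];
      x[i][p] = saved + delta;
      const double plus = SumOfProduct(x, y);
      x[i][p] = saved - delta;
      const double minus = SumOfProduct(x, y);
      x[i][p] = saved;  // restore before moving to the next element
      grad[i][p] = (plus - minus) / (2.0 * delta);
    }
  }
  return grad;
}

// Largest element-wise relative error; the floor avoids division by zero.
double MaxRelativeError(const Matrix& a, const Matrix& b) {
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j)
      worst = std::max(worst, std::fabs(a[i][j] - b[i][j]) /
                                  std::max(std::fabs(a[i][j]), 1e-3));
  return worst;
}
```

                  A test would then assert that MaxRelativeError stays below a chosen tolerance, which plays the same role as max_relative_error above.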
                  -
                  -

                  Compiling and Running

                  -

                  Any new unit test file named in the format test_*.py and added to the directory python/paddle/v2/framework/tests is automatically added to the project build.

                  -

                  Note that, unlike compiling a single Op, running unit tests requires compiling the entire project with the WITH_TESTING flag on, i.e. cmake paddle_dir -DWITH_TESTING=ON.

                  -

                  After successfully compiling the project, run the following command to run unit tests:

                  -
                  make test ARGS="-R test_mul_op -V"
                  -
                  -
                  -

                  Or,

                  -
                  ctest -R test_mul_op
                  -
                  -
                  -
                  -
                  -
                  -

                  Remarks

                  -
                    -
                  • Every *_op.h (if applicable), *_op.cc, and *_op.cu (if applicable) must be created for a unique Op. Compiling will fail if multiple operators are included per file.
                  • -
                  • The type with which an operator is registered needs to be identical to the Op’s name. Registering REGISTER_OP(B, ...) in A_op.cc will cause unit testing failures.
                  • -
                  • If the operator does not implement a CUDA kernel, please refrain from creating an empty *_op.cu file, or else unit tests will fail.
                  • -
                  • If multiple operators rely on some shared methods, a file NOT named *_op.* can be created to store them, such as gather.h.
                  • -
                  -
                  -
                  - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/dev/new_op_kernel_en.html b/develop/doc/dev/new_op_kernel_en.html deleted file mode 100644 index f2e1d255fe9f37803209765dfdc4e6b2a4e2acca..0000000000000000000000000000000000000000 --- a/develop/doc/dev/new_op_kernel_en.html +++ /dev/null @@ -1,366 +0,0 @@ - - - - - - - - - - - - - Add Kernels for a New Device — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                  - - - - -
                  - - - - - - -
                  -
                  - - - - - - -
                  - -
                  -
                  -
                  -
                  - -
                  -

                  Add Kernels for a New Device

                  -
                  -

                  Background

                  -

                  PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a certain device, which could be a hardware device, e.g., the CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of the Xeon CPU.

                  -

                  This document explains how to add an operator, and its kernels. The kernels of an operator are indexed by a C++ type OpKernelType. An operator chooses the right kernel at runtime. This choosing mechanism is described here.

                  -
                  -
                  -

                  Write Kernels for A New Device

                  -
                  -

                  Add A New Device

                  -

                  For historical reasons, we misuse the word library for device. For example, we call the device type the library type. An example is the header file library_type.h. We will correct this ASAP.

                  -

                  To register a new device, we need to add an enum value to LibraryType:

                  -
                  enum class LibraryType {
                  -  kPlain = 0,
                  -  kMKLDNN = 1,
                  -  kCUDNN = 2,
                  -};
                  -
                  -
                  -
                  -
                  -

                  Add A New Place

                  -

                  If you have a new kind of Device, you first need to add a new kind of Place. Take CUDAPlace as an example:

                  -
                  struct CUDAPlace {
                  -  CUDAPlace() : CUDAPlace(0) {}
                  -  explicit CUDAPlace(int d) : device(d) {}
                  -
                  -  inline int GetDeviceId() const { return device; }
                  -  // needed for variant equality comparison
                  -  inline bool operator==(const CUDAPlace &o) const {
                  -    return device == o.device;
                  -  }
                  -  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }
                  -
                  -  int device;
                  -};
                  -
                  -typedef boost::variant<CUDAPlace, CPUPlace> Place;
                  -
                  -
                  -
                  -
                  -

                  Add device context

                  -

                  After a new kind of Device is added, you should add a corresponding DeviceContext for it.

                  -
                  class DeviceContext {
                  - public:
                  -  virtual ~DeviceContext() {}
                  -  virtual Place GetPlace() const = 0;
                  -
                  -  virtual void Wait() const {}
                  -};
                  -
                  -
                  -
                  -
                  -

                  Implement new OpKernel for your Device.

                  -

                  Detailed documentation can be found in new_op_and_kernel.

                  -
                  class OpKernelBase {
                  - public:
                  -  /**
                  -   * ExecutionContext is the only parameter of Kernel Run function.
                  -   * Run will get input/output variables, state such as momentum and
                  -   * device resource such as CUDA stream, cublas handle, etc. from
                  -   * ExecutionContext. User should construct it before run the Operator.
                  -   */
                  -
                  -  virtual void Compute(const ExecutionContext& context) const = 0;
                  -
                  -  virtual ~OpKernelBase() = default;
                  -};
                  -
                  -template <typename T>
                  -class OpKernel : public OpKernelBase {
                  - public:
                  -  using ELEMENT_TYPE = T;
                  -};
                  -
                  -
                  -
                  -
                  -

                  Register the OpKernel to framework

                  -

                  After writing the components described above, we should register the kernel to the framework.

                  -

                  We use REGISTER_OP_KERNEL to do the registration.

                  -
                  REGISTER_OP_KERNEL(
                  -    op_type,
                  -    library_type,
                  -    place_type,
                  -    kernel0, kernel1, ...)
                  -
                  -
                  -

                  kernel0, kernel1 are kernels that have the same op_type, library_type, place_type but different data_types.

                  -

                  Take conv2d as an example:

                  -
                  REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
                  -        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
                  -        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);
                  -
                  -REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
                  -       paddle::operators::CUDNNConvOpKernel<float>,
                  -       paddle::operators::CUDNNConvOpKernel<double>);
                  -
                  -
                  -

                  In the code above:

                  -
                    -
                  • conv2d is the type/name of the operator
                  • -
                  • CUDNN/CPU is the library type
                  • -
                  • paddle::platform::CUDAPlace/CPUPlace is the place type
                  • -
                  • template parameter float/double on CUDNNConvOpKernel<T> is data_type.
                  • -
                  -
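
                  The four pieces of information listed above can be thought of as one composite lookup key. The following standalone sketch illustrates that idea under stated assumptions; it is not Paddle's OpKernelType or its registry, and ToyKernelKey, ToyKernelTable, and the enum names are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical enums standing in for library, place, and data type.
enum class ToyLibrary { kPlain, kMKLDNN, kCUDNN };
enum class ToyPlace { kCPU, kCUDA };
enum class ToyDataType { kFloat32, kFloat64 };

// Composite key: a kernel is identified by all four fields together.
struct ToyKernelKey {
  std::string op_type;
  ToyLibrary library;
  ToyPlace place;
  ToyDataType data_type;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const ToyKernelKey& o) const {
    return std::tie(op_type, library, place, data_type) <
           std::tie(o.op_type, o.library, o.place, o.data_type);
  }
};

// A "kernel" here just returns its own name, to keep the sketch small.
using ToyKernelFn = std::function<std::string()>;

class ToyKernelTable {
 public:
  void Register(const ToyKernelKey& key, ToyKernelFn fn) {
    kernels_[key] = std::move(fn);
  }

  // Returns nullptr when no kernel is registered for the exact key.
  const ToyKernelFn* Find(const ToyKernelKey& key) const {
    auto it = kernels_.find(key);
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<ToyKernelKey, ToyKernelFn> kernels_;
};
```

                  Registering kernel0 and kernel1 for the same operator, library, and place then amounts to inserting two entries that differ only in the data type field.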
                  -
                  -
                  - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/dev/use_eigen_en.html b/develop/doc/dev/use_eigen_en.html deleted file mode 100644 index d36f5d2c5ad901df7623a03186286c786590532d..0000000000000000000000000000000000000000 --- a/develop/doc/dev/use_eigen_en.html +++ /dev/null @@ -1,380 +0,0 @@ - - - - - - - - - - - - - How to use Eigen in Paddle — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                  - - - - -
                  - - - - - - -
                  -
                  - - - - - - -
                  - -
                  -
                  -
                  -
                  - -
                  -

                  How to use Eigen in Paddle

                  -

                  Essentially, a neural network is a compute graph. The data needed for the computation is stored in Tensors and its computation procedure is described by Operators. An Operator calls the Compute interface in its corresponding OpKernel and operates on the Tensor.

                  -
                  -

                  Eigen Tensor Module

                  -

                  The Eigen Tensor module supports powerful element-wise computation. In addition, a piece of code written using it can be run on both the CPU and the GPU.

                  -

                  Note that Eigen Tensor is still being actively developed, so its tests are not completely covered and its documentation may be sparse.

                  -

                  For details on Eigen Tensor module, please see doc 1 and doc 2.

                  -
                  -
                  -

                  paddle::framework::Tensor

                  -

                  Paddle's Tensor is defined in the framework directory with the following interface:

                  -
                  class Tensor {
                  - public:
                  -  /*! Return a pointer to mutable memory block. */
                  -  template <typename T>
                  -  inline T* data();
                  -
                  -  /**
                  -   * @brief   Return a pointer to mutable memory block.
                  -   * @note    If not exist, then allocation.
                  -   */
                  -  template <typename T>
                  -  inline T* mutable_data(platform::Place place);
                  -
                  -  /**
                  -   * @brief     Return a pointer to mutable memory block.
                  -   *
                  -   * @param[in] dims    The dimensions of the memory block.
                  -   * @param[in] place   The place of the memory block.
                  -   *
                  -   * @note      If not exist, then allocation.
                  -   */
                  -  template <typename T>
                  -  inline T* mutable_data(DDim dims, platform::Place place);
                  -
                  -  /*! Resize the dimensions of the memory block. */
                  -  inline Tensor& Resize(const DDim& dims);
                  -
                  -  /*! Return the dimensions of the memory block. */
                  -  inline const DDim& dims() const;
                  -
                  - private:
                  -  /*! holds the memory block if allocated. */
                  -  std::shared_ptr<Placeholder> holder_;
                  -
                  -  /*! points to dimensions of memory block. */
                  -  DDim dim_;
                  -};
                  -
                  -
                  -

                  Placeholder is used to delay memory allocation; that is, we can first define a tensor, use Resize to configure its shape, and then call mutable_data to allocate the actual memory.

                  -
                  paddle::framework::Tensor t;
                  -paddle::platform::CPUPlace place;
                  -// set size first
                  -t.Resize({2, 3});
                  -// allocate memory on CPU later
                  -t.mutable_data<float>(place);
                  -
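The delayed-allocation idea can be modeled in a few lines of Python. This is only a conceptual sketch, not Paddle's implementation: the class `LazyTensor` and its attribute names are hypothetical, chosen to mirror `Resize`, `mutable_data`, and `holder_` from the interface above.

```python
from functools import reduce
from operator import mul


class LazyTensor:
    """Toy model of a tensor whose buffer is allocated on first use.

    Hypothetical illustration only; it mirrors the Resize/mutable_data
    split described above and is not Paddle's actual implementation.
    """

    def __init__(self):
        self.dims = ()       # shape metadata, cheap to change
        self.holder = None   # stays None until memory is requested

    def resize(self, dims):
        # Only records the new shape; no memory is touched.
        self.dims = tuple(dims)
        return self

    def numel(self):
        # Number of elements implied by the current shape.
        return reduce(mul, self.dims, 1)

    def mutable_data(self):
        # Allocate if there is no buffer yet or it is too small.
        if self.holder is None or len(self.holder) < self.numel():
            self.holder = [0.0] * self.numel()
        return self.holder


t = LazyTensor()
t.resize((2, 3))
allocated_before = t.holder is not None   # nothing allocated yet
buf = t.mutable_data()                    # allocation happens here
```

Separating shape from storage in this way lets a shape-inference pass run over the whole graph before any memory is committed.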
                  -
                  -
                  -
                  -

                  paddle::framework::Tensor Usage

                  -

                  AddOp demonstrates Tensor’s usage.

                  -
                    -
                  • InferShape
                  -

                  When computing a neural network’s compute graph, first call every Operator’s InferShape method, which uses Resize to configure the size of the output tensor.

                  -
                  void InferShape(const framework::InferShapeContext &ctx) const override {
                  -  PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("X")->dims(),
                  -                    ctx.Input<Tensor>("Y")->dims(),
                  -                    "Two input of Add Op's dimension must be same.");
                  -  ctx.Output<Tensor>("Out")->Resize(ctx.Input<Tensor>("X")->dims());
                  -}
                  -
                  -
                  -
                    -
                  • Run
                  -
                  void Compute(const framework::ExecutionContext& context) const override {
                  -  auto* input0 = context.Input<Tensor>("X");
                  -  auto* input1 = context.Input<Tensor>("Y");
                  -  auto* output = context.Output<Tensor>("Out");
                  -
                  -  output->mutable_data<T>(context.GetPlace());
                  -
                  -  auto x = EigenVector<T>::Flatten(*input0);
                  -  auto y = EigenVector<T>::Flatten(*input1);
                  -  auto z = EigenVector<T>::Flatten(*output);
                  -
                  -  auto place = context.GetEigenDevice<Place>();
                  -
                  -  z.device(place) = x + y;
                  -}
                  -
                  -
                  -
                  -
                  -

                  Transformation from paddle::framework::Tensor to EigenTensor

                  -

                  As shown above, in actual computation we need to transform the input and output Tensors into formats Eigen supports. eigen.h provides functions that implement the transformation from paddle::framework::Tensor to EigenTensor/EigenMatrix/EigenVector/EigenScalar.

                  -

                  Using EigenTensor as an example:

                  -
                  Tensor t;
                  -float* p = t.mutable_data<float>(make_ddim({1, 2, 3}), platform::CPUPlace());
                  -for (int i = 0; i < 1 * 2 * 3; i++) {
                  -  p[i] = static_cast<float>(i);
                  -}
                  -
                  -EigenTensor<float, 3>::Type et = EigenTensor<float, 3>::From(t);
                  -
                  -
                  -

                  From is an interfacing method provided by the EigenTensor template, which implements the transformation from a paddle::framework::Tensor object to an EigenTensor. Since rank is a template parameter, it needs to be explicitly specified at the time of the transformation.

                  -

                  In Eigen, tensors with different ranks are different types, with Vector being a rank-1 instance. Note that EigenVector<T>::From transforms a 1-dimensional Paddle tensor into a 1-dimensional Eigen tensor, while EigenVector<T>::Flatten reshapes a Paddle tensor of any rank and flattens it into a 1-dimensional Eigen tensor. Both resulting tensors are still typed EigenVector.
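The difference between From and Flatten can be illustrated with a toy model in Python. The names `ToyTensor`, `vector_from`, and `vector_flatten` are hypothetical, and raising an error for a non-rank-1 input is an assumption about how a strict From would behave, not a statement about eigen.h.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor: flat storage plus a shape."""

    def __init__(self, data, dims):
        self.data = list(data)
        self.dims = tuple(dims)


def vector_from(t):
    # Models EigenVector<T>::From: the tensor must already be rank 1.
    if len(t.dims) != 1:
        raise ValueError("From expects a 1-dimensional tensor")
    return t.data


def vector_flatten(t):
    # Models EigenVector<T>::Flatten: any rank is viewed as one long vector.
    return t.data


x = ToyTensor(range(6), (2, 3))
y = ToyTensor([10] * 6, (2, 3))
out = ToyTensor([0] * 6, (2, 3))

zx, zy, zo = vector_flatten(x), vector_flatten(y), vector_flatten(out)
for i in range(len(zo)):
    zo[i] = zx[i] + zy[i]   # element-wise add writes through to out.data
```

After the loop, `out.data` holds the sums while `out.dims` is still `(2, 3)`: the flattened view changes how the data is addressed, not the shape metadata.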

                  -

                  For more transformations, see the unit tests in the eigen_test.cc file.

                  -
                  -
                  -

                  Implementing Computation

                  -

                  During computation, the device interface must be called on the EigenTensor on the left-hand side of the assignment. Note that computation between EigenTensors changes only the data stored in the Tensor and does not change any of the shape information associated with it.

                  -
                  auto x = EigenVector<T>::Flatten(*input0);
                  -auto y = EigenVector<T>::Flatten(*input1);
                  -auto z = EigenVector<T>::Flatten(*output);
                  -auto place = context.GetEigenDevice<Place>();
                  -z.device(place) = x + y;
                  -
                  -
                  -

                  In this code segment, input0/input1/output can be Tensors of arbitrary dimension. We are calling Flatten from EigenVector, transforming a tensor of any dimension into a 1-dimensional EigenVector. After completing computation, input0/input1/output will retain the same shape information, and they can be resized using the Resize interface.

                  -

                  Because the Eigen Tensor module is under-documented, it can help to refer to the OpKernel computation code in TensorFlow’s kernel module for further examples.

                  -
                  -
                  - - -
                  -
                  -
                  - - -
                  - -
                  -

                  - © Copyright 2016, PaddlePaddle developers. - -

                  -
                  - Built with Sphinx using a theme provided by Read the Docs. - -
                  - -
                  -
                  - -
                  - -
                  - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/dev/write_docs_en.html b/develop/doc/dev/write_docs_en.html index f5ef67b519f6a66bb9f7eae539bb6b638001e0de..808f96d6f10200c0b85b577c3f28ffc2565e26fb 100644 --- a/develop/doc/dev/write_docs_en.html +++ b/develop/doc/dev/write_docs_en.html @@ -37,7 +37,7 @@ - + - - - - - - - - - -
                  - - - - -
                  - - - - - - -
                  -
                  - - - - - - -
                  - -
                  -
                  -
                  -
                  - -
                  -

                  Fluid Distributed Training

                  -
                  -

                  Introduction

                  -

                  In this article, we’ll explain how to configure and run distributed training jobs with PaddlePaddle Fluid in a bare metal cluster.

                  -
                  -
                  -

                  Preparations

                  -
                  -

                  Getting the cluster ready

                  -

                  Prepare the compute nodes in the cluster. A node can be of any specification that runs PaddlePaddle, and each node must have a unique IP address. Make sure the nodes can communicate with each other.

                  -
                  -
                  -

                  Have PaddlePaddle installed

                  -

                  PaddlePaddle must be installed on all nodes. If you have GPU cards on your nodes, be sure to properly install drivers and CUDA libraries.

                  -

                  The PaddlePaddle build and installation guide can be found here.

                  -

                  In addition to the above, the cmake command should be run with the option WITH_DISTRIBUTE set to ON. A bare-minimum example cmake command looks as follows:

                  -
                  cmake .. -DWITH_DOC=OFF -DWITH_GPU=OFF -DWITH_DISTRIBUTE=ON -DWITH_SWIG_PY=ON -DWITH_PYTHON=ON
                  -
                  -
                  -
                  -
                  -

                  Update the training script

                  -
                  -

                  Non-cluster training script

                  -

                  Let’s take the first chapter of Deep Learning 101, “fit a line”, as an example.

                  -

                  The non-cluster version of this demo using the Fluid API is as follows:

                  -
                  import paddle.v2 as paddle
                  -import paddle.fluid as fluid
                  -
                  -x = fluid.layers.data(name='x', shape=[13], dtype='float32')
                  -y_predict = fluid.layers.fc(input=x, size=1, act=None)
                  -y = fluid.layers.data(name='y', shape=[1], dtype='float32')
                  -
                  -cost = fluid.layers.square_error_cost(input=y_predict, label=y)
                  -avg_cost = fluid.layers.mean(x=cost)
                  -
                  -sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
                  -sgd_optimizer.minimize(avg_cost)
                  -
                  -BATCH_SIZE = 20
                  -
                  -train_reader = paddle.batch(
                  -    paddle.reader.shuffle(
                  -        paddle.dataset.uci_housing.train(), buf_size=500),
                  -    batch_size=BATCH_SIZE)
                  -
                  -place = fluid.CPUPlace()
                  -feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
                  -exe = fluid.Executor(place)
                  -
                  -exe.run(fluid.default_startup_program())
                  -
                  -PASS_NUM = 100
                  -for pass_id in range(PASS_NUM):
                  -    fluid.io.save_persistables(exe, "./fit_a_line.model/")
                  -    fluid.io.load_persistables(exe, "./fit_a_line.model/")
                  -    for data in train_reader():
                  -        avg_loss_value, = exe.run(fluid.default_main_program(),
                  -                                  feed=feeder.feed(data),
                  -                                  fetch_list=[avg_cost])
                  -
                  -        if avg_loss_value[0] < 10.0:
                  -            exit(0)  # if avg cost less than 10.0, we think our code is good.
                  -exit(1)
                  -
                  -
                  -

                  We created a simple fully-connected neural network training program and handed it to the fluid executor to run for 100 passes.

                  -

                  Now let’s try to convert it to a distributed version to run on a cluster.

                  -
                  -
                  -

                  Introducing parameter server

                  -

                  As we can see from the non-cluster version of the training script, there is only one role in it: the trainer, which performs the computation and also holds the parameters. In cluster training, since multiple trainers work on the same task, they need one centralized place to hold and distribute parameters. This centralized place is called the Parameter Server in PaddlePaddle.

                  -

                  [Figure: parameter server architecture]

                  -

                  The Parameter Server in Fluid not only holds the parameters but is also assigned a part of the program. Trainers communicate with parameter servers via send/receive OPs. For more technical details, please refer to this document.
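The division of labor between trainers and a parameter server can be sketched as a single-process toy in Python. Everything here is hypothetical: `ToyParameterServer`, the synchronous update, and the gradient averaging are assumptions made for illustration, whereas real trainers and parameter servers are separate processes connected by send/receive OPs.

```python
class ToyParameterServer:
    """Holds the parameters and applies the SGD update.

    Hypothetical, single-process model; it assumes a synchronous update
    that averages the gradients of all trainers.
    """

    def __init__(self, params, learning_rate, num_trainers):
        self.params = dict(params)
        self.learning_rate = learning_rate
        self.num_trainers = num_trainers
        self.pending = []          # gradients received in this step

    def send(self, grads):
        # A trainer pushes its gradients; update once all have arrived.
        self.pending.append(grads)
        if len(self.pending) == self.num_trainers:
            for name in self.params:
                avg = sum(g[name] for g in self.pending) / self.num_trainers
                self.params[name] -= self.learning_rate * avg
            self.pending = []

    def recv(self):
        # A trainer pulls a copy of the latest parameters.
        return dict(self.params)


def trainer_gradients(params, x, y):
    # Gradient of the squared error (w * x - y) ** 2 with respect to w.
    return {"w": 2.0 * (params["w"] * x - y) * x}


server = ToyParameterServer({"w": 0.0}, learning_rate=0.1, num_trainers=2)
samples = [(1.0, 2.0), (2.0, 4.0)]      # both samples fit y = 2 * x
for _ in range(50):
    current = server.recv()
    for x, y in samples:                # one sample per trainer
        server.send(trainer_gradients(current, x, y))
```

The trainers only compute gradients; the optimizer step lives on the server, which is the same split the transpiler described below produces.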

                  -

                  Now we need to create programs for both trainers and parameter servers. The question is how.

                  -
                  -
                  -

                  Slice the program

                  -

                  Fluid provides a tool called “Distributed Transpiler” that automatically converts the non-cluster program into a cluster program.

                  -

                  The idea behind this tool is to find the optimize OPs and gradient parameters, slice the program into 2 pieces, and connect them with send/receive OPs.

                  -

                  The optimize OPs and gradient parameters are returned by the optimizer’s minimize function.

                  -

                  To put them together:

                  -
                  ... #define the program, cost, and create sgd optimizer
                  -
                  -optimize_ops, params_grads = sgd_optimizer.minimize(avg_cost) #get optimize OPs and gradient parameters
                  -
                  -t = fluid.DistributeTranspiler() # create the transpiler instance
                  -# slice the program into 2 pieces with optimizer_ops and gradient parameters list, as well as pserver_endpoints, which is a comma separated list of [IP:PORT] and number of trainers
                  -t.transpile(optimize_ops, params_grads, pservers=pserver_endpoints, trainers=2)
                  -
                  -... #create executor
                  -
                  -# in pserver, run this
                  -#current_endpoint here means current pserver IP:PORT you wish to run on
                  -pserver_prog = t.get_pserver_program(current_endpoint)
                  -pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
                  -exe.run(pserver_startup)
                  -exe.run(pserver_prog)
                  -
                  -# in trainer, run this
                  -... # define data reader
                  -exe.run(fluid.default_startup_program())
                  -for pass_id in range(100):
                  -    for data in train_reader():
                  -        exe.run(t.get_trainer_program())
                  -
                  -
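The slicing idea itself can be shown on a toy program. In this sketch a program is just a list of OP names; `slice_program` and the strings `send(...)`/`recv(...)` are hypothetical stand-ins, and the real transpiler works on Fluid program descriptions rather than strings.

```python
def slice_program(ops, optimize_ops, params_grads):
    """Split a flat OP list into a trainer part and a pserver part.

    Hypothetical model: OPs are plain strings, and "send"/"recv" stand in
    for the communication OPs the transpiler inserts.
    """
    optimize = set(optimize_ops)
    trainer_ops = [op for op in ops if op not in optimize]
    # The trainer ships every gradient out and pulls every parameter back.
    trainer_ops += ["send(%s)" % grad for _, grad in params_grads]
    trainer_ops += ["recv(%s)" % param for param, _ in params_grads]
    # The pserver keeps only the optimize OPs.
    pserver_ops = [op for op in ops if op in optimize]
    return trainer_ops, pserver_ops


ops = ["fc", "square_error_cost", "mean", "mean_grad",
       "square_error_cost_grad", "fc_grad", "sgd(w)", "sgd(b)"]
trainer_ops, pserver_ops = slice_program(
    ops,
    optimize_ops=["sgd(w)", "sgd(b)"],
    params_grads=[("w", "w@GRAD"), ("b", "b@GRAD")],
)
```

The forward and backward OPs stay on the trainer, while the two `sgd` OPs end up in the parameter server's piece.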
                  -
                  -
                  -
                  -
                  -
                  -

                  E2E demo

                  -

                  The complete demo can be found here. First, cd into the folder that contains the Python files. In this case:

                  -
                  cd /paddle/python/paddle/fluid/tests/book_distribute
                  -
                  -
                  -

                  On the parameter server node, run the following in the command line:

                  -
                  PSERVERS=192.168.1.2:6174 SERVER_ENDPOINT=192.168.1.2:6174 TRAINING_ROLE=PSERVER python notest_dist_fit_a_line.py
                  -
                  -
                  -

                  Please note that we assume your parameter server runs at 192.168.1.2:6174.

                  -

                  Wait until the prompt Server listening on 192.168.1.2:6174 appears.

                  -

                  Then run this on each of your 2 trainer nodes:

                  -
                  PSERVERS=192.168.1.2:6174 SERVER_ENDPOINT=192.168.1.2:6174 TRAINING_ROLE=TRAINER python notest_dist_fit_a_line.py
                  -
                  -
                  -

                  You need to run this command on 2 nodes because the script sets the trainer count to 2. You can change this setting on line 50.

                  -

                  Now you have 2 trainers and 1 parameter server up and running.
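The three environment variables used in the commands above could be read along the following lines. How notest_dist_fit_a_line.py actually parses them is not shown in this document, so `read_cluster_config` and its defaults are assumptions for illustration.

```python
import os


def read_cluster_config(environ=os.environ):
    """Read the three variables used in the commands above.

    Assumed, minimal reading of the variables; the demo script itself
    may handle them differently.
    """
    role = environ.get("TRAINING_ROLE", "TRAINER")
    if role not in ("PSERVER", "TRAINER"):
        raise ValueError("TRAINING_ROLE must be PSERVER or TRAINER")
    pservers = environ.get("PSERVERS", "")
    return {
        "role": role,
        # Comma-separated IP:PORT list of all parameter servers.
        "pserver_endpoints": [p for p in pservers.split(",") if p],
        # The endpoint this process serves on when it is a PSERVER.
        "current_endpoint": environ.get("SERVER_ENDPOINT", ""),
    }


config = read_cluster_config({
    "PSERVERS": "192.168.1.2:6174",
    "SERVER_ENDPOINT": "192.168.1.2:6174",
    "TRAINING_ROLE": "PSERVER",
})
```

A script would then branch on `config["role"]` to run either the parameter server program or the trainer program.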

                  -
                  -
                  -
                  - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/howto/cluster/index_en.html b/develop/doc/howto/cluster/index_en.html index 6fae077549019a9ff41d8980e556c55355f44fc5..689ae77232d25369547c43720467fa41b4cce66e 100644 --- a/develop/doc/howto/cluster/index_en.html +++ b/develop/doc/howto/cluster/index_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
              • +
              • C-API Prediction Library +
              • RNN Models
                • RNN Configuration
                • Recurrent Group Tutorial
                • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                • Development
                • FAQ
                    diff --git a/develop/doc/howto/cluster/multi_cluster/fabric_en.html b/develop/doc/howto/cluster/multi_cluster/fabric_en.html index a333bb0df7d1f394114755cbe0c3128770271b9a..07faa05be662915bae3fa55a067452802f0c77cd 100644 --- a/develop/doc/howto/cluster/multi_cluster/fabric_en.html +++ b/develop/doc/howto/cluster/multi_cluster/fabric_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
                • +
                • C-API Prediction Library +
                • RNN Models
                  • RNN Configuration
                  • Recurrent Group Tutorial
                  • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                  • Development
                  • FAQ
                      diff --git a/develop/doc/howto/cluster/multi_cluster/index_en.html b/develop/doc/howto/cluster/multi_cluster/index_en.html index e77cc07f6ebd4985c5efc2e6131bf0d24e17d072..d1fd7ba835c3343ba6e6ce16db4c79015be566fb 100644 --- a/develop/doc/howto/cluster/multi_cluster/index_en.html +++ b/develop/doc/howto/cluster/multi_cluster/index_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
                  • +
                  • C-API Prediction Library +
                  • RNN Models
                    • RNN Configuration
                    • Recurrent Group Tutorial
                    • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                    • Development
                    • FAQ
                        diff --git a/develop/doc/howto/cluster/multi_cluster/k8s_aws_en.html b/develop/doc/howto/cluster/multi_cluster/k8s_aws_en.html index e4837cb010762d73ae84042728e914a647fb0dad..c5745f919e1fb9d2e0fca43e74240017ce0c1351 100644 --- a/develop/doc/howto/cluster/multi_cluster/k8s_aws_en.html +++ b/develop/doc/howto/cluster/multi_cluster/k8s_aws_en.html @@ -37,7 +37,7 @@ - + - - - - - - - - - -
                        - - - - -
                        - - - - - - -
                        -
                        - - - - - - -
                        - -
                        -
                        -
                        -
                        - -

                        This tutorial introduces techniques we use to profile and tune the CPU performance of PaddlePaddle. We will use the Python packages cProfile and yep, and Google’s perftools.

                        -

                        Profiling is the process that reveals performance bottlenecks, which could be very different from what the developers have in mind. Performance tuning is done to fix these bottlenecks. Performance optimization repeats the steps of profiling and tuning alternately.

                        -

                        PaddlePaddle users program AI applications by calling the Python API, which calls into libpaddle.so, written in C++. In this tutorial, we focus on the profiling and tuning of

                        -
                          -
                        1. the Python code and
                        2. the mixture of Python and C++ code.
                        -
                        -

                        Profiling the Python Code

                        -
                        -

                        Generate the Performance Profiling File

                        -

                        We can use the Python standard package cProfile to generate a Python profiling file. For example:

                        -
                        python -m cProfile -o profile.out main.py
                        -
                        -
                        -

                        where main.py is the program we are going to profile and -o specifies the output file. Without -o, cProfile writes to standard output.
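The same kind of profile file can also be produced from inside a program with the cProfile API, which is convenient when only part of a script should be measured. This is a minimal sketch; the function `work` is a hypothetical stand-in for the code being profiled.

```python
import cProfile
import os
import tempfile


def work():
    # Stand-in workload; in practice this would be the training loop.
    return sum(i * i for i in range(10000))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Write the same binary format that `-o profile.out` produces.
profile_path = os.path.join(tempfile.mkdtemp(), "profile.out")
profiler.dump_stats(profile_path)
```

The resulting file can be opened with the same viewers as one generated from the command line.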

                        -
                        -
                        -

                        Look into the Profiling File

                        -

                        cProfile generates profile.out after main.py completes. We can use cprofilev to look into the details:

                        -
                        cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py
                        -
                        -
                        -

                        where -a specifies the HTTP IP, -p specifies the port, -f specifies the profiling file, and main.py is the source file.

                        -

                        Open a Web browser and point it to the local IP and the specified port; we will see output like the following:

                        -
                           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                        -        1    0.284    0.284   29.514   29.514 main.py:1(<module>)
                        -     4696    0.128    0.000   15.748    0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/executor.py:20(run)
                        -     4696   12.040    0.003   12.040    0.003 {built-in method run}
                        -        1    0.144    0.144    6.534    6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14(<module>)
                        -
                        -
                        -

                        where each line corresponds to a Python function, and the meaning of each column is as follows:

                        -

                        | column | meaning |
                        | --- | --- |
                        | ncalls | the number of calls into a function |
                        | tottime | the total execution time of the function, not including the execution time of other functions called by the function |
                        | percall | tottime divided by ncalls |
                        | cumtime | the total execution time of the function, including the execution time of other functions being called |
                        | percall | cumtime divided by ncalls |
                        | filename:lineno(function) | where the function is defined |
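As a worked check of the two percall columns, the numbers in the executor.py:20(run) row of the listing above can be recomputed by hand. The helper `percall` is a hypothetical name; the rounding to three decimals matches the precision shown in the listing.

```python
def percall(total_seconds, ncalls):
    # The listing shows times with three decimals.
    return round(total_seconds / ncalls, 3)


# executor.py:20(run): ncalls=4696, tottime=0.128, cumtime=15.748
tottime_percall = percall(0.128, 4696)    # first percall column
cumtime_percall = percall(15.748, 4696)   # second percall column
```

The first value rounds to 0.000 and the second to 0.003, matching the row: the function itself is cheap per call, and almost all of its cumulative time is spent in the functions it calls.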

                        -
                        -
                        -

                        Identify Performance Bottlenecks

                        -

                        Usually, tottime and the related percall time are what we want to focus on. We can sort the above profiling file by tottime:

                        -
                             4696   12.040    0.003   12.040    0.003 {built-in method run}
                        -   300005    0.874    0.000    1.681    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader)
                        -   107991    0.676    0.000    1.519    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:219(__init__)
                        -     4697    0.626    0.000    2.291    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)
                        -        1    0.618    0.618    0.618    0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/__init__.py:1(<module>)
                        -
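The same sorting can be done without a Web viewer by using the standard pstats module. The sketch below profiles a small hypothetical workload in-process so that it is self-contained; for a file produced earlier, `pstats.Stats("profile.out")` would load it instead.

```python
import cProfile
import io
import pstats


def busy():
    return sum(i * i for i in range(20000))


def idle():
    return 0


def main():
    idle()
    return busy()


profiler = cProfile.Profile()
profiler.runcall(main)

# Sort by tottime and keep only the top three entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(3)
report = stream.getvalue()
```

`report` contains the same column header as the listings above, ordered by internal time.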
                        -
                        -

                        We can see that the most time-consuming function is the built-in method run, which is a C++ function in libpaddle.so. We will explain how to profile C++ code in the next section. For now, let’s look into sync_with_cpp, the fourth entry in the list, which is a Python function. We can click it to understand more about it:

                        -
                        Called By:
                        -
                        -   Ordered by: internal time
                        -   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
                        -
                        -Function                                                                                                 was called by...
                        -                                                                                                             ncalls  tottime  cumtime
                        -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)  <-    4697    0.626    2.291  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)
                        -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)  <-    4696    0.019    2.316  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:487(clone)
                        -                                                                                                                  1    0.000    0.001  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:534(append_backward)
                        -
                        -
                        -Called:
                        -
                        -   Ordered by: internal time
                        -   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
                        -
                        -
                        -

                        The list of the callers of sync_with_cpp might help us understand how to improve the function definition.
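A caller listing of this kind can also be produced with `print_callers` from the standard pstats module. The functions below are hypothetical stand-ins that only borrow the names seen in the listing above.

```python
import cProfile
import io
import pstats


def sync_with_cpp():
    # Hypothetical stand-in for the function under investigation.
    return sum(range(1000))


def clone():
    return sync_with_cpp()


def append_backward():
    return sync_with_cpp()


def main():
    for _ in range(5):
        clone()
    append_backward()


profiler = cProfile.Profile()
profiler.runcall(main)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# The string restriction is a regular expression matched against
# the "filename:lineno(function)" column.
stats.sort_stats("tottime").print_callers("sync_with_cpp")
callers_report = stream.getvalue()
```

The report lists both `clone` and `append_backward` as callers, together with how many calls each contributed.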

                        -
                        -
                        -
                        -

                        Profiling Python and C++ Code

                        -
                        -

                        Generate the Profiling File

                        -

                        To profile a mixture of Python and C++ code, we can use a Python package, yep, that works with Google’s perftools, a commonly used profiler for C/C++ code.

                        -

                        On Ubuntu systems, we can install yep and perftools by running the following commands:

                        -
                        apt update
                        -apt install libgoogle-perftools-dev
                        -pip install yep
                        -
                        -
                        -

                        Then we can run the following command

                        -
                        python -m yep -v main.py
                        -
                        -
                        -

                        to generate the profiling file. The default filename is main.py.prof.

                        -

                        Please be aware of the -v command line option, which prints the analysis results after generating the profiling file. By examining the printed result, we can tell whether debug information was stripped from libpaddle.so at build time. The following hints help make sure that the analysis results are readable:

                        -
                          -
                        1. Use the GCC command line option -g when building libpaddle.so to include the debug information. The standard build system of PaddlePaddle is CMake, so you might want to set CMAKE_BUILD_TYPE=RelWithDebInfo.
                        2. Use the GCC command line option -O2 or -O3 to generate optimized binary code. It doesn’t make sense to profile libpaddle.so without optimization, because it would run slowly anyway.
                        3. Profile the single-threaded binary before the multi-threaded version, because the latter often generates tangled profiling results. You might want to set the environment variable OMP_NUM_THREADS=1 to prevent OpenMP from automatically starting multiple threads.
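The single-threaded run from the last hint can be scripted so the environment variable is never forgotten. This is a sketch under assumptions: `build_yep_run` and `run_yep` are hypothetical helpers, and `run_yep` is defined but not called here because it requires yep and perftools to be installed.

```python
import os
import subprocess
import sys


def build_yep_run(script, environ=None):
    """Return the command and environment for a single-threaded yep run."""
    env = dict(os.environ if environ is None else environ)
    env["OMP_NUM_THREADS"] = "1"    # keep OpenMP from spawning threads
    cmd = [sys.executable, "-m", "yep", "-v", script]
    return cmd, env


def run_yep(script):
    # Requires yep and perftools to be installed; not called here.
    cmd, env = build_yep_run(script)
    return subprocess.run(cmd, env=env, check=True)


cmd, env = build_yep_run("main.py", environ={})
```

Passing the environment explicitly keeps the setting local to the profiled process instead of changing the calling shell.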
                        -
                        -
                        -

                        Examining the Profiling File

                        -

                        The tool we use to examine the profiling file generated by perftools is pprof, which provides a Web-based GUI like cprofilev.

                        -

                        We can rely on the standard Go toolchain to retrieve the source code of pprof and build it:

                        -
                        go get github.com/google/pprof
                        -
                        -
                        -

                        Then we can use it to examine the main.py.prof file generated in the previous section:

                        -
                        pprof -http=0.0.0.0:3213 `which python`  ./main.py.prof
                        -
                        -
                        -

                        where -http specifies the IP and port of the HTTP service. Directing our Web browser to the service, we would see something like the following:

                        -

                        [Figure: pprof result]

                        -
                        -
                        -

                        Identifying the Performance Bottlenecks

                        -

                        Similar to how we work with cprofilev, we’d focus on tottime and cumtime.

                        -

                        [Figure: kernel_perf]

                        -

                        We can see that multiplication and the computation of the gradient of multiplication take 2% to 4% of the total running time, while MomentumOp takes about 17%. Obviously, we’d want to optimize MomentumOp.

                        -

                        pprof marks performance-critical parts of the program in red. It’s a good idea to follow the hints.
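Why the larger share is the better target can be made concrete with Amdahl's law, which gives the whole-program speedup when only a fraction of the running time gets faster. The factor of 2 below is a hypothetical improvement chosen for illustration; the fractions are the approximate shares quoted above.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only a part gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Fractions read from the profile above: MomentumOp ~17%, multiplication ~4%.
momentum_gain = overall_speedup(0.17, 2.0)   # MomentumOp made twice as fast
mul_gain = overall_speedup(0.04, 2.0)        # multiplication made twice as fast
```

Doubling the speed of MomentumOp would shorten the whole run by roughly 9%, while the same effort spent on multiplication would yield only about 2%.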

                        -
                        -
                        - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/howto/optimization/gpu_profiling_en.html b/develop/doc/howto/optimization/gpu_profiling_en.html index bbdfceba67c33a05154100a7504988fc09b6d453..cb9550bb59d5721c1452bc493b9fdc100702b186 100644 --- a/develop/doc/howto/optimization/gpu_profiling_en.html +++ b/develop/doc/howto/optimization/gpu_profiling_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
                    • +
                    • C-API Prediction Library +
                    • RNN Models
                      • RNN Configuration
                      • Recurrent Group Tutorial
                      • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                      • Development
                      • FAQ
                      • +
                      • C-API Prediction Library +
                      • RNN Models
                        • RNN Configuration
                        • Recurrent Group Tutorial
                        • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                        • Development
                        • FAQ
                            diff --git a/develop/doc/howto/rnn/hrnn_rnn_api_compare_en.html b/develop/doc/howto/rnn/hrnn_rnn_api_compare_en.html index 2cccff840a98308c8b3530f0e690f0ba42dc6d18..b31ecbbaa714d3355801f759e6dc87f544f60e48 100644 --- a/develop/doc/howto/rnn/hrnn_rnn_api_compare_en.html +++ b/develop/doc/howto/rnn/hrnn_rnn_api_compare_en.html @@ -124,6 +124,12 @@ var _hmt = _hmt || [];
                        • +
                        • C-API Prediction Library +
                        • RNN Models
                          • RNN Configuration
                          • Recurrent Group Tutorial
                          • @@ -137,6 +143,7 @@ var _hmt = _hmt || [];
                          • Development
                          • FAQ
                              diff --git a/develop/doc/howto/rnn/index_en.html b/develop/doc/howto/rnn/index_en.html index 813bacef64a2bfd498d0d030d5f7e5c5d0af7545..2bc3dda8df713d430993d350dd0a9e9a5451af32 100644 --- a/develop/doc/howto/rnn/index_en.html +++ b/develop/doc/howto/rnn/index_en.html @@ -38,7 +38,7 @@ - + - - - - - - - - - -
                              - - - - -
                              - - - - - - -
                              -
                              - - - - - - -
                              - -
                              -
                              -
                              -
                              - -
                              -

                              Build PaddlePaddle for Android

                              -

                              There are two approaches to building PaddlePaddle for Android: cross-compiling using Docker, and cross-compiling on Linux.

                              - -
                              -

                              Cross-Compiling Using Docker

                              -

                              Docker-based cross-compiling is the recommended approach because Docker runs on all major operating systems, including Linux, Mac OS X, and Windows.

                              -
                              -

                              Build the Docker Image

                              -

                              The following steps pack all the tools that we need to build PaddlePaddle into a Docker image.

                              -
                              $ git clone https://github.com/PaddlePaddle/Paddle.git
                              -$ cd Paddle
                              -$ docker build -t paddle:dev-android . -f Dockerfile.android
                              -
                              -
                              -

                              Alternatively, you can pull the published Docker image directly.

                              -
                              $ docker pull paddlepaddle/paddle:latest-dev-android
                              -
                              -
                              -

                              For users in China, we provide a faster mirror.

                              -
                              $ docker pull docker.paddlepaddlehub.com/paddle:latest-dev-android
                              -
                              -
                              -
                              -
                              -

                              Build the Inference Library

                              -

                              We can run the Docker image we just created to build the inference library of PaddlePaddle for Android using the command below:

                              -
                              $ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" paddle:dev-android
                              -
                              -
                              -

                              The Docker image accepts two arguments ANDROID_ABI and ANDROID_API:

                              - -- - - - - - - - - - - - - - - - - - - - - - - -
                              Argument | Optional Values | Default
                              ANDROID_ABI | armeabi-v7a, arm64-v8a | armeabi-v7a
                              ANDROID_API | >= 16 | 21

                              The ARM-64 architecture (arm64-v8a) requires at least level 21 of Android API.
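The constraints on the two arguments can be checked before starting the container. The following is a minimal sketch, not part of PaddlePaddle: the function name check_android_args is hypothetical, and the limits are taken only from the table and the ARM-64 note above.

```shell
# Hypothetical helper: validate an ANDROID_ABI / ANDROID_API pair against the
# documented limits (API >= 16 in general, API >= 21 for arm64-v8a).
check_android_args() {
  abi="$1"
  api="$2"
  case "$abi" in
    armeabi-v7a) min_api=16 ;;
    arm64-v8a)   min_api=21 ;;
    *) echo "unsupported ANDROID_ABI: $abi" >&2; return 1 ;;
  esac
  # Reject empty or non-numeric API levels before comparing.
  case "$api" in
    ''|*[!0-9]*) echo "ANDROID_API must be an integer: $api" >&2; return 1 ;;
  esac
  if [ "$api" -lt "$min_api" ]; then
    echo "ANDROID_API=$api is too low for $abi (need >= $min_api)" >&2
    return 1
  fi
  return 0
}

# Example: validate first, then run the build command shown earlier.
# check_android_args arm64-v8a 21 && \
#   docker run -it --rm -v "$PWD":/paddle \
#     -e "ANDROID_ABI=arm64-v8a" -e "ANDROID_API=21" paddle:dev-android
```

Failing early like this avoids waiting for the container to start only to hit an invalid ABI and API combination.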

                              -

                              The default entry point of the Docker image, paddle/scripts/docker/build_android.sh, generates the Android cross-compiling standalone toolchain based on the arguments ANDROID_ABI and ANDROID_API. For information about other configuration arguments, please continue reading.

                              -

                              The above command generates and outputs the inference library in $PWD/install_android and puts third-party libraries in $PWD/install_android/third_party.

                              -
                              -
                              -
                              -

                              Cross-Compiling on Linux

                              -

                              The Linux-based approach to cross-compiling is to run the steps in Dockerfile.android manually on a Linux x64 computer.

                              -
                              -

                              Set Up the Environment

                              -

                              To build for Android, we need the Android NDK:

                              -
                              wget -q https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip
                              -unzip -q android-ndk-r14b-linux-x86_64.zip
                              -
                              -
                              -

                              The Android NDK includes everything we need to build the standalone toolchain, which is then used to build PaddlePaddle for Android. (We plan to remove the intermediate stage of building the standalone toolchain in the near future.)

                              -
                                -
                              • To build the standalone toolchain for armeabi-v7a and Android API level 21:
                              • -
                              -
                              your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
                              -        --arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain
                              -
                              -
                              -

                              The generated standalone toolchain will be in your/path/to/arm_standalone_toolchain.

                              -
                                -
                              • To build the standalone toolchain for arm64-v8a and Android API level 21:
                              • -
                              -
                              your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
                              -        --arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain
                              -
                              -
                              -

                              The generated standalone toolchain will be in your/path/to/arm64_standalone_toolchain.
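The two invocations above differ only in the --arch value and the install directory. As a sketch, the mapping from ABI to --arch can be wrapped in a small helper; toolchain_cmd is a hypothetical name, it only prints the command rather than running it, and it uses only the flags that appear in the commands above.

```shell
# Hypothetical helper: print the make-standalone-toolchain.sh invocation for
# a given NDK directory, ABI, API level and install directory.
toolchain_cmd() {
  ndk_dir="$1"
  abi="$2"
  api="$3"
  install_dir="$4"
  # Map the Android ABI name to the --arch value used in this guide.
  case "$abi" in
    armeabi-v7a) arch=arm ;;
    arm64-v8a)   arch=arm64 ;;
    *) echo "unsupported ABI: $abi" >&2; return 1 ;;
  esac
  printf '%s/build/tools/make-standalone-toolchain.sh --arch=%s --platform=android-%s --install-dir=%s\n' \
    "$ndk_dir" "$arch" "$api" "$install_dir"
}

# Example: print the command for arm64-v8a, then inspect it before running it.
toolchain_cmd your/path/to/android-ndk-r14b-linux-x86_64 arm64-v8a 21 \
  your/path/to/arm64_standalone_toolchain
```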

                              -
                              -
                              -

                              Cross-Compiling Arguments

                              -

                              CMake supports choosing the toolchain. PaddlePaddle provides android.cmake, which configures the Android cross-compiling toolchain for CMake. android.cmake is not required for CMake >= 3.7, which supports Android cross-compiling itself. PaddlePaddle detects the CMake version and, for 3.7 and newer, uses the official support.

                              -

                              Some other CMake arguments you need to know:

                              -
                                -
                              • CMAKE_SYSTEM_NAME must be Android. This tells PaddlePaddle’s CMake system to cross-compile third-party dependencies. This also changes some other CMake arguments like WITH_GPU=OFF, WITH_AVX=OFF, WITH_PYTHON=OFF, WITH_RDMA=OFF, WITH_MKL=OFF and WITH_GOLANG=OFF.
                              • -
                              • WITH_C_API must be ON, to build the C-based inference library for Android.
                              • -
                              • WITH_SWIG_PY must be OFF because the Android platform doesn’t support SWIG-based API.
                              • -
                              -

                              Some Android-specific arguments:

                              -
                                -
                              • ANDROID_STANDALONE_TOOLCHAIN: the absolute path of the Android standalone toolchain, or the path relative to the CMake build directory. PaddlePaddle’s CMake extensions would derive the cross-compiler, sysroot and Android API level from this argument.
                              • -
                              • ANDROID_TOOLCHAIN: could be gcc or clang. The default value is clang.
                                  -
                                • For CMake >= 3.7, it should always be clang. For older versions, it could be gcc.
                                • -
                                • Android’s official clang requires glibc >= 2.15.
                                • -
                                -
                              • -
                              • ANDROID_ABI: could be armeabi-v7a or arm64-v8a. The default value is armeabi-v7a.
                              • -
                              • ANDROID_NATIVE_API_LEVEL: could be derived from the value of ANDROID_STANDALONE_TOOLCHAIN.
                              • -
                              • ANDROID_ARM_MODE:
                                  -
                                • could be ON or OFF, and defaults to ON, when ANDROID_ABI=armeabi-v7a;
                                • -
                                • no need to specify when ANDROID_ABI=arm64-v8a.
                                • -
                                -
                              • -
                              • ANDROID_ARM_NEON: indicates whether to use NEON instructions.
                                  -
                                • could be ON or OFF, and defaults to ON, when ANDROID_ABI=armeabi-v7a;
                                • -
                                • no need to specify when ANDROID_ABI=arm64-v8a.
                                • -
                                -
                              • -
                              -

                              Other useful arguments:

                              -
                                -
                              • USE_EIGEN_FOR_BLAS: indicates whether to use Eigen. Could be ON or OFF, defaults to OFF.
                              • -
                              • HOST_C/CXX_COMPILER: specifies the host compiler, which is used to build the host-specific protoc and target-specific OpenBLAS. It defaults to the value of the environment variable CC/CXX, or cc/c++.
                              • -
                              -

                              Some frequently used configurations for your reference:

                              -
                              cmake -DCMAKE_SYSTEM_NAME=Android \
                              -      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
                              -      -DANDROID_ABI=armeabi-v7a \
                              -      -DANDROID_ARM_NEON=ON \
                              -      -DANDROID_ARM_MODE=ON \
                              -      -DUSE_EIGEN_FOR_BLAS=ON \
                              -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                              -      -DWITH_C_API=ON \
                              -      -DWITH_SWIG_PY=OFF \
                              -      ..
                              -
                              -
                              -
                              cmake -DCMAKE_SYSTEM_NAME=Android \
                              -      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
                              -      -DANDROID_ABI=arm64-v8a \
                              -      -DUSE_EIGEN_FOR_BLAS=OFF \
                              -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                              -      -DWITH_C_API=ON \
                              -      -DWITH_SWIG_PY=OFF \
                              -      ..
                              -
                              -
                              -

                              There are some other arguments you might want to configure.

                              -
                                -
                              • CMAKE_BUILD_TYPE=MinSizeRel minimizes the size of the library.
                              • -
                              • CMAKE_BUILD_TYPE=Release optimizes the runtime performance.
                              • -
                              -

                              Our own tip for performance optimization is to use clang and Eigen or OpenBLAS:

                              -
                                -
                              • CMAKE_BUILD_TYPE=Release
                              • -
                              • ANDROID_TOOLCHAIN=clang
                              • -
                              • USE_EIGEN_FOR_BLAS=ON for armeabi-v7a, or USE_EIGEN_FOR_BLAS=OFF for arm64-v8a.
                              • -
                              -
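The performance tips above depend on the target ABI, so they can be collected in one place. This is only a sketch: perf_cmake_flags is a hypothetical helper, and it emits nothing beyond the three settings listed above plus ANDROID_ABI.

```shell
# Hypothetical helper: print the performance-oriented CMake flags for an ABI,
# following the tips above (Release build, clang, Eigen only on armeabi-v7a).
perf_cmake_flags() {
  abi="$1"
  case "$abi" in
    armeabi-v7a) eigen=ON ;;
    arm64-v8a)   eigen=OFF ;;
    *) echo "unsupported ANDROID_ABI: $abi" >&2; return 1 ;;
  esac
  echo "-DCMAKE_BUILD_TYPE=Release -DANDROID_TOOLCHAIN=clang -DANDROID_ABI=$abi -DUSE_EIGEN_FOR_BLAS=$eigen"
}

# Example: splice the flags into one of the configurations shown earlier.
# cmake -DCMAKE_SYSTEM_NAME=Android \
#       -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
#       -DWITH_C_API=ON -DWITH_SWIG_PY=OFF \
#       $(perf_cmake_flags armeabi-v7a) ..
```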
                              -
                              -

                              Build and Install

                              -

                              After running cmake, we can run make; make install to build and install.

                              -

                              Before building, you might want to remove the third_party and build directories, which may contain pre-built libraries for other architectures.

                              -

                              After building, in the directory CMAKE_INSTALL_PREFIX, you will find three sub-directories:

                              -
                                -
                              • include: the header files of the inference library,
                              • -
                              • lib: the inference library built for various Android ABIs,
                              • -
                              • third_party: dependent third-party libraries built for Android.
                              • -
                              -
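A quick way to confirm that the install step produced the three sub-directories listed above is sketched below. check_install_layout is a hypothetical helper; it checks only that the directories exist, not what they contain.

```shell
# Hypothetical helper: verify that an install prefix contains the include,
# lib and third_party sub-directories described above.
check_install_layout() {
  prefix="$1"
  missing=0
  for d in include lib third_party; do
    if [ ! -d "$prefix/$d" ]; then
      echo "missing: $prefix/$d" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any directory is absent.
  return "$missing"
}

# Example:
# check_install_layout your/path/to/install && echo "install layout looks complete"
```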
                              -
                              -
                              - - -
                              -
                              -
                              - - -
                              - -
                              -

                              - -
                              -
                              - -
                              - -
                              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/mobile/cross_compiling_for_ios_en.html b/develop/doc/mobile/cross_compiling_for_ios_en.html deleted file mode 100644 index 681e9d9784b67cd8c223c972e08ab61e3bc8d51e..0000000000000000000000000000000000000000 --- a/develop/doc/mobile/cross_compiling_for_ios_en.html +++ /dev/null @@ -1,369 +0,0 @@ - - - - - - - - - - - - - Build PaddlePaddle for iOS — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                              - - - - -
                              - - - - - - -
                              -
                              - - - - - - -
                              - -
                              -
                              -
                              -
                              - -
                              -

                              Build PaddlePaddle for iOS

                              -

                              This tutorial will walk you through cross-compiling the PaddlePaddle library for iOS from source on macOS.

                              -
                              -

                              Preparation

                              -

                              Apple provides Xcode as the cross-compiling toolchain and IDE for iOS development. Download it from the App Store or here. To verify your installation, run the following command:

                              -
                              $ xcodebuild -version
                              -Xcode 9.0
                              -Build version 9A235
                              -
                              -
                              -
                              -
                              -

                              Cross-compiling configurations

                              -

                              PaddlePaddle provides the cross-compiling toolchain configuration file cmake/cross_compiling/ios.cmake, which has some default settings for frequently used compilers.

                              -

                              The following variables are mandatory and must be set before cross-compiling PaddlePaddle for iOS:

                              -
                                -
                              • CMAKE_SYSTEM_NAME, the CMake target platform name, has to be iOS. When this variable is set to iOS, PaddlePaddle’s CMake system compiles all the third-party dependencies and enforces some parameters (WITH_C_API=ON, WITH_GPU=OFF, WITH_AVX=OFF, WITH_PYTHON=OFF, WITH_RDMA=OFF).
                              • -
                              • WITH_C_API, whether to compile the inference C-API library, has to be ON, since the C-API is the only supported interface for inference on iOS.
                              • -
                              • WITH_SWIG_PY has to be OFF. Inference or training via SWIG is not supported on iOS.
                              • -
                              -

                              Optional variables for iOS are:

                              -
                                -
                              • IOS_PLATFORM, either OS (default) or SIMULATOR.

                                -
                                  -
                                • OS, build targets ARM-based physical devices like iPhone or iPad.
                                • -
                                • SIMULATOR, build targets x86 architecture simulators.
                                • -
                                -
                              • -
                              • IOS_ARCH, target architecture. By default, all architecture types will be compiled. If you need to specify the architecture to compile for, please find valid values for different IOS_PLATFORM settings from the table below:

                                - - - - - - - - - - - - - - - - - - - - - -
                                IOS_PLATFORM | IOS_ARCH
                                OS | armv7, armv7s, arm64
                                SIMULATOR | i386, x86_64
                              • -
                              • IOS_DEPLOYMENT_TARGET, the minimum iOS version to deploy to, 7.0 by default.

                                -
                              • -
                              • IOS_ENABLE_BITCODE, whether to enable Bitcode, values can be ON/OFF, ON by default.

                                -
                              • -
                              • IOS_USE_VECLIB_FOR_BLAS, whether to use the vecLib framework for BLAS computing. Values can be ON/OFF, OFF by default.

                                -
                              • -
                              • IOS_DEVELOPMENT_ROOT, the path to the Developer directory, can be explicitly set to your /path/to/platform/Developer. If left blank, PaddlePaddle will automatically pick the Developer directory of the Xcode platform that corresponds to your IOS_PLATFORM value.

                                -
                              • -
                              • IOS_SDK_ROOT, the path to the SDK root, can be explicitly set to your /path/to/platform/Developer/SDKs/SDK. If left blank, PaddlePaddle will pick the latest SDK in the directory of IOS_DEVELOPMENT_ROOT.

                                -
                              • -
                              -
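The valid IOS_ARCH values depend on IOS_PLATFORM, as listed in the table above. The check below is a sketch under that assumption only: check_ios_arch is a hypothetical helper, and it takes the architectures in the semicolon-separated form used on the cmake command line.

```shell
# Hypothetical helper: check a semicolon-separated IOS_ARCH list against the
# architectures documented for the given IOS_PLATFORM.
check_ios_arch() {
  platform="$1"
  archs="$2"
  case "$platform" in
    OS)        valid="armv7 armv7s arm64" ;;
    SIMULATOR) valid="i386 x86_64" ;;
    *) echo "unsupported IOS_PLATFORM: $platform" >&2; return 1 ;;
  esac
  # Turn "armv7;arm64" into "armv7 arm64" and test each entry.
  for a in $(printf '%s' "$archs" | tr ';' ' '); do
    case " $valid " in
      *" $a "*) ;;
      *) echo "IOS_ARCH=$a is not valid for IOS_PLATFORM=$platform" >&2; return 1 ;;
    esac
  done
  return 0
}

# Example:
# check_ios_arch OS "armv7;arm64" && echo "architecture list is consistent"
```

An empty architecture list passes the check, which matches the default behaviour described above of compiling all architectures.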

                              Other settings:

                              -
                                -
                              • USE_EIGEN_FOR_BLAS, whether to use Eigen for matrix computing. Effective only when IOS_USE_VECLIB_FOR_BLAS=OFF. Values can be ON/OFF, OFF by default.
                              • -
                              • HOST_C/CXX_COMPILER, host C/C++ compiler. Uses value from environment variable CC/CXX by default or cc/c++ if CC/CXX doesn’t exist.
                              • -
                              -

                              Some typical CMake configurations:

                              -
                              cmake -DCMAKE_SYSTEM_NAME=iOS \
                              -      -DIOS_PLATFORM=OS \
                              -      -DIOS_ARCH="armv7;arm64" \
                              -      -DIOS_ENABLE_BITCODE=ON \
                              -      -DIOS_USE_VECLIB_FOR_BLAS=ON \
                              -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                              -      -DWITH_C_API=ON \
                              -      -DWITH_TESTING=OFF \
                              -      -DWITH_SWIG_PY=OFF \
                              -      ..
                              -
                              -
                              -
                              cmake -DCMAKE_SYSTEM_NAME=iOS \
                              -      -DIOS_PLATFORM=SIMULATOR \
                              -      -DIOS_ARCH="x86_64" \
                              -      -DIOS_USE_VECLIB_FOR_BLAS=ON \
                              -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                              -      -DWITH_C_API=ON \
                              -      -DWITH_TESTING=OFF \
                              -      -DWITH_SWIG_PY=OFF \
                              -      ..
                              -
                              -
                              -

                              You can set other compile parameters as needed. For example, if you are trying to minimize the library size, set CMAKE_BUILD_TYPE to MinSizeRel; if performance is your concern, set CMAKE_BUILD_TYPE to Release. You can even adjust the PaddlePaddle compile procedure by manually setting the CMAKE_C/CXX_FLAGS values.

                              -

                              Tips for better performance:

                              -
                                -
                              • set CMAKE_BUILD_TYPE to Release
                              • -
                              • set IOS_USE_VECLIB_FOR_BLAS to ON
                              • -
                              -
                              -
                              -

                              Build and install

                              -

                              After running CMake, run the following commands. PaddlePaddle will download and compile the third-party dependencies, then compile and install the PaddlePaddle inference library.

                              -
                              $ make
                              -$ make install
                              -
                              -
                              -

                              Please note: if you have compiled PaddlePaddle in the source directory for other platforms, remove the third_party and build directories within the source with rm -rf, to ensure that all the third-party dependencies and PaddlePaddle itself are compiled afresh with the current CMake configuration.

                              -

                              After make install, the your/path/to/install directory will contain the following directories:

                              -
                                -
                              • include, contains all the C-API header files.
                              • -
                              • lib, contains PaddlePaddle C-API static library.
                              • -
                              • third_party contains all the 3rd party libraries.
                              • -
                              -

                              Please note: if the PaddlePaddle library needs to support both physical devices and simulators, you will need to compile it for each platform separately, then merge the results into a fat library with lipo.
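The merge step can be sketched as follows. merge_fat_library is a hypothetical wrapper around the lipo -create command that ships with Xcode; the paths and the library file name in the example are placeholders, not names taken from the PaddlePaddle build.

```shell
# Hypothetical wrapper: merge a device build and a simulator build of the same
# static library into one fat library. LIPO can be overridden (for example
# with "echo") to preview the command without running lipo.
merge_fat_library() {
  device_lib="$1"
  simulator_lib="$2"
  output_lib="$3"
  "${LIPO:-lipo}" -create "$device_lib" "$simulator_lib" -output "$output_lib"
}

# Example (placeholder paths and library name):
# merge_fat_library install_os/lib/libexample.a \
#                   install_simulator/lib/libexample.a \
#                   install_fat/lib/libexample.a
# lipo -info install_fat/lib/libexample.a   # lists the architectures in the result
```

This assumes the device and simulator builds were installed to separate prefixes, so that neither build overwrites the other.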

                              -

                              You now have the PaddlePaddle library compiled and installed, and the fat library can be used in deep-learning iOS apps. Please refer to the C-API documentation for usage guides.

                              -
                              -
                              - - -
                              -
                              -
                              - - -
                              - -
                              -

                              - -
                              -
                              - -
                              - -
                              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/mobile/cross_compiling_for_raspberry_en.html b/develop/doc/mobile/cross_compiling_for_raspberry_en.html deleted file mode 100644 index 1caf888d4ef26aea33c515cc59ea6bdde73e3dda..0000000000000000000000000000000000000000 --- a/develop/doc/mobile/cross_compiling_for_raspberry_en.html +++ /dev/null @@ -1,303 +0,0 @@ - - - - - - - - - - - - - Build PaddlePaddle for Raspberry Pi — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                              - - - - -
                              - - - - - - -
                              -
                              - - - - - - -
                              - -
                              -
                              -
                              -
                              - -
                              -

                              Build PaddlePaddle for Raspberry Pi

                              -

                              You may use either of the following two approaches to build the inference library of PaddlePaddle for Raspberry Pi:

                              -
                                -
                              1. Build using SSH: Log in to a Raspberry Pi using SSH and build the library. The required development tools and third-party dependencies are listed in /Dockerfile.
                              2. Cross-compile: This article describes in detail how to cross-compile PaddlePaddle for Raspberry Pi on a Linux/x64 machine.
                              -
                              -

                              The Cross-Compiling Toolchain

                              -

                              Step 1. Clone the GitHub repo by running the following command.

                              -
                              git clone https://github.com/raspberrypi/tools.git
                              -
                              -
                              -

                              Step 2. Use the pre-built cross-compiler found in ./tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64. To run it on a Linux computer, glibc version >= 2.14 is needed.
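The glibc requirement in Step 2 can be checked on the host before building. The sketch below assumes a glibc-based Linux host where getconf GNU_LIBC_VERSION is available and a sort that supports -V; version_ge is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2, using version-aware sorting (so that 2.9 < 2.14).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# On a glibc-based host, getconf prints a line such as "glibc 2.23".
if host_glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null)"; then
  host_glibc="${host_glibc#glibc }"
  if version_ge "$host_glibc" 2.14; then
    echo "glibc $host_glibc is new enough for the pre-built cross-compiler"
  else
    echo "glibc $host_glibc is older than 2.14" >&2
  fi
fi
```

The same comparison applies to any other minimum-version requirement, for example by calling version_ge with 2.15 as the second argument.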

                              -
                              -
                              -

                              CMake Arguments

                              -

                              CMake supports cross-compiling. All CMake configuration arguments required for the cross-compilation for Raspberry Pi can be found in cmake/cross_compiling/raspberry_pi.cmake.

                              -

                              Some important arguments that need to be set:

                              -
                                -
                              • CMAKE_SYSTEM_NAME: The target platform. Must be RPi.
                              • -
                              • RPI_TOOLCHAIN: The absolute path of the cross-compiling toolchain.
                              • -
                              • RPI_ARM_NEON: Use ARM NEON intrinsics. This is a required argument and defaults to ON.
                              • -
                              • HOST_C/CXX_COMPILER: The C/C++ compiler for the host. It is used to build tools that run on the host, for example, protoc.
                              • -
                              -

                              A commonly-used CMake configuration is as follows:

                              -
                              cmake -DCMAKE_SYSTEM_NAME=RPi \
                              -      -DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \
                              -      -DRPI_ARM_NEON=ON \
                              -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                              -      -DWITH_GPU=OFF \
                              -      -DWITH_C_API=ON \
                              -      -DWITH_PYTHON=OFF \
                              -      -DWITH_SWIG_PY=OFF \
                              -      ..
                              -
                              -
                              -

                              To build the inference library, please set the argument WITH_C_API to ON: WITH_C_API=ON.

                              -

                              You can add more arguments. For example, to minimize the size of the generated inference library, you may use CMAKE_BUILD_TYPE=MinSizeRel. For performance optimization, you may use CMAKE_BUILD_TYPE=Release.

                              -
                              -
                              -

                              Build and Install

                              -

                              The following commands build the inference library of PaddlePaddle for Raspberry Pi and third-party dependencies.

                              -
                              make
                              -make install
                              -
                              -
                              -

                              The intermediate files will be stored in build. Third-party libraries will be located in build/third_party. If you have already built PaddlePaddle for another platform, such as Android or iOS, you may want to clear these directories first by running the command: rm -rf build.

                              -

                              The inference library will be in your/path/to/install/lib, with related header files in your/path/to/install/include.

                              -
                              -
                              - - -
                              -
                              -
                              - - -
                              - -
                              -

                              - © Copyright 2016, PaddlePaddle developers. - -

                              -
                              - Built with Sphinx using a theme provided by Read the Docs. - -
                              - -
                              -
                              - -
                              - -
                              - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc/objects.inv b/develop/doc/objects.inv index 0cf27ce29007a722b96735af9993999d6e9b1707..5d9d5336ec8a1178679138405df28be79e127d0b 100644 Binary files a/develop/doc/objects.inv and b/develop/doc/objects.inv differ diff --git a/develop/doc/search.html b/develop/doc/search.html index 035c88f22e429f37b02d306ae64a82b8da1ff560..06b2c6b33db78c9636dba6076f50c1358156593b 100644 --- a/develop/doc/search.html +++ b/develop/doc/search.html @@ -121,6 +121,12 @@ var _hmt = _hmt || [];
                          • +
                          • C-API Prediction Library +
                          • RNN Models
                            • RNN Configuration
                            • Recurrent Group Tutorial
                            • @@ -134,6 +140,7 @@ var _hmt = _hmt || [];
                            • Development
                            • FAQ
                                diff --git a/develop/doc/searchindex.js b/develop/doc/searchindex.js index 090b33e01709c9bd0fdfd5823c188e2c9734e47d..679fd3ca98fdc30876d3e58f339d7a0c1002a905 100644 --- a/develop/doc/searchindex.js +++ b/develop/doc/searchindex.js @@ -1 +1 @@ -Search.setIndex({docnames:["build_and_install/build_from_source_en","build_and_install/docker_install_en","build_and_install/index_en","build_and_install/pip_install_en","design/api","design/auto_gradient_check","design/backward","design/block","design/build_system/README","design/cluster_train/README","design/cluster_train/checkpointing","design/cluster_train/data_dispatch","design/cluster_train/large_model_dist_train","design/cluster_train/master_server","design/cluster_train/pserver_client","design/cluster_train/remote_parameter_updater","design/cluster_train/save_model","design/cluster_train/submit-job","design/concurrent_programming","design/cpp_data_feeding","design/csp","design/dist_refactor/distributed_architecture","design/dist_refactor/multi_cpu","design/dist_refactor/parameter_server","design/error_clip","design/evaluator","design/executor","design/file_manager/README","design/file_manager/pfs/pfsclient","design/float16","design/fluid","design/fluid_compiler","design/functions_operators_layers","design/gan_api","design/graph","design/graph_survey","design/if_else_op","design/infer_var_type","design/kernel_hint_design","design/kernel_selection","design/memory_optimization","design/mkl/mkl_packed","design/mkl/mkldnn","design/mkl/mkldnn_fluid","design/model_format","design/multi_language_interface/00.why_plain_c","design/multi_language_interface/01.inference_implementation","design/operator_kernel_type","design/ops/rnn","design/ops/sequence_decoder","design/optimizer","design/paddle_nccl","design/parallel_do","design/parameter_average","design/parameters_in_cpp","design/profiler","design/program","design/prune","design/python_api","design/reader/README","design/refactorization","design/r
egister_grad_op","design/regularization","design/releasing_process","design/scope","design/selected_rows","design/simple_op_design","design/speech/deep_speech_2","design/support_new_device","design/switch","design/tensor_array","design/var_desc","dev/contribute_to_paddle_en","dev/index_en","dev/new_layer_en","dev/new_op_en","dev/new_op_kernel_en","dev/use_eigen_en","dev/write_docs_en","faq/build_and_install/index_en","faq/cluster/index_en","faq/index_en","faq/local/index_en","faq/model/index_en","faq/parameter/index_en","getstarted/concepts/use_concepts_en","getstarted/index_en","getstarted/quickstart_en","howto/capi/compile_paddle_lib_en","howto/capi/index_en","howto/capi/organization_of_the_inputs_en","howto/capi/workflow_of_capi_en","howto/cluster/cmd_argument_en","howto/cluster/fluid_cluster_train_en","howto/cluster/index_en","howto/cluster/multi_cluster/fabric_en","howto/cluster/multi_cluster/index_en","howto/cluster/multi_cluster/k8s_aws_en","howto/cluster/multi_cluster/k8s_distributed_en","howto/cluster/multi_cluster/k8s_en","howto/cluster/multi_cluster/openmpi_en","howto/cluster/multi_cluster/src/k8s_data/README","howto/cluster/multi_cluster/src/k8s_train/README","howto/cluster/preparations_en","howto/cmd_parameter/arguments_en","howto/cmd_parameter/detail_introduction_en","howto/cmd_parameter/index_en","howto/cmd_parameter/use_case_en","howto/index_en","howto/optimization/cpu_profiling_en","howto/optimization/gpu_profiling_en","howto/read_source","howto/rnn/hierarchical_layer_en","howto/rnn/hrnn_rnn_api_compare_en","howto/rnn/index_en","howto/rnn/recurrent_group_en","howto/rnn/rnn_config_en","index_en","mobile/cross_compiling_for_android_en","mobile/cross_compiling_for_ios_en","mobile/cross_compiling_for_raspberry_en","survey/cluster_bootstrapping_tools"],envversion:50,filenames:["build_and_install/build_from_source_en.rst","build_and_install/docker_install_en.rst","build_and_install/index_en.rst","build_and_install/pip_install_en.rst","design/api.md","desi
gn/auto_gradient_check.md","design/backward.md","design/block.md","design/build_system/README.md","design/cluster_train/README.md","design/cluster_train/checkpointing.md","design/cluster_train/data_dispatch.md","design/cluster_train/large_model_dist_train.md","design/cluster_train/master_server.md","design/cluster_train/pserver_client.md","design/cluster_train/remote_parameter_updater.md","design/cluster_train/save_model.md","design/cluster_train/submit-job.md","design/concurrent_programming.md","design/cpp_data_feeding.md","design/csp.md","design/dist_refactor/distributed_architecture.md","design/dist_refactor/multi_cpu.md","design/dist_refactor/parameter_server.md","design/error_clip.md","design/evaluator.md","design/executor.md","design/file_manager/README.md","design/file_manager/pfs/pfsclient.md","design/float16.md","design/fluid.md","design/fluid_compiler.md","design/functions_operators_layers.md","design/gan_api.md","design/graph.md","design/graph_survey.md","design/if_else_op.md","design/infer_var_type.md","design/kernel_hint_design.md","design/kernel_selection.md","design/memory_optimization.md","design/mkl/mkl_packed.md","design/mkl/mkldnn.md","design/mkl/mkldnn_fluid.md","design/model_format.md","design/multi_language_interface/00.why_plain_c.md","design/multi_language_interface/01.inference_implementation.md","design/operator_kernel_type.md","design/ops/rnn.md","design/ops/sequence_decoder.md","design/optimizer.md","design/paddle_nccl.md","design/parallel_do.md","design/parameter_average.md","design/parameters_in_cpp.md","design/profiler.md","design/program.md","design/prune.md","design/python_api.md","design/reader/README.md","design/refactorization.md","design/register_grad_op.md","design/regularization.md","design/releasing_process.md","design/scope.md","design/selected_rows.md","design/simple_op_design.md","design/speech/deep_speech_2.md","design/support_new_device.md","design/switch.md","design/tensor_array.md","design/var_desc.md","dev/contribute_t
o_paddle_en.md","dev/index_en.rst","dev/new_layer_en.rst","dev/new_op_en.md","dev/new_op_kernel_en.md","dev/use_eigen_en.md","dev/write_docs_en.rst","faq/build_and_install/index_en.rst","faq/cluster/index_en.rst","faq/index_en.rst","faq/local/index_en.rst","faq/model/index_en.rst","faq/parameter/index_en.rst","getstarted/concepts/use_concepts_en.rst","getstarted/index_en.rst","getstarted/quickstart_en.rst","howto/capi/compile_paddle_lib_en.md","howto/capi/index_en.rst","howto/capi/organization_of_the_inputs_en.md","howto/capi/workflow_of_capi_en.md","howto/cluster/cmd_argument_en.md","howto/cluster/fluid_cluster_train_en.md","howto/cluster/index_en.rst","howto/cluster/multi_cluster/fabric_en.md","howto/cluster/multi_cluster/index_en.rst","howto/cluster/multi_cluster/k8s_aws_en.md","howto/cluster/multi_cluster/k8s_distributed_en.md","howto/cluster/multi_cluster/k8s_en.md","howto/cluster/multi_cluster/openmpi_en.md","howto/cluster/multi_cluster/src/k8s_data/README.md","howto/cluster/multi_cluster/src/k8s_train/README.md","howto/cluster/preparations_en.md","howto/cmd_parameter/arguments_en.md","howto/cmd_parameter/detail_introduction_en.md","howto/cmd_parameter/index_en.rst","howto/cmd_parameter/use_case_en.md","howto/index_en.rst","howto/optimization/cpu_profiling_en.md","howto/optimization/gpu_profiling_en.rst","howto/read_source.md","howto/rnn/hierarchical_layer_en.rst","howto/rnn/hrnn_rnn_api_compare_en.rst","howto/rnn/index_en.rst","howto/rnn/recurrent_group_en.md","howto/rnn/rnn_config_en.rst","index_en.rst","mobile/cross_compiling_for_android_en.md","mobile/cross_compiling_for_ios_en.md","mobile/cross_compiling_for_raspberry_en.md","survey/cluster_bootstrapping_tools.md"],objects:{},objnames:{},objtypes:{},terms:{"00m":110,"03m":110,"0424m":110,"055ee37d":97,"0630u":110,"06u":110,"0810u":110,"0957m":110,"0_cudnn5":0,"0_cudnn5_avx_mkl":[1,3],"0_cudnn7_avx_mkl":3,"0rc":103,"0rc1":63,"0rc2":63,"0x10f256d50":35,"0x7ffe4de00110":35,"100gb":110,"100gi":97,"10g":17,"10
m":110,"1150u":110,"11\u5b9e\u73b0\u4e86c":46,"11e6":99,"124n":110,"12gb":40,"13m":99,"1490u":110,"1550u":110,"16u":110,"173n":110,"1770u":110,"18ad":97,"18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750":99,"197u":110,"1gb":110,"210u":110,"215n":110,"228u":110,"2520u":110,"2680u":110,"279n":110,"27m":110,"285m":110,"2863m":110,"28m":110,"2977m":110,"2cbf7385":97,"302n":110,"30u":110,"328n":110,"32u":110,"331n":110,"3320u":110,"365e":97,"36u":110,"3710m":110,"3768m":110,"387u":110,"38u":110,"3920u":110,"39u":110,"3rd":119,"4035m":110,"4090u":110,"4096mb":105,"4279m":110,"43u":110,"448a5b355b84":99,"4560u":110,"4563m":110,"45u":110,"4650u":110,"4726m":110,"473m":99,"4gb":105,"50bd":97,"50gi":97,"514u":110,"525n":110,"526u":110,"536u":110,"5460u":110,"5470u":110,"54u":110,"5690m":110,"573u":110,"578n":110,"5798m":110,"586u":110,"58s":99,"5969m":110,"5_cudnn5_avx_mkl":3,"5_cudnn5_avx_openbla":[3,87],"6080u":110,"6140u":110,"6305m":110,"639u":110,"64m":44,"655u":110,"6780u":110,"6810u":110,"682u":110,"6970u":110,"6ce9":97,"704u":110,"7090u":110,"72u":110,"73u":110,"75u":110,"760u":110,"767u":110,"783n":110,"784u":110,"78m":110,"7kb":99,"8250u":110,"8300u":110,"830n":110,"849m":110,"861u":110,"8661m":110,"892m":110,"901n":110,"90u":110,"918u":110,"9247m":110,"924n":110,"9261m":110,"9330m":110,"94u":110,"9530m":110,"983m":110,"988u":110,"997u":110,"99u":110,"9a235":119,"9f18":99,"\u4e00\u4e2a\u5178\u578b\u7684chunk\u5982\u4e0b\u6240\u793a":27,"\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u7684\u6a21\u578b\u7531\u5927\u91cf\u7684\u53c2\u6570\u7ec4\u6210":10,"\u4e00\u4e2achunk\u7531\u6240\u5728\u7684\u6587\u4ef6\u504f\u79fb":27,"\u4e00\u4e2aposix\u517c\u5bb9\u7684\u6587\u4ef6\u7cfb\u7edf":27,"\u4e00\u822c\u4e0d\u5141\u8bb8\u518d\u4ece":63,"\u4e00\u822c\u7531mkl":42,"\u4e0a\u4f20\u5230cloud\u6216\u8005\u4e0b\u8f7d\u5230\u672c\u5730\u7684\u65f6\u95f4\u53ef\u80fd\u6bd4\u8f83\u957f":27,"\u4e0a\u4f20\u65b9\u6cd5":63,"\u4e0a\u6ce8\u518c\u4e00\u4e0b":27,"\u4e0a\u8ff0paddlep
addl":63,"\u4e0b\u4e00\u4e2awheel\u5305\u9700\u8981\u66f4\u65b0\u7248\u672c\u53f7\u624d\u53ef\u4ee5\u4e0a\u4f20":63,"\u4e0b\u5b58\u653e\u516c\u5171\u6570\u636e\u96c6\u5408":11,"\u4e0b\u62c9\u6846\u4e2d\u627e\u5230\u751f\u6210\u76843\u4e2a\u4e8c\u8fdb\u5236\u6587\u4ef6":63,"\u4e0b\u8f7d":27,"\u4e0b\u8f7d\u5230\u672c\u5730":27,"\u4e0b\u8f7d\u5f97\u5230":63,"\u4e0b\u9762\u5206\u522b\u4ecb\u7ecd\u67d0\u4e00\u7c7b\u6587\u4ef6\u7684\u5b9e\u73b0\u65b9\u5f0f":46,"\u4e0d\u4e00\u81f4\u7684\u7531pfsclient\u4e0b\u8f7d\u6216\u8005\u4f20\u8f93chunk\u5b8c\u6210":27,"\u4e0d\u4f7f\u7528\u9759\u6001\u5e93":45,"\u4e0d\u4f7f\u7528c":45,"\u4e0d\u4f7f\u7528swig":45,"\u4e0d\u53ef\u4ee5\u66f4\u6539":63,"\u4e0d\u540c":42,"\u4e0d\u540c\u7248\u672c\u7684\u7f16\u8bd1\u5668\u4e4b\u95f4":45,"\u4e0d\u540c\u8bed\u8a00\u7684\u63a5\u53e3\u9002\u5e94\u4e0d\u540c\u8bed\u8a00\u7684\u7279\u6027":45,"\u4e0d\u5728":46,"\u4e0d\u5bb9\u6613\u51fa\u9519":27,"\u4e0d\u5d4c\u5165\u5176\u4ed6\u8bed\u8a00\u89e3\u91ca\u5668":45,"\u4e0d\u5d4c\u5165python\u89e3\u91ca\u5668":45,"\u4e0d\u663e\u793a\u7684\u5199\u6bcf\u4e2a\u7c7b\u5177\u4f53\u5305\u542b\u4ec0\u4e48":45,"\u4e0d\u7528mount\u7684\u65b9\u5f0f\u6765\u8bbf\u95ee\u6570\u636e":11,"\u4e0e":42,"\u4e0e\u4e4b\u76f8\u5bf9\u7684\u662flocal":27,"\u4e0e\u5176\u4ed6\u7b2c\u4e09\u65b9\u5e93\u4e00\u6837":42,"\u4e0e\u529f\u80fd\u5206\u652f\u4e0d\u540c\u7684\u662f":63,"\u4e0e\u53ef\u80fd\u6709\u7684":63,"\u4e0ebatch":41,"\u4e14\u589e\u52a0\u4e00\u4e2a\u7b2c\u4e09\u65b9\u8bed\u8a00":45,"\u4e14\u8c03\u7528\u65f6\u4e0d\u80fd\u629b\u51fa\u5f02\u5e38\u6216\u51fa\u73b0\u8fd0\u884c\u65f6\u9519\u8bef":46,"\u4e14c99\u652f\u6301bool\u7c7b\u578b\u548c\u5b9a\u957f\u6574\u6570":45,"\u4e14c99\u76f8\u5bf9\u4e8ec11\u4f7f\u7528\u66f4\u52a0\u5e7f\u6cdb":45,"\u4e25\u683c\u7684\u547d\u540d\u89c4\u8303pep":63,"\u4e2a\u6027\u5316\u63a8\u8350":63,"\u4e2d":[41,42,45,46],"\u4e2d\u4f1a\u63d0\u4f9b\u4e00\u4e9b\u5fc5\u8981\u7684\u63a5\u53e3\u548c\u51fd\u6570":42,"\u4e2d\u5199\u5165json\u5185\u5bb9":10,
"\u4e2d\u5b8c\u5168\u4e00\u81f4":45,"\u4e2d\u5b9e\u73b0\u4e86\u4e00\u4e2amerge\u7684\u65b9\u6cd5":42,"\u4e2d\u5b9e\u73b0\u7684\u7ed3\u6784\u4f53":46,"\u4e2d\u5bf9\u5e94\u7684layer\u5904":41,"\u4e2d\u5f15\u5165\u7684":41,"\u4e2d\u63d0\u4f9b\u4e00\u4e2a\u4e0emkl\u6709\u5173\u7684\u603b\u5f00\u5173":42,"\u4e2d\u6839\u636e":41,"\u4e2d\u6dfb\u52a0":41,"\u4e2d\u6dfb\u52a0\u4e00\u4e2a":42,"\u4e2d\u7684\u7248\u672c\u4fe1\u606f":63,"\u4e2d\u8fd0\u884c\u4efb\u52a1\u7684\u89d2\u5ea6":11,"\u4e3a":[41,42],"\u4e3a\u4e86\u5c3d\u53ef\u80fd\u5c11\u7684\u5728\u7236\u7c7blayer\u4e2d\u6dfb\u52a0\u53d8\u91cf\u6216\u8005\u51fd\u6570":42,"\u4e3a\u4e86\u5e94\u5bf9\u4ee5\u4e0a\u7684\u95ee\u9898":27,"\u4e3a\u4e86\u66b4\u9732\u7684\u63a5\u53e3\u5c3d\u91cf\u7b80\u5355":46,"\u4e3a\u4e86\u66f4\u597d\u7684\u7b26\u5408paddlepaddle\u7684\u4ee3\u7801\u98ce\u683c":42,"\u4e3a\u4e86\u6700\u5927\u7a0b\u5ea6\u51cf\u5c11\u591a\u6b21\u8c03\u7528":41,"\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347paddlepaddle\u5728\u57fa\u672c\u6570\u5b66\u8fd0\u7b97\u7684\u8ba1\u7b97\u901f\u5ea6":42,"\u4e3b\u8981\u529f\u80fd\u5305\u62ec":27,"\u4e3b\u8981\u5305\u62ec":42,"\u4e3b\u8981\u5305\u62ec\u4e86\u6df1\u5ea6\u5b66\u4e60\u76f8\u5173\u7684\u6570\u5b66\u539f\u8bed\u4e0e\u64cd\u4f5c":42,"\u4e3b\u8981\u9488\u5bf9paddlepaddle\u5728\u91cd\u6784\u4e4b\u524d\u7684\u4ee3\u7801\u6846\u67b6\u4ee5\u53cav1\u7684api":42,"\u4e4b\u5916\u7684\u6240\u6709\u5934\u6587\u4ef6":46,"\u4e5f\u4e0d\u4f7f\u7528\u5176\u4ed6\u52a8\u6001\u5e93":45,"\u4e5f\u4e0d\u5e94\u8be5\u62a5\u9519":46,"\u4e5f\u4e0d\u751f\u6210":46,"\u4e5f\u53ef\u4ee5\u4f7f\u7528\u8fd9\u4e9b\u955c\u50cf":63,"\u4e5f\u5c31\u662f\u8bf4\u8f93\u51fa\u7684\u7ed3\u679c\u4e0d\u4f1a\u5728\u539f\u6765\u7684\u6570\u636e\u4e0a\u7d2f\u52a0":42,"\u4e66\u5199":45,"\u4eba\u8138\u8bc6\u522b":11,"\u4ec5\u4ec5\u4f7f\u7528":45,"\u4ec5\u4f1a\u5728\u652f\u6301avx2\u6307\u4ee4\u96c6\u53ca\u4ee5\u4e0a\u7684\u673a\u5668\u624d\u4f7f\u7528mkl":42,"\u4ece":63,"\u4ece\u78c1\u76d8\u6587\u4ef6\u4e2d\u52a0\u8f7duu
id\u6587\u4ef6\u540d\u7684\u68c0\u67e5\u70b9\u5feb\u7167\u6587\u4ef6":10,"\u4ece\u800c\u907f\u514d\u4e86packing\u5197\u4f59":41,"\u4eceetcd\u4e2d\u8bfb\u53d6\u8282\u70b9":10,"\u4ed6\u4e3b\u8981\u5305\u542b\u4e86\u5b9e\u9645\u66b4\u9732\u7684\u7c7b\u578b\u7ed3\u6784":46,"\u4ed6\u662f\u5c06":46,"\u4ed6\u7684\u76ee\u6807\u662f\u4f7f\u7528c":45,"\u4ee3\u7801\u751f\u6210\u7684\u7b26\u53f7\u53ef\u80fd\u4e0d\u4e00\u81f4":45,"\u4ee3\u8868\u8fd9\u4e2alayer\u662f\u7528\u4e8e\u8dd1\u5728mkl":42,"\u4ee3\u8868\u8fd9\u4e2ashard\u7684\u6700\u5927index":11,"\u4ee3\u8868shard\u7684index":11,"\u4ee5\u4e0a\u4ee3\u7801\u7684reader\u8f93\u51fa\u7684data":11,"\u4ee5\u4e0a\u547d\u4ee4\u4f1a\u5728\u5f53\u524d\u76ee\u5f55\u4e0b\u751f\u6210100\u4e2a\u6587\u4ef6":11,"\u4ee5\u4e0b":11,"\u4ee5\u4e0b\u7b80\u79f0rnn":41,"\u4ee5\u4fbf\u6211\u4eec\u53ef\u4ee5\u628a\u66f4\u591a\u7684\u7cbe\u529b\u653e\u5230\u903b\u8f91\u672c\u8eab\u4e0a":27,"\u4ee5\u53ca":41,"\u4ee5\u53canumpi":11,"\u4ee5\u6b64\u8fbe\u5230\u6700\u597d\u7684\u6027\u80fd":42,"\u4ee5\u793a\u533a\u5206":[41,42],"\u4efb\u610f\u65f6\u523b\u53ea\u53ef\u80fd\u540c\u65f6\u6709\u4e00\u53f0\u670d\u52a1\u5668\u6545\u969c":10,"\u4f18\u5316\u524d":41,"\u4f18\u5316\u540e":41,"\u4f1a\u4ee5":[41,42],"\u4f1a\u4f7f\u7528\u76f8\u540c\u7684\u539f\u6570\u636e":41,"\u4f1a\u5148\u4e34\u65f6\u4fdd\u5b58\u5728":42,"\u4f1a\u5728":42,"\u4f1a\u5728\u7f16\u8bd1paddlepaddle\u7684\u65f6\u5019\u4e0b\u8f7d\u5e76\u7f16\u8bd1mkl":42,"\u4f1a\u5bfc\u81f4\u4e0d\u540c\u7248\u672cpython\u5728\u4e00\u4e2a\u8fdb\u7a0b\u91cc\u7684bug":45,"\u4f1a\u5f15\u5165":42,"\u4f1a\u628acpu\u7684buffer\u5bf9\u9f50\u4e3a4096":42,"\u4f1a\u6dfb\u52a0\u76f8\u5e94\u7684\u811a\u672c\u5728":42,"\u4f1a\u6dfb\u52a0\u76f8\u5e94\u7684\u811a\u672c\u7528\u4e8e\u6d4b\u8bd5\u548c\u5bf9\u6bd4\u5728\u4f7f\u7528mkl":41,"\u4f1a\u76f4\u63a5\u62a5\u9519\u9000\u51fa":45,"\u4f1a\u81ea\u52a8\u4f7f\u7528mklml\u5e93\u4f5c\u4e3apaddlepaddle\u7684cblas\u548clapack\u5e93":42,"\u4f1a\u81ea\u52a8\u6839\u636e\u786c\u4ef
6\u914d\u7f6e":42,"\u4f1a\u88abpickle\u5e8f\u5217\u5316\u6210\u5b57\u7b26\u4e32":11,"\u4f20\u5165":11,"\u4f46":46,"\u4f46\u4e0d\u66b4\u9732":46,"\u4f46\u5e76\u6ca1\u6709\u7ecf\u8fc7\u56de\u5f52\u6d4b\u8bd5":63,"\u4f46\u6240\u6709fork\u7684\u7248\u672c\u5e93\u7684\u6240\u6709\u5206\u652f\u90fd\u76f8\u5f53\u4e8e\u7279\u6027\u5206\u652f":63,"\u4f46\u662f\u53c8\u8fc7\u4e8e\u7410\u788e":46,"\u4f46\u662f\u5728mkl":42,"\u4f46\u662f\u5728paddlepaddle\u4e2d":42,"\u4f46\u662f\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4e0d\u9700\u8981\u4efb\u4f55\u8f6c\u6362":42,"\u4f46\u662f\u6ce8\u610f\u7684\u662f":42,"\u4f46\u662f\u89e3\u91ca\u6027\u8bed\u8a00":45,"\u4f5c\u4e3a\u53e6\u4e00\u4e2a\u7b2c\u4e09\u65b9\u5e93\u96c6\u6210\u8fdbpaddlepaddl":42,"\u4f5c\u4e3a\u5b58\u50a8\u7cfb\u7edf":11,"\u4f5c\u4e3a\u7c7b\u53e5\u67c4":45,"\u4f7f\u7528":[42,46,63],"\u4f7f\u7528\u4e0b\u9762\u547d\u4ee4":11,"\u4f7f\u7528\u52a8\u6001\u5e93":45,"\u4f7f\u7528\u540c\u6837\u7684\u8bad\u7ec3\u6570\u636eblock":10,"\u4f7f\u7528\u667a\u80fd\u6307\u9488\u7684\u539f\u56e0\u662f":46,"\u4f7f\u7528\u7684\u53c2\u6570\u4e0epaddlepaddle\u7533\u8bf7\u7684buffer\u5171\u7528\u4e00\u5757\u5185\u5b58":42,"\u4f7f\u7528\u76f8\u5bf9\u8def\u5f84\u7684\u5f15\u7528\u65b9\u5f0f":46,"\u4f7f\u7528\u8fd9\u4e2a\u795e\u7ecf\u7f51\u7edc\u53ef\u4ee5\u5b8c\u6210\u5bf9\u65b0\u6570\u636e\u7684\u9884\u6d4b":10,"\u4f7f\u7528\u9759\u6001\u5e93\u548c\u52a8\u6001\u5e93\u96be\u5ea6\u5dee\u4e0d\u591a":45,"\u4f7f\u7528c":46,"\u4f7f\u7528c99\u505a\u63a5\u53e3":45,"\u4f7f\u7528c99\u800c\u4e0d\u4f7f\u7528c11\u7684\u539f\u56e0\u662f":45,"\u4f7f\u7528c99\u800c\u4e0d\u4f7f\u7528c89":45,"\u4f7f\u7528regress":63,"\u4f7f\u7528swig\u53ea\u652f\u6301cpython\u89e3\u91ca\u5668":45,"\u4f7f\u7528swig\u9700\u8981\u591a\u8bed\u8a00\u7ed1\u5b9a\u7684\u5f00\u53d1\u4eba\u5458\u719f\u7ec3\u638c\u63e1swig\u914d\u7f6e":45,"\u4f7f\u7528void":45,"\u4f8b\u5982":[11,45,46,63],"\u4f8b\u5982\u5728deepspeech2":41,"\u4f8b\u5982\u5bf9\u4e8ejava\u6216\u8005python":45,"\u4f8b\u5982
\u5bf9\u4e8ejava\u6765\u8bf4":45,"\u4f8b\u5982\u5bf9\u4e8epython":45,"\u4f8b\u5982c":45,"\u4f8b\u5982java\u4e0epython\u7684\u9519\u8bef\u5904\u7406\u662f\u76f4\u63a5\u6254\u51fa\u6765except":45,"\u4f8b\u5982python\u53ef\u4ee5\u4f7f\u7528":45,"\u4f8b\u5982python\u7684":45,"\u4f8b\u5982rnn":41,"\u4f9d\u6b21\u7c7b\u63a8":63,"\u4fbf\u662f\u5c06\u9759\u6001\u5e93\u52a0\u5165jvm\u4e2d":45,"\u4fee\u590d\u6240\u6709bug\u540e":63,"\u4fee\u590ddocker\u7f16\u8bd1\u955c\u50cf\u95ee\u9898":63,"\u4fee\u6539":[42,63],"\u4fee\u6539\u6210":63,"\u505a\u53ea\u8bfb\u6302\u8f7d":11,"\u505a\u5982\u4e0b\u51e0\u4e2a\u64cd\u4f5c":63,"\u505a\u63a5\u53e3":45,"\u505c\u6b62\u4fdd\u5b58\u68c0\u67e5\u70b9\u7684\u7ebf\u7a0b":10,"\u5145\u5206\u53d1\u6325\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u4f18\u52bf":41,"\u5145\u5206\u5c55\u73b0\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u4f18\u52bf":42,"\u5148\u5b8c\u6210\u5bf9\u6743\u91cd\u7684packing\u64cd\u4f5c":41,"\u5148\u5b9e\u73b0\u6a21\u578b\u63a8\u65ad\u7684api":46,"\u5171\u4eab\u5185\u5b58":42,"\u5171\u4eab\u540c\u4e00\u4e2a\u6743\u91cd":41,"\u5176\u4e2d":[45,63],"\u5176\u4ed6\u51fd\u6570\u5747\u8fd4\u56de":46,"\u5176\u4ed6\u7528\u6237\u7684fork\u7248\u672c\u5e93\u5e76\u4e0d\u9700\u8981\u4e25\u683c\u9075\u5b88":63,"\u5176\u8f6c\u6362\u6b21\u6570\u51cf\u5c11\u81f3":41,"\u5177\u4f53\u4f7f\u7528\u65b9\u6cd5\u4e3a":46,"\u5177\u4f53\u539f\u56e0\u53c2\u8003":46,"\u5177\u4f53\u53ef\u4ee5\u53c2\u8003mkl":42,"\u5177\u4f53\u5b9e\u73b0\u65b9\u5f0f\u6bd4\u5982":[41,42],"\u5177\u4f53\u7684\u5b8c\u6210\u72b6\u6001\u53ef\u4ee5\u53c2\u89c1":42,"\u5177\u4f53\u8bf7\u53c2\u8003":46,"\u5185\u90e8\u5b58\u50a8":42,"\u5185\u90e8\u9a71\u52a8python\u89e3\u91ca\u5668\u8fdb\u884c\u6a21\u578b\u914d\u7f6e\u89e3\u6790\u548c\u6570\u636e\u8bfb\u53d6":45,"\u518d\u5728\u6bcf\u4e00\u4e2aapi\u4e2d\u81ea\u5df1\u68c0\u67e5\u7c7b\u578b":45,"\u518d\u57fa\u4e8e":63,"\u518d\u628a\u5df2\u8f6c\u6362\u4e3apacked\u683c\u5f0f\u7684\u6570\u636e\u4f20\u9012\u7ed9\u90a3\u4e9b\u590d\u7528\u540c\u4e00\u6570\u6
36e\u7684gemm":41,"\u5199\u4ee3\u7801":45,"\u5199\u5165\u5feb\u7167\u6570\u636e":10,"\u51fd\u6570":[41,42],"\u51fd\u6570\u5373\u53ef\u5b8c\u6210\u8f6c\u6362":11,"\u51fd\u6570\u540d\u4e3a":46,"\u51fd\u6570\u547d\u540d":45,"\u5206\u522b\u4ee3\u8868\u8f93\u5165\u6570\u636e":42,"\u5206\u522b\u5bf9\u5e94capi":63,"\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1":10,"\u5206\u652f":63,"\u5206\u652f\u4e00\u65e6\u5efa\u7acb":63,"\u5206\u652f\u4e2d":63,"\u5206\u652f\u4e3a\u5f00\u53d1":63,"\u5206\u652f\u4e3a\u6bcf\u4e00\u6b21release\u65f6\u5efa\u7acb\u7684\u4e34\u65f6\u5206\u652f":63,"\u5206\u652f\u4e3a\u7a33\u5b9a":63,"\u5206\u652f\u529f\u80fd\u7684\u5c01\u95ed":63,"\u5206\u652f\u5408\u5165":63,"\u5206\u652f\u5408\u5165master\u5206\u652f":63,"\u5206\u652f\u540c\u6b65\u4e3b\u7248\u672c\u5e93\u7684":63,"\u5206\u652f\u540d\u4e3a":63,"\u5206\u652f\u5b58\u5728\u7684\u65f6\u5019":63,"\u5206\u652f\u6d3e\u751f\u51fa\u65b0\u7684\u5206\u652f":63,"\u5206\u652f\u7684\u7248\u672c\u90fd\u662f\u7ecf\u8fc7\u5355\u5143\u6d4b\u8bd5\u548c\u56de\u5f52\u6d4b\u8bd5\u7684\u7248\u672c":63,"\u5206\u652f\u7684\u7248\u672c\u90fd\u7ecf\u8fc7\u5355\u5143\u6d4b\u8bd5":63,"\u5206\u7247":10,"\u5219\u4f7f\u7528\u542f\u52a8\u53c2\u6570\u5b9a\u4e49\u7684\u521d\u59cb\u5316\u65b9\u6cd5\u521d\u59cb\u5316\u53c2\u6570":10,"\u5219\u5ffd\u7565":10,"\u5219\u628a\u53e6\u4e00\u4e2a\u6162\u901f\u7684kill\u6389":10,"\u5219\u76f4\u63a5\u5f15\u5165\u53e6\u4e00\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5219\u9700\u8981\u56de\u6eda\u5230\u4e0a\u4e00\u4e2a\u68c0\u67e5\u70b9":10,"\u521b\u5efa":42,"\u5220\u9664\u78c1\u76d8\u76ee\u5f55\u4e2d\u4e0d\u662f\u5f53\u524duuid\u7684\u5feb\u7167\u6587\u4ef6":10,"\u5230":10,"\u5230\u7b2c\u4e8c\u6b65":63,"\u524d\u540e\u7684\u7f51\u7edc\u6027\u80fd":41,"\u529f\u80fd":27,"\u529f\u80fd\u7684\u6b63\u786e\u6027\u5305\u62ec\u9a8c\u8bc1paddlepaddle\u76ee\u524d\u7684":63,"\u52a8\u6001\u5e93":45,"\u5305\u542b\u4e86\u67d0\u79cd\u7c7b\u578b\u7684\u7c7b\u578b\u5b9a\u4e49\u548c\u66b4\u9732\u7684\u5168\
u90e8\u51fd\u6570":46,"\u5305\u62ec":[11,41,42],"\u5305\u62ec\u6743\u91cdw\u548c\u504f\u7f6eb":10,"\u5305\u62ecmkl":42,"\u534f\u540c\u5b8c\u6210releas":63,"\u5355\u4e2a\u503c":11,"\u5355\u70b9\u6545\u969c":10,"\u5373":46,"\u5373\u4f7f\u7528":46,"\u5373\u4f7f\u7528\u6237\u76f4\u63a5\u5f15\u7528\u67d0\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5373\u4f7fc":46,"\u5373\u4f8b\u5982":46,"\u5373\u4fbfpaddl":46,"\u5373\u5b8c\u6210\u67d0\u4e00\u4e2a\u4efb\u52a1\u7684\u6700\u5c11\u51fd\u6570":46,"\u5373\u66b4\u9732":46,"\u5373\u8868\u793a\u4e0d\u9700\u8981\u8f6c\u6362":42,"\u5373\u8fd9\u4e2a\u52a8\u6001\u5e93\u662f\u4e0d\u4f9d\u8d56\u4e8e\u5176\u4ed6\u4efb\u4f55\u6587\u4ef6\u7684":45,"\u539f\u6765\u7684\u65b9\u6848":42,"\u53c2\u6570":45,"\u53c2\u8003":[27,45],"\u53c2\u8003\u4e0b\u56fe":63,"\u53c8\u53ef\u4ee5\u907f\u514d\u4e0d\u5fc5\u8981\u7684\u8f6c\u6362":42,"\u53cc\u5411\u9a8c\u8bc1":27,"\u53d1\u578b\u7248":63,"\u53d1\u5e03\u5230dockerhub":63,"\u53d1\u5e03docker\u955c\u50cf\u53ea\u9700\u8981\u5bf9\u81ea\u52a8push\u7684\u955c\u50cf\u6253\u4e0a":63,"\u53d8\u91cf\u6765\u533a\u5206layer\u7684\u5c5e\u6027":42,"\u53ea\u5bf9\u7279\u6b8a\u5728\u7ebf\u7cfb\u7edf\u8003\u8651\u4e24\u53f0\u4ee5\u4e0a\u540c\u65f6\u6545\u969c\u7684\u5bb9\u707e":10,"\u53ea\u66b4\u9732\u6982\u5ff5\u7684\u63a5\u53e3":46,"\u53ea\u80fd\u8c03\u7528paddle\u7684\u52a8\u6001\u5e93":45,"\u53ea\u9700\u8981\u6062\u590d\u8fd9\u53f0\u8282\u70b9":10,"\u53ef\u4ee5":63,"\u53ef\u4ee5\u51cf\u5c0f\u7cfb\u7edf\u590d\u6742\u6027":10,"\u53ef\u4ee5\u5728\u4efb\u4f55\u673a\u5668\u4e0a\u6267\u884c\u7684":45,"\u53ef\u4ee5\u5728\u6b64\u9875\u9762\u7684":63,"\u53ef\u4ee5\u628a\u672c\u5730\u7684\u6570\u636e\u4e0a\u4f20\u5230\u5b58\u50a8\u96c6\u7fa4\u4e2d":11,"\u53ef\u4ee5\u6709\u6548\u7684\u907f\u514dparamet":10,"\u53ef\u4ee5\u7528":27,"\u53ef\u4ee5\u7528\u4ee5\u4e0b\u6307\u4ee4":11,"\u53ef\u4ee5\u7ee7\u7eed\u5728\u81ea\u5df1\u7684\u529f\u80fd\u5206\u652f\u63d0\u4ea4\u4ee3\u7801":63,"\u53ef\u4ee5\u901a\u8fc7\u9636\u6bb5\u6027\u
7684\u4fdd\u5b58\u6bcf\u4e2aparamet":10,"\u53ef\u80fd\u4f1a\u9020\u6210\u7f51\u7edc\u62e5\u585e":10,"\u53f3\u4fa7\u7684":63,"\u5404\u6b21\u524d\u5411\u4e4b\u95f4\u4e5f\u90fd\u4f7f\u7528\u4e86\u76f8\u540c\u7684\u6743\u91cd":41,"\u540c\u4e00\u6b21\u524d\u5411":41,"\u540c\u65f6":[41,42],"\u540c\u65f6\u4f1a\u5f00\u542fintel":42,"\u540c\u65f6\u518d\u5c06":63,"\u540c\u65f6\u53c8\u5c3d\u53ef\u80fd\u5c11\u7684\u727a\u7272mkl":42,"\u540c\u65f6\u63d0\u8d77":63,"\u540c\u65f6\u6570\u636e\u683c\u5f0f\u5c31\u662f":42,"\u540d\u5b57\u4fee\u9970":45,"\u540e\u5411":41,"\u540e\u5411\u65f6\u590d\u7528\u5df2\u7ecf\u8f6c\u6362\u8fc7\u7684\u6743\u91cd":41,"\u5411\u6307\u5b9a\u7684\u76ee\u5f55\u4e2d\u4e00\u4e2a\u65b0\u7684\u6587\u4ef6":10,"\u5411paddlepaddle\u7684\u4e3b\u7248\u672c\u5e93\u63d0\u4ea4":63,"\u5426\u5219\u5f97\u628apaddle\u9759\u6001\u5e93\u94fe\u63a5\u5230\u89e3\u91ca\u5668\u91cc":45,"\u542f\u52a8\u4e00\u4e2a\u65b0\u7684\u7ebf\u7a0b\u5f00\u59cb\u4fdd\u5b58\u68c0\u67e5\u70b9":10,"\u548c":[11,41,42,45,46,63],"\u548c\u672a\u6765\u53ef\u80fd\u8fd8\u4f1a\u7528\u5230":42,"\u548c\u79bb\u7ebf\u6570\u636e\u7684\u65b9\u5f0f":11,"\u54ea\u4e2atrainer\u5148\u5b8c\u6210block\u7684\u8bad\u7ec3":10,"\u56e0\u4e3a\u8fd9\u6837\u505a\u4e5f\u6ca1\u6cd5\u4fdd\u8bc1\u6d88\u9664\u968f\u673a\u6027":10,"\u56e0\u4e3aswig\u5728\u7b2c\u4e09\u65b9\u8bed\u8a00\u4e2d\u66b4\u9732\u7684\u51fd\u6570\u540d":45,"\u56e0\u6b64":41,"\u56fe\u50cf\u5206\u7c7b":63,"\u5728":[41,42,46,63],"\u5728\u4e00\u4e2a\u4e0d\u53ef\u4e2d\u65ad\u5e76\u7f3a\u5c11\u5907\u4efd\u7684\u8bad\u7ec3\u4efb\u52a1\u4e2d":10,"\u5728\u4e0a\u56fe\u4e2d\u663e\u793a\u4e86\u5728\u4e00\u4e2a\u5b9e\u9645\u751f\u4ea7\u73af\u5883\u4e2d\u7684\u5e94\u7528":11,"\u5728\u4f7f\u7528twine\u4e0a\u4f20\u4e4b\u524d":63,"\u5728\u51fa\u73b0\u5355\u70b9\u6545\u969c\u65f6":10,"\u5728\u5b9e\u73b0\u6bcf\u4e2a\u5b50\u7c7b\u7684\u65f6\u5019\u5c31\u4e0d\u9700\u8981\u5173\u5fc3\u5206\u652f\u7684\u4e8b\u60c5\u4e86":42,"\u5728\u5b9e\u73b0\u8fc7\u7a0b\u4e2d":46,"\u5728\u5bf9\
u5e94\u7684":41,"\u5728\u5c42\u521d\u59cb\u5316\u7684\u65f6\u5019":41,"\u5728\u5f00\u59cb\u8bad\u7ec3\u4e4b\u524d":11,"\u5728\u5f02\u6784\u96c6\u7fa4\u4e2d":10,"\u5728\u5f15\u5165\u5176\u4ed6\u7c7b\u578b\u7684\u5934\u6587\u4ef6\u65f6":46,"\u5728\u5feb\u7167\u5199\u5165\u5b8c\u6210\u540e":10,"\u5728\u60a8\u7684\u5b9e\u9645\u73af\u5883\u4e2d":10,"\u5728\u6709\u666e\u901a\u7684cpu":42,"\u5728\u672c\u6587\u6863\u4e2d":27,"\u5728\u673a\u7fa4\u4e0a\u8fd0\u884c\u8f6c\u6362\u7a0b\u5e8f":11,"\u5728\u6837\u4f8b\u4e2d":46,"\u5728\u7528\u6237\u4f7f\u7528c":46,"\u5728\u7b2c\u4e8c\u4e2atab":63,"\u5728\u7ebf\u6a21\u578b\u9884\u6d4b\u670d\u52a1":11,"\u5728\u8bad\u7ec3\u7ed3\u675f\u7684\u65f6\u5019\u518d\u4fdd\u5b58\u4e3apaddlepaddle\u7684\u683c\u5f0f":42,"\u5728\u8bc4\u5ba1\u8fc7\u7a0b\u4e2d":63,"\u5728\u8fd9\u4e2a":63,"\u5728\u8fd9\u4e2a\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165\u4efb\u4f55\u5176\u4ed6\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u5728\u8fd9\u4e2a\u9636\u6bb5\u7684\u4ee3\u7801\u6b63\u5728\u7ecf\u5386\u56de\u5f52\u6d4b\u8bd5":63,"\u5728\u8fd9\u4e9b\u5934\u6587\u4ef6\u4e2d":46,"\u5728\u8fd9\u4e9b\u6587\u4ef6\u4e2d":46,"\u5728\u91cd\u6784\u524d\u7684paddlepaddle\u4e2d":42,"\u5728\u95ee\u9898\u672c\u8eab\u7684\u8ba1\u7b97\u91cf\u6bd4\u8f83\u5c0f\u7684\u65f6\u5019":41,"\u5728batch":41,"\u5728c":45,"\u5728c\u7684\u5934\u6587\u4ef6":45,"\u5728packing\u4e0a\u7684\u8017\u65f6":41,"\u5728paddle\u4e4b\u4e0a\u8fd0\u884c\u7684\u6df1\u5ea6\u5b66\u4e60\u8bad\u7ec3\u8f93\u51fa\u7684\u6a21\u578b\u4f1a\u63d0\u4f9b\u7ed9\u5728\u7ebf\u4eba\u8138\u8bc6\u522b\u7684\u5e94\u7528\u4f7f\u7528":11,"\u5728paramet":10,"\u5728rnn\u7684\u60c5\u51b5\u4e0b":41,"\u5747\u4f1a\u88ab\u5b89\u88c5\u5230includ":46,"\u5747\u662f\u5728":46,"\u57fa\u4e8e\u7c98\u6027\u4f1a\u8bdd\u7684\u8d1f\u8f7d\u5747\u8861\u529f\u80fd":27,"\u5916\u90e8\u5b58\u50a8":42,"\u591a\u4e2a\u503c":11,"\u591a\u4e2aparamet":10,"\u591a\u6b21\u8c03\u7528":41,"\u5927\u591a\u6570\u8bed\u8a00\u90fd\u652f\u6301\u4f7f\u7528c\u8bed\u8a00api":4
5,"\u5982\u56fe\u4e2dtrainer":10,"\u5982\u679c\u4e0a\u9762\u4e24\u6b65\u51fa\u73b0\u9519\u8bef":10,"\u5982\u679c\u4e0d\u9700\u8981\u5916\u90e8\u5b58\u50a8\u7528\u4e8e\u8f6c\u6362":42,"\u5982\u679c\u4f7f\u7528swig\u6211\u4eec\u9700\u8981\u5c06\u5728interface\u6587\u4ef6\u91cc":45,"\u5982\u679c\u5728\u4f7f\u7528mkl":42,"\u5982\u679c\u5931\u8d25":63,"\u5982\u679c\u5b58\u5728\u6570\u636e\u6392\u5217\u683c\u5f0f\u4e0d\u4e00\u6837\u7684\u60c5\u51b5\u65f6":42,"\u5982\u679c\u5b58\u5728\u67d0\u4e9btrainer\u6267\u884c\u901f\u5ea6\u8fc7\u6162\u4f1a\u5f71\u54cd\u6574\u4f53\u96c6\u7fa4\u7684\u901f\u5ea6":10,"\u5982\u679c\u5df2\u7ecf\u6b63\u5728\u6267\u884c\u4fdd\u5b58\u68c0\u67e5\u70b9\u7684\u7ebf\u7a0b":10,"\u5982\u679c\u662f\u5176\u5b83\u7c7b\u578b":11,"\u5982\u679c\u6709bugfix\u7684\u884c\u4e3a":63,"\u5982\u679c\u67d0\u4e00\u4e2a\u7c7b\u578b\u9700\u8981\u5f15\u7528\u53e6\u4e00\u4e2a\u7c7b\u578b":46,"\u5982\u679c\u67d0\u4e00\u4e2apaddl":46,"\u5982\u679c\u67d0\u4e00\u4e2apaddle\u6982\u5ff5\u5fc5\u987b\u8981\u66b4\u9732":46,"\u5982\u679c\u6ee1\u8db3\u6761\u4ef6":10,"\u5982\u679c\u7528\u6237\u8981\u628apaddle\u7684\u9759\u6001\u5e93":45,"\u5982\u679c\u8981\u4e0a\u4f20gpu\u7248\u672c\u7684\u5305":63,"\u5982\u679c\u8c03\u7528\u9759\u6001\u5e93\u53ea\u80fd\u5c06\u9759\u6001\u5e93\u4e0e\u89e3\u91ca\u5668\u94fe\u63a5":45,"\u5982\u679c\u9700\u8981\u624b\u52a8\u7f16\u8bd1":63,"\u5982\u679cmkl":42,"\u5982\u679cparamet":10,"\u5b50\u7c7b\u53ea\u9700\u8981\u4f7f\u7528\u5b9a\u4e49\u597d\u7684\u63a5\u53e3":42,"\u5b57\u6bb5\u8bbe\u4e3a":63,"\u5b57\u7b26\u4e32":11,"\u5b58\u50a8":11,"\u5b66\u4e60\u6210\u672c\u9ad8":45,"\u5b83\u4eec\u4e3b\u8981\u662f\u7528\u4e8e":42,"\u5b83\u4eec\u7684\u6587\u4ef6\u540d\u662f":11,"\u5b83\u53ea\u4f1a\u5305\u62ec\u751f\u6210\u597d\u7684\u52a8\u6001\u5e93\u548c\u5934\u6587\u4ef6":42,"\u5b83\u8d1f\u8d23\u51b3\u5b9a\u7f16\u8bd1\u65f6\u662f\u5426\u4f7f\u7528mklml\u548cmkl":42,"\u5b89\u88c5\u540e\u7684\u76ee\u5f55\u7ed3\u6784\u4e3a":46,"\u5b8c\u6210\u4e00\u4e2a\u4f20\u8
f93\u52a8\u4f5c\u5b8c\u6210\u7684\u65f6\u95f4\u4e5f\u6bd4\u8f83\u77ed":27,"\u5b8c\u6210\u5e38\u7528layer\u7684mkl":42,"\u5b8c\u6210\u5e38\u89c1\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edcvgg":42,"\u5b8c\u6210\u6570\u636e\u7684\u9884\u5904\u7406":11,"\u5b8c\u6210\u81ea\u52a8\u5316\u4e8c\u8fdb\u5236\u7f16\u8bd1":63,"\u5b9a\u4e49":42,"\u5b9a\u4e49\u4e00\u4e9b\u9664\u4e86layer\u548cmemory\u76f8\u5173\u7684\u7c7b\u548c\u51fd\u6570":42,"\u5b9e\u73b0\u5177\u4f53\u7684\u51fd\u6570\u529f\u80fd\u5373\u53ef":42,"\u5b9e\u73b0\u7b80\u5355":45,"\u5bf9\u4e8e\u4e0d\u540c\u8bed\u8a00":45,"\u5bf9\u4e8e\u540c\u4e00\u6bb5c":45,"\u5bf9\u4e8e\u540c\u6837\u8bbe\u7f6e\u7684\u7f51\u7edc\u6a21\u578b":41,"\u5bf9\u4e8e\u591a\u8bed\u8a00\u63a5\u53e3":45,"\u5bf9\u4e8e\u5927\u591a\u6570\u8bed\u8a00":45,"\u5bf9\u4e8e\u5e8f\u5217\u957f\u5ea6":41,"\u5bf9\u4e8e\u6709\u53c2\u6570\u7684\u5c42":42,"\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u65b0\u52a0\u7684rnn":41,"\u5bf9\u4e8e\u6bcf\u79cd\u7c7b\u578b":46,"\u5bf9\u4e8e\u6bcf\u79cdc":46,"\u5bf9\u65b0\u7684\u6743\u91cd\u8fdb\u884c\u8f6c\u6362\u7528\u4e8e\u4e0b\u6b21\u8fed\u4ee3":41,"\u5bf9\u6bd4":45,"\u5bf9\u6bd4\u4f18\u5316\u540elayer\u4e0e\u76f8\u5bf9\u5e94\u7684paddlepaddle\u539f\u6709lay":41,"\u5bf9\u6bd4\u4f18\u5316\u540elayer\u81ea\u8eab":41,"\u5bf9\u8f93\u5165\u53c2\u6570\u7684\u5b89\u5168\u6027\u8fdb\u884c\u4e86\u5fc5\u8981\u7684\u5224\u65ad":46,"\u5bf9\u8fd9\u4e2a\u7248\u672c\u7684\u63d0\u4ea4":63,"\u5bfb\u627e\u6709\u6ca1\u6709\u5176\u4ed6\u53ef\u4ee5\u4f18\u5316\u7684\u53ef\u80fd":42,"\u5bfc\u51fa\u8fd9\u4e9b\u63a5\u53e3":46,"\u5c06":63,"\u5c06\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u53c2\u6570\u62c6\u5206\u6210\u591a\u4efd":10,"\u5c06\u5927\u91cf\u7684":45,"\u5c06\u65b0\u5206\u652f\u7684\u7248\u672c\u6253\u4e0atag":63,"\u5c06master\u5206\u652f\u7684\u5408\u5165commit\u6253\u4e0atag":63,"\u5c0f\u4e8e\u67d0\u4e2a\u6bd4\u8f83\u5c0f\u7684\u9608\u503c\u8ba4\u4e3a\u901a\u8fc7":42,"\u5c31\u9700\u8981\u5bf9\u8fd9\u4e2a\u7b2c\u4e09\u65b9\u8bed\u8a00\u589e\u52a0\u4e00\u4e9b\
u5b9a\u4e49":45,"\u5de5\u5177\u4e0a\u4f20\u5373\u53ef":63,"\u5e73\u5747\u6545\u969c\u4fee\u590d\u65f6\u95f4":10,"\u5e73\u5747\u6545\u969c\u7387":10,"\u5e76\u4e14\u4f1a\u5199\u597d":42,"\u5e76\u4e14\u4f7f\u7528":46,"\u5e76\u4e14\u53ea\u9700\u8981\u5728\u5fc5\u8981\u7684\u65f6\u5019\u8f6c\u6362\u8fd9\u79cd\u683c\u5f0f":42,"\u5e76\u4e14\u5728\u5e38\u89c1\u7684\u5e73\u53f0\u4e0a":45,"\u5e76\u4e14\u5f53\u7f16\u8bd1\u65f6":41,"\u5e76\u4e14\u628a\u7cfb\u7edf\u751f\u6210\u7684ca":27,"\u5e76\u4e14\u628a\u7ed3\u679c\u8fd4\u56depfsclient\u7aef":27,"\u5e76\u4e14\u8ba9\u63a5\u53e3\u8131\u79bb\u5b9e\u73b0\u7ec6\u8282":45,"\u5e76\u4e14\u8f93\u5165\u8f93\u51fa\u90fd\u662f\u5171\u7528\u4e00\u5757\u5185\u5b58":42,"\u5e76\u5220\u9664":63,"\u5e76\u5220\u9664\u66f4\u65e9\u7684\u5feb\u7167":10,"\u5e76\u52a0\u8f7d\u5176\u4e2d\u7684\u53c2\u6570":10,"\u5e76\u53d1\u5e03\u5230pypi":63,"\u5e76\u5728\u6bcf\u6b21\u6743\u91cd\u66f4\u65b0\u540e":41,"\u5e76\u5728\u96c6\u7fa4\u4e2d\u8fd0\u884c\u591a\u4e2a\u5206\u5e03\u5f0f\u6570\u636e\u5904\u7406\u4efb\u52a1":11,"\u5e76\u5c06":63,"\u5e76\u5c06c":46,"\u5e76\u628a\u5feb\u7167\u4fdd\u5b58\u5230\u8fd9\u4e2a\u76ee\u5f55\u4e0b":10,"\u5e76\u628a\u7ed3\u679c\u653e\u5230\u5f53\u524d\u5c42\u7684":42,"\u5e76\u6ca1\u6709paddle\u7279\u522b\u9700\u8981\u7684\u7279\u6027":45,"\u5e76\u6dfb\u52a0\u5934\u6587\u4ef6":41,"\u5e76\u88ab\u5b58\u50a8\u5728\u8bf8\u5982hadoop":11,"\u5e76\u9002\u5e94github\u7684\u7279\u6027\u505a\u4e86\u4e00\u4e9b\u533a\u522b":63,"\u5e76\u91cd\u65b0\u6253\u5305wheel\u5305":63,"\u5efa\u8bae":63,"\u5f00\u53d1\u4e86\u6a21\u578b\u9884\u6d4b\u7684\u6837\u4f8b\u4ee3\u7801":46,"\u5f00\u53d1\u8005\u4fee\u6539\u81ea\u5df1\u7684\u4ee3\u7801":63,"\u5f00\u53d1\u8005fork\u7684\u7248\u672c\u5e93\u4e2d":63,"\u5f00\u53d1\u8005fork\u7684\u7248\u672c\u5e93\u4f7f\u7528":63,"\u5f00\u5934":[41,42],"\u5f00\u59cb\u63d0\u4f9b\u670d\u52a1":10,"\u5f15\u5165\u4e86\u4ee5\u4e0b\u56db\u4e2aapi":41,"\u5f15\u5165\u4e86\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5f39\u51fa\u
4e0b\u9762\u7684\u9009\u62e9\u6846":63,"\u5f53\u529f\u80fd\u5206\u652f\u5f00\u53d1\u5b8c\u6bd5\u540e":63,"\u5f53\u53ea\u505a\u63a8\u65ad":41,"\u5f53\u5f00\u542f":42,"\u5f53\u6253\u5f00":42,"\u5f53\u6570\u636e\u683c\u5f0f\u4e0epaddlepaddle\u9ed8\u8ba4\u7684":42,"\u5f53\u7136\u8fd9\u4e24\u8005\u4e5f\u53ef\u4ee5\u76f8\u7b49":42,"\u5f53\u7528\u6237\u4f7f\u7528\u5b8c\u8fd9\u4e2a\u53c2\u6570\u540e":46,"\u5f53\u7f51\u7edc\u51fa\u73b0\u5206\u652f\u4e14\u5728":42,"\u5f53destination\u6587\u4ef6\u4e0d\u5b58\u5728\u6216\u8005\u5927\u5c0f\u548csource\u6587\u4ef6\u4e0d\u4e00\u81f4\u65f6":27,"\u5f88\u96be\u4fdd\u8bc1\u591a\u8bed\u8a00\u4ee3\u7801\u98ce\u683c\u7684\u4e00\u81f4\u6027":45,"\u5f97\u4f7f\u7528":45,"\u5fc5\u8981":46,"\u5fc5\u987b\u5206\u522b\u4e0e":42,"\u60c5\u611f\u5206\u6790":63,"\u6211\u4eec\u4e5f\u53ef\u4ee5\u786e\u5b9a\u6bcf\u4e00\u4e2a\u53c2\u6570\u7684\u7c7b\u578b":46,"\u6211\u4eec\u4e5f\u5c06mklml\u5373":42,"\u6211\u4eec\u4f1a\u4fdd\u8bc1":42,"\u6211\u4eec\u4f1a\u5728\u7f51\u7edc\u8bad\u7ec3\u4e4b\u524d\u628a\u683c\u5f0f\u8f6c\u6362\u4e3amkl":42,"\u6211\u4eec\u4f1a\u5bf9\u6bd4\u5982\u4e0b2\u4e2a\u65b9\u9762":41,"\u6211\u4eec\u4f1a\u628amkl":42,"\u6211\u4eec\u4f1a\u6dfb\u52a0":[41,42],"\u6211\u4eec\u4f7f\u7528\u52a8\u6001\u5e93\u6765\u5206\u53d1paddl":45,"\u6211\u4eec\u51b3\u5b9a\u4f7f\u7528\u5df2\u6709\u7684":42,"\u6211\u4eec\u53ef\u4ee5\u5148\u5b8c\u6210\u5bf9\u539f\u6570\u636e\u7684packing\u64cd\u4f5c":41,"\u6211\u4eec\u603b\u7ed3\u51fa\u4e00\u4e9b\u7279\u522b\u9700\u8981\u6ce8\u610f\u7684\u70b9":42,"\u6211\u4eec\u63d0\u4f9b\u4e24\u4e2a\u8f6c\u6362\u65b9\u5f0f":11,"\u6211\u4eec\u63d0\u51fa\u4e86chunk\u7684\u6982\u5ff5":27,"\u6211\u4eec\u6700\u7ec8\u7684\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165python\u6216\u8005\u5176\u4ed6\u4efb\u4f55\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u6211\u4eec\u8ba1\u5212\u5c06":41,"\u6211\u4eec\u8ba1\u5212\u5c06\u82f1\u7279\u5c14\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u6570\u5b66\u5e93":42,"\u6211\u4eec\u8bbe\u8ba1\u8bf4\u660e\u4e86\
u540d\u4e3afilemanager\u7cfb\u7edf":27,"\u6211\u4eec\u9009\u62e9":11,"\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u65b0\u5f15\u5165\u7684gemm":41,"\u6211\u4eec\u90fd\u63d0\u4f9bpython\u7684\u8f6c\u6362\u5e93":11,"\u6216\u8005":[42,45,46],"\u6216\u8005\u5c06\u8fd9\u53f0\u8282\u70b9\u8fc1\u79fb\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u5e76\u542f\u52a8\u5373\u53ef\u6062\u590d\u8bad\u7ec3\u4efb\u52a1":10,"\u6216\u8005\u7528tuple\u8868\u793a\u7684\u591a\u4e2a\u503c":11,"\u6216\u8005\u7531\u5b83\u4eec\u7ec4\u6210\u7684list":11,"\u6216activ":42,"\u6240\u4ee5":[42,63],"\u6240\u4ee5\u4e00\u4e2a\u7248\u672c\u53f7\u7684wheel\u5305\u53d1\u5e03\u4e4b\u540e":63,"\u6240\u4ee5\u4e0d\u5b58\u5728\u8fd9\u4e2a\u95ee\u9898":42,"\u6240\u4ee5\u5728":42,"\u6240\u4ee5\u5728\u5199\u5165\u5feb\u7167\u7684\u8fc7\u7a0b\u4e2d":10,"\u6240\u4ee5\u6211\u4eec\u5b9a\u4e49\u4e86\u4e00\u4e2a":42,"\u6240\u4ee5\u6574\u4f53\u4e0a":42,"\u6240\u4ee5\u6dfb\u52a0\u4e86\u5bf9\u5e94\u7684":42,"\u6240\u4ee5\u7528\u6237\u9700\u8981\u9996\u5148\u5728":27,"\u6240\u4ee5\u9700\u8981\u5f15\u5165\u4e00\u4e2a\u8f6c\u6362\u65b9\u6cd5":42,"\u6240\u6709\u4e0e\u7c7b\u578b\u76f8\u5173\u7684\u51fd\u6570":46,"\u6240\u6709\u5916\u90e8\u7684\u8f6c\u6362\u5de5\u4f5c\u90fd\u4f1a\u5728reset\u7cfb\u5217\u51fd\u6570\u4e2d\u90fd\u51c6\u5907\u597d":42,"\u6240\u6709\u7684":41,"\u6240\u6709\u7684\u63a5\u53e3\u5747\u4e3ac\u63a5\u53e3":46,"\u6240\u6709\u76f8\u5173\u7684":41,"\u6240\u6709\u7c7b\u578b\u540d\u4e3a":46,"\u6240\u6709mkl":42,"\u624b\u5199\u591a\u8bed\u8a00\u7ed1\u5b9a":45,"\u624d\u80fd\u66f4\u597d\u7684\u53d1\u6325mkl":42,"\u6253\u5f00\u8fd9\u4e2a\u7f16\u8bd1\u9009\u9879":46,"\u6267\u884c":63,"\u628a":11,"\u628a\u4e4b\u524d\u793a\u4f8b\u4e2d\u8f6c\u6362\u5b8c\u6bd5\u7684random":11,"\u6307\u6df1\u5ea6\u5b66\u4e60\u8bad\u7ec3\u4e4b\u540e\u5f97\u5230\u7684\u6240\u6709\u53c2\u6570":10,"\u6309\u94ae":63,"\u63a5\u53e3":[45,46],"\u63a5\u53e3\u5c42\u505a\u8fc7\u591a\u5c01\u88c5":46,"\u63a5\u53e3\u662f":11,"\u63a5\u6536\u5904\u7406pfsclient\u7aef\u
7684\u6587\u4ef6\u7ba1\u7406\u8bf7\u6c42":27,"\u63a7\u5236\u662f\u5426\u4f7f\u7528mkl":42,"\u63a7\u5236\u662f\u5426\u4f7f\u7528mklml\u5e93":42,"\u63a7\u5236\u7528\u6237\u6743\u9650":11,"\u63d0\u4f9b\u4e03\u5c42\u534f\u8bae\u7684\u53cd\u5411\u4ee3\u7406":27,"\u63d0\u4f9b\u5e38\u7528\u7684\u547d\u4ee4\u884c\u7ba1\u7406\u547d\u4ee4\u7ba1\u7406\u6587\u4ef6\u548c\u76ee\u5f55":27,"\u63d0\u4f9b\u7528\u6237\u7ba1\u7406\u6587\u4ef6\u7684\u547d\u4ee4":27,"\u63d0\u4f9b\u7ed9paddle\u4f5c\u4e3a\u8bad\u7ec3\u6570\u636e":11,"\u652f\u6301\u5927\u6587\u4ef6\u7684\u65ad\u70b9\u4e0a\u4f20":27,"\u6570\u636e":27,"\u6570\u636e\u8bfb\u53d6\u5747\u4ea4\u7531\u5176\u4ed6\u8bed\u8a00\u5b8c\u6210":45,"\u6570\u636e\u957f\u5ea6\u53ca\u6821\u9a8c\u503c\u7ec4\u6210":27,"\u6570\u636e\u96c6\u9700\u8981\u9884\u5148\u88ab\u8f6c\u6362\u6210paddlepaddle\u5206\u5e03\u5f0f\u8bad\u7ec3\u4f7f\u7528\u7684\u5b58\u50a8\u683c":11,"\u6570\u636e\u9884\u5904\u7406\u4efb\u52a1":11,"\u6587\u4ef6":45,"\u6587\u4ef6\u4f20\u8f93\u7684\u7684\u5173\u952e\u5728\u4e8e\u9700\u8981pfsclient\u7aef\u5bf9\u6bd4source\u548cdestination\u7684\u6587\u4ef6chunks\u7684checksum\u662f\u5426\u4fdd\u6301\u4e00\u81f4":27,"\u6587\u4ef6\u5185\u5bb9\u4e3a":45,"\u6587\u4ef6\u540d\u4e3a\u6b64uuid":10,"\u6587\u4ef6\u5bf9\u5e94\u7684data":11,"\u6587\u4ef6\u7684\u4e0a\u4f20\u548c\u4e0b\u8f7d\u90fd\u662f\u901a\u8fc7\u5bf9chunk\u7684\u64cd\u4f5c\u6765\u5b9e\u73b0\u7684":27,"\u65b0\u624b\u5165\u95e8\u7ae0\u8282":63,"\u65b0\u7248\u672c":42,"\u65b9\u4fbf\u6d4b\u8bd5\u4eba\u5458\u6d4b\u8bd5paddlepaddle\u7684\u884c\u4e3a":63,"\u65b9\u4fbf\u7528\u6237\u4e0a\u4f20\u81ea\u5df1\u7684\u8bad\u7ec3\u6570\u636e\u4ee5\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"\u65b9\u4fbf\u7528\u6237\u5728python\u7aef\u9009\u62e9\u662f\u5426\u542f\u7528\u8fd9\u4e2a\u529f\u80fd":41,"\u65b9\u4fbf\u7528\u6237\u9009\u62e9\u4f7f\u7528mkl":42,"\u65b9\u5f0f\u7c7b\u4f3c\u4e8e":42,"\u65e0\u6cd5\u505a\u5230\u5bf9\u4e8e\u5404\u79cd\u8bed\u8a00\u9519\u8bef\u5904\u7406\u65b9\u5f0f\u7684
\u9002\u914d":45,"\u65e0\u8bba\u5728\u672c\u5730\u8fd8\u662f\u5728\u4e91\u7aef":11,"\u65e0\u8bba\u662f\u4ece":11,"\u65e0\u8bba\u662f\u5728\u672c\u5730\u6216\u662f\u4e91\u7aef\u8f6c\u6362":11,"\u65e0\u8bba\u662f\u91cd\u6784\u524d\u7684layer\u8fd8\u662f\u91cd\u6784\u540e\u7684op":42,"\u65f6":[10,41,42],"\u65f6\u4e00\u8d77\u66f4\u65b0":42,"\u662f":[27,42],"\u662f\u4e00\u4e2a\u591a\u8bed\u8a00\u63a5\u53e3\u7684\u4ee3\u7801\u751f\u6210\u5668":45,"\u662f\u4e00\u4e2a\u7c7b\u578b\u7684\u6807\u5fd7":46,"\u662f\u4e0d\u5e38\u89c1\u7684\u505a\u6cd5":45,"\u662f\u5404\u4e2a\u5b9e\u73b0\u4e2d\u5171\u4eab\u7684\u5934\u6587\u4ef6":46,"\u662f\u5426\u6253\u5f00":41,"\u662f\u56e0\u4e3ac99\u652f\u6301":45,"\u662f\u5bf9\u7528\u6237\u6587\u4ef6\u5b58\u50a8\u7a7a\u95f4\u7684\u62bd\u8c61":27,"\u662f\u6307":46,"\u662f\u7528\u6237\u4f7f\u7528c":46,"\u662fc":46,"\u663e\u5f97\u76f8\u5bf9\u6765\u8bf4\u8f83\u4e3a\u8017\u65f6":41,"\u6682\u65f6\u4e0d\u8003\u8651\u591a\u4e2aparamet":10,"\u66b4\u9732\u8fd9\u4e2a\u6982\u5ff5\u5fc5\u8981\u51fd\u6570":46,"\u6700\u540e\u5220\u9664":63,"\u6700\u5e38\u89c1\u7684\u9519\u8bef\u5904\u7406\u65b9\u5f0f\u662fexcept":45,"\u6709\u6548\u63d0\u5347paddlepaddle\u5728\u82f1\u7279\u5c14\u67b6\u6784\u4e0a\u7684\u6027\u80fd":[41,42],"\u6709\u6807\u51c6\u7684":45,"\u6709\u7684\u65f6\u5019":45,"\u672c\u5217\u8868\u8bf4\u660epaddlepaddle\u53d1\u7248\u4e4b\u524d\u9700\u8981\u6d4b\u8bd5\u7684\u529f\u80fd\u70b9":63,"\u672c\u6587\u6863\u63cf\u8ff0paddl":46,"\u673a\u5668\u7ffb\u8bd1":63,"\u6765\u4fdd\u8bc1\u8bad\u7ec3\u8fc7\u7a0b\u53ef\u4ee5\u4ece\u4e2d\u95f4\u72b6\u6001\u91cd\u65b0\u542f\u52a8":10,"\u6765\u51b3\u5b9a\u662f\u5426\u5f00\u542fmkl":41,"\u6765\u5b9e\u73b0":42,"\u6765\u786e\u4fdd\u628a":45,"\u6765\u8868\u793apaddle\u5185\u90e8\u7c7b":45,"\u6765\u8bbf\u95ee\u7528\u6237\u81ea\u5df1\u7684\u6570\u636e":11,"\u6765\u8fdb\u884c\u8ba8\u8bba":46,"\u67e5\u770blatest":63,"\u6807\u51c6\u8868\u793apaddlepaddle\u7248\u672c\u53f7":63,"\u683c\u5f0f\u4e0d\u5339\u914d\u65f6":42,"\u68c
0\u67e5\u70b9\u4fdd\u5b58\u7a0b\u5e8f\u6d41\u7a0b":10,"\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9\u901a\u8fc7\u5b9a\u671f\u5411\u78c1\u76d8\u4e0a\u4fdd\u5b58\u4e00\u4efd\u5b58\u50a8\u5728paramet":10,"\u6a21\u578b\u6570\u636e\u68c0\u67e5\u70b9\u7684\u5b9e\u73b0":10,"\u6a21\u578b\u914d\u7f6e\u89e3\u6790":45,"\u6b21\u8fed\u4ee3\u6267\u884c\u7684\u8f6c\u6362\u6b21\u6570\u4e3a":41,"\u6b64\u65f6\u6bcf\u4e2a\u5c0f\u5206\u652f\u7684":42,"\u6b64\u65f6master\u5c06\u8d1f\u8d23\u542f\u52a8\u4e00\u4e2a\u65b0\u7684train":10,"\u6bcf\u4e00\u4e2a":63,"\u6bcf\u4e00\u4e2a\u6587\u4ef6\u662f\u6570\u636e\u96c6\u7684\u4e00\u4e2ashard":11,"\u6bcf\u4e2a":42,"\u6bcf\u4e2a\u503c\u7684\u7c7b\u578b\u53ef\u4ee5\u662f\u6574\u5f62":11,"\u6bcf\u4e2a\u6d4b\u8bd5\u4f1a\u5bf9\u6bd4paddlepaddle\u4e2dcpu\u7b97\u51fa\u7684\u7ed3\u679c\u4e0emkl":42,"\u6bcf\u4e2adata":11,"\u6bcf\u4e2amkldnnlayer\u90fd\u5305\u542b\u7528\u4e8e\u5185\u90e8\u5b58\u50a8\u548c\u5916\u90e8\u5b58\u50a8\u7684\u4e00\u7cfb\u5217mkldnnmatrix":42,"\u6bcf\u4e2aparamet":10,"\u6bcf\u4e2ashard\u5206\u522b\u5b58\u50a8\u5728\u5176\u4e2d\u4e00\u53f0paramet":10,"\u6bcf\u6b21\u8c03\u7528\u65f6\u5bf9\u539f\u6570\u636e\u7684\u91cd\u590dpacking\u4fbf\u6210\u4e3a\u4e86\u5197\u4f59":41,"\u6bcf\u6b21\u8f93\u51fa\u4e00\u4e2adata":11,"\u6bcf\u969410\u5206\u949f":10,"\u6bd4\u5982":[11,42],"\u6bd4\u5982\u53ef\u80fd\u4f1a\u7528openmp\u6539\u8fdbsgd\u7684\u66f4\u65b0\u6027\u80fd":42,"\u6bd4\u5982\u5c06":63,"\u6bd4\u5982\u6bcf\u969410\u5206\u949f\u6700\u65b0\u7684\u5feb\u7167":10,"\u6bd4\u5982\u6d41\u5f0f\u6570\u636e\u5904\u7406":11,"\u6bd4\u5982imagenet\u8fd9\u4e2a\u6570\u636e\u96c6\u53ef\u80fd\u88ab\u5206\u62101000\u4e2ashard":11,"\u6ca1\u6709\u5fc5\u8981\u5728\u6bcf\u6b21\u524d\u5411\u4e2d\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u8ba1\u7b97\u65f6\u5bf9\u6743\u91cd\u8fdb\u884c\u91cd\u590d\u7684packing\u64cd\u4f5c":41,"\u6ce8":[10,63],"\u6ce8\u518clayer\u7684\u65f6\u5019\u4fdd\u8bc1":[41,42],"\u6ce8\u610f":42,"\u6d4b\u8bd5\u5206\u4e3a\u6bcf\u4e2alayer":42,"\u6
d4b\u8bd5\u672c\u6b21release\u7684\u6b63\u786e\u6027":63,"\u6d4b\u8bd5\u7684\u6027\u80fd\u5bf9\u6bd4\u7ed3\u679c\u4f1a\u5728":42,"\u6d6e\u70b9\u578b\u6570\u636e":11,"\u6df1\u5165paddlepaddl":42,"\u6dfb\u52a0":41,"\u6dfb\u52a0\u7684\u76f8\u5173\u6587\u4ef6\u548c\u76ee\u5f55\u7ed3\u6784\u5982\u4e0b":[41,42],"\u6fc0\u6d3b\u51fd\u6570\u662f\u72ec\u7acb\u4e8e":42,"\u70b9\u51fb":63,"\u7136\u540e\u5728\u524d\u5411":41,"\u7136\u540e\u5728etcd\u7684":10,"\u7136\u540e\u5c31\u53ef\u4ee5\u5e76\u53d1\u5199\u5165\u591a\u4e2achunk":27,"\u7136\u540e\u624d\u80fd\u4f7f\u7528pfsclient":27,"\u7136\u540e\u6309\u7167\u4e0a\u8ff0\u7684\u65b9\u6cd5":63,"\u7136\u540e\u70b9\u51fb":63,"\u7248\u672c\u5206\u652f":63,"\u7248\u672c\u53f7":63,"\u7248\u672c\u53f7\u5bf9\u5e94\u7684tag\u5373\u53ef":63,"\u7248\u672c\u53f7rc":63,"\u7248\u672cfork\u51fa\u81ea\u5df1\u7684\u529f\u80fd\u5206\u652f":63,"\u7279\u6709\u7684\u8bbe\u5907id":42,"\u73b0\u9636\u6bb5\u7684\u4f18\u5316\u4e3b\u8981\u9488\u5bf9":41,"\u73b0\u9636\u6bb5paddle\u6709\u4e00\u4e2a\u95ee\u9898\u662f":45,"\u751f\u4ea7\u73af\u5883\u4e2d\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u901a\u5e38\u4f53\u79ef\u5f88\u5927":11,"\u751f\u4ea7\u73af\u5883\u7684\u65e5\u5fd7\u6570\u636e\u4f1a\u901a\u8fc7\u5b9e\u65f6\u6d41\u7684\u65b9\u5f0f":11,"\u751f\u6210\u5404\u79cd\u8bed\u8a00\u7684\u7ed1\u5b9a\u4ee3\u7801":45,"\u751f\u6210\u6587\u6863":45,"\u751f\u6210\u7684":11,"\u751f\u6210\u7ed9\u5b9a":11,"\u751f\u6210api\u6587\u6863":45,"\u751f\u6210pfsclient\u548cpfsserver\u7684\u6846\u67b6\u90e8\u5206":27,"\u7528":27,"\u7528\u4e8e\u6d4b\u8bd5\u548c\u5bf9\u6bd4\u5728\u4f7f\u7528mkl":42,"\u7528\u4e8e\u7ba1\u7406mkl":42,"\u7528\u4e8e\u9009\u62e9\u662f\u5426\u4f7f\u7528\u76f8\u5173\u529f\u80fd":41,"\u7528\u4e8e\u9009\u62e9\u662f\u5426\u4f7f\u7528mkl":42,"\u7528\u4e8emkl":[41,42],"\u7528\u6237\u4e0a\u4f20\u6570\u636e\u540e":11,"\u7528\u6237\u4e5f\u53ef\u4ee5\u4e0a\u4f20label":11,"\u7528\u6237\u53ef\u4ee5\u5b89\u5168\u7684\u91ca\u653e\u67d0\u4e2ac":46,"\u7528\u6237\u53ef\u4ee
5\u628a\u81ea\u5df1\u7684\u6570\u636e\u5206\u4eab\u7ed9\u522b\u4eba":11,"\u7528\u6237\u53ef\u4ee5\u76f4\u63a5\u4f7f\u7528\u8fd9\u4e2a\u52a8\u6001\u5e93\u6765\u5f15\u5165paddl":46,"\u7528\u6237\u5728\u672c\u5730\u8f6c\u6362\u597d\u518d\u4e0a\u4f20":11,"\u7528\u6237\u6587\u4ef6\u53ef\u80fd\u662f\u6bd4\u8f83\u5927\u7684":27,"\u7528\u6237\u901a\u8fc7c":46,"\u7531\u4e8e\u5728\u73b0\u6709\u7684\u67d0\u4e9b\u60c5\u51b5\u4e0b":41,"\u7531\u4e8e\u5bf9parameters\u7684\u66f4\u65b0\u9700\u8981\u83b7\u53d6parameters\u5185\u5b58\u7684":10,"\u7531\u4e8e\u96c6\u7fa4\u4e2d\u540c\u65f6\u5b58\u5728\u4e24\u53f0\u673a\u5668\u6545\u969c\u7684\u6982\u7387\u6781\u4f4e":10,"\u7531\u4e8ec":45,"\u7531\u4e8echunk\u6bd4\u8f83\u5c0f":27,"\u7531\u4e8emkl":42,"\u7531\u4e8epypi":63,"\u7531\u5206\u652f\u5904\u7684layer\u8d1f\u8d23\u6c42\u548c":42,"\u7533\u8bf7\u7528\u6237\u7a7a\u95f4":27,"\u7684\u4e00\u4e2a\u5b50\u96c6":42,"\u7684\u4fe1\u606f":42,"\u7684\u5355\u5143\u6d4b\u8bd5\u548c\u7b80\u5355\u7f51\u7edc\u7684\u6574\u4f53\u6d4b\u8bd5":42,"\u7684\u547d\u540d\u98ce\u683c\u5e76\u4e0d\u80fd\u9002\u5e94\u5176\u4ed6\u7b2c\u4e09\u65b9\u8bed\u8a00":45,"\u7684\u57fa\u672c\u903b\u8f91":42,"\u7684\u5934\u6587\u4ef6":45,"\u7684\u5b50\u7c7b\u53ea\u9700\u8981\u4f7f\u7528\u5185\u90e8\u5b58\u50a8\u5c31\u53ef\u4ee5\u4e86":42,"\u7684\u60c5\u51b5\u4e0b":41,"\u7684\u63a5\u53e3\u6837\u5f0f":45,"\u7684\u6570\u636e\u6d41\u56fe":11,"\u7684\u65f6\u5019":42,"\u7684\u683c\u5f0f\u59cb\u7ec8\u662f":42,"\u7684\u683c\u5f0f\u5b58\u50a8":42,"\u7684\u6982\u5ff5":42,"\u7684\u6e90\u7801\u91cc\u4f7f\u7528\u4e86":45,"\u7684\u7248\u672c":63,"\u7684\u7ed3\u679c":41,"\u7684\u7f29\u5199":27,"\u7684\u7f51\u7edc\u6a21\u578b":41,"\u7684\u89c4\u8303":45,"\u7684\u89d2\u5ea6":11,"\u7684\u914d\u7f6e\u5199\u5230\u914d\u7f6e\u6587\u4ef6\u4e2d":11,"\u7684flag":[41,42],"\u7684vanilla":41,"\u76ee\u524d\u53ea\u8003\u8651":42,"\u76ee\u524d\u53ea\u8003\u8651\u52a8\u6001\u6269\u5bb9trainer\u6570\u91cf":10,"\u76ee\u524d\u5728paddlepaddle\u4e2d":42,"\u76ee
\u524d\u5728paddlepaddle\u4e2d\u6570\u636e\u90fd\u662f\u4ee5":42,"\u76ee\u524d\u5d4c\u5165python\u89e3\u91ca\u5668":45,"\u76ee\u524d\u6211\u4eec\u7528cephfs\u6765\u642d\u5efa":27,"\u76ee\u524d\u7684\u4f18\u5316":42,"\u76ee\u524dpaddle\u7684\u8fdb\u7a0b\u6a21\u578b\u662fc":45,"\u76ee\u524dpaddlepaddle\u91c7\u7528\u4e86":41,"\u76ee\u5f55\u4e0b":46,"\u76ee\u5f55\u4e0b\u5bf9\u5e94\u7684\u5730\u65b9":42,"\u76f4\u63a5\u4f7f\u7528c\u8bed\u8a00\u7684":45,"\u76f4\u63a5\u5220\u9664\u8fd9\u4e2a\u53c2\u6570\u5373\u53ef":46,"\u76f4\u63a5\u5bfc\u51fa\u5230c\u7684\u63a5\u53e3\u6bd4\u8f83\u56f0\u96be":45,"\u76f8\u5173\u5c42":41,"\u77e9\u9635\u5927\u5c0f\u662f":41,"\u793e\u533a\u53c2\u4e0e\u56f0\u96be":45,"\u793e\u533a\u8d21\u732e\u4ee3\u7801\u5b66\u4e60\u6210\u672c\u9ad8":45,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u7684\u53c2\u6570":10,"\u79bb\u7ebf\u6279\u5904\u7406":11,"\u7b2c\u4e00\u4e2atag\u4e3a":63,"\u7b2c\u4e09\u6b65\u5b8c\u6210\u540e":63,"\u7b2c\u4e8c\u4e2a\u4e3a":63,"\u7b49":[42,46],"\u7b49\u5168\u90e8\u9759\u6001\u5e93\u4e2d\u7684\u76ee\u6807\u6587\u4ef6\u5168\u90e8\u6253\u5305\u540e\u4ea7\u751f\u7684\u6587\u4ef6":46,"\u7b49\u5f85\u7f16\u8bd1\u5b8c\u6210\u540e":63,"\u7b49\u6587\u4ef6":46,"\u7c7b\u4f3c":46,"\u7c7b\u540d\u548cc":45,"\u7c7b\u578b":45,"\u7ed3\u8bba":45,"\u7edf\u4e00\u7528":11,"\u7f16\u8bd1\u5668\u6ca1\u6709":45,"\u7f16\u8bd1\u578b\u8bed\u8a00":45,"\u7f16\u8bd1\u65f6\u4f1a\u628a\u5bf9\u5e94\u7684\u5934\u6587\u4ef6\u548c\u5e93\u653e\u5728":42,"\u7f16\u8bd1\u8fd9\u4e2a\u7248\u672c\u7684docker\u53d1\u884c\u955c\u50cf":63,"\u7f16\u8bd1\u8fd9\u4e2a\u7248\u672c\u7684python":63,"\u7f16\u8bd1c":46,"\u800c\u4e0d\u5fc5\u5728\u610fpaddl":46,"\u800c\u4e0d\u652f\u6301pypy\u89e3\u91ca\u5668":45,"\u800c\u4e0d\u66b4\u9732\u6982\u5ff5\u7684\u5b9e\u73b0":46,"\u800c\u4e14\u5728\u4f20\u8f93\u7684\u8fc7\u7a0b\u4e2d\u4e5f\u53ef\u80fd\u51fa\u73b0\u7f51\u7edc\u4e0d\u7a33\u5b9a\u7684\u60c5\u51b5":27,"\u800c\u51fa\u73b0\u9636\u6bb5\u6027\u7684\u8fd0\u884c\u505c\u6ede":10,"\u800c\u5728cpp\u91cc\
u9762\u5b9e\u73b0\u8fd9\u4e2ac\u7684\u63a5\u53e3":45,"\u800c\u591a\u8bed\u8a00\u63a5\u53e3\u9700\u8981\u76f4\u63a5\u8bfb\u53d6\u751f\u6210\u7684\u4e8c\u8fdb\u5236":45,"\u800c\u5bf9\u4e8egolang":45,"\u800c\u5bf9\u4e8egolang\u9519\u8bef\u5904\u7406\u5e94\u8be5\u4f7f\u7528\u8fd4\u56de\u503c":45,"\u800c\u662f\u76f4\u63a5\u4fee\u6539paddl":46,"\u800c\u662f\u76f4\u63a5\u7528api\u7684\u63a5\u53e3\u8fdc\u7a0b\u8bbf\u95ee":11,"\u800cswig\u53ea\u80fd\u7b80\u5355\u7684\u66b4\u9732c":45,"\u81ea\u52a8\u6302\u8f7d\u5206\u5e03\u5f0f\u5b58\u50a8\u76ee\u5f55":10,"\u81f3\u4e8e\u4e3a\u4ec0\u4e48\u9700\u8981c":46,"\u826f\u597d\u7684\u6587\u6863":45,"\u8282\u7701\u4e86\u4e0d\u5fc5\u8981\u7684\u64cd\u4f5c":42,"\u83b7\u53d6\u6700\u65b0\u7684\u68c0\u67e5\u70b9\u7684\u6587\u4ef6uuid":10,"\u867d\u7136\u4e0d\u9f13\u52b1\u8fd9\u6837":46,"\u8868\u793a\u5bf9\u8f93\u5165\u6570\u636e":42,"\u89e3\u91ca\u578b\u8bed\u8a00\u53ea\u80fd\u8c03\u7528\u52a8\u6001\u5e93":45,"\u89e3\u91ca\u6027\u8bed\u8a00\u5b9e\u9645\u8fd0\u884c\u7684\u4e8c\u8fdb\u5236\u662f\u89e3\u91ca\u5668\u672c\u8eab":45,"\u8ba1\u5212\u5728":[41,42],"\u8ba1\u7b97\u8fd9\u4e2a\u6587\u4ef6\u7684md5":10,"\u8ba9paddle\u6838\u5fc3\u4e2d":46,"\u8bad\u7ec3\u4efb\u52a1\u7684\u8fd0\u884c\u53ef\u80fd\u4f1a\u5360\u6ee1trainer\u548cparamet":10,"\u8bad\u7ec3\u548c\u7eaf\u4f7f\u7528":63,"\u8bad\u7ec3\u6a21\u578b\u6b63\u786e\u6027":63,"\u8bb0\u5f55\u4e0b\u6240\u6709\u5931\u8d25\u7684\u4f8b\u5b50":63,"\u8bbe\u7f6e":46,"\u8bc6\u522b\u6570\u5b57":63,"\u8bcd\u5411\u91cf":63,"\u8be5\u6587\u4ef6\u5bf9\u76f8\u5173gemm":41,"\u8be5\u7c7b\u7ee7\u627f\u4e8epaddlepaddle\u7684\u57fa\u7c7b":42,"\u8be6\u7ec6\u8bbe\u8ba1":27,"\u8bed\u610f\u89d2\u8272\u6807\u6ce8":63,"\u8bf4\u660e":10,"\u8bf7\u53c2\u8003":46,"\u8f6c\u6362\u5185\u5b58\u7684\u5de5\u4f5c":42,"\u8f6c\u6362\u5197\u4f59":41,"\u8f6c\u6362\u51fd\u6570":42,"\u8f6c\u6362\u751f\u6210\u7684\u6587\u4ef6\u540d\u4f1a\u662f\u4ee5\u4e0b\u683c\u5f0f":11,"\u8f6c\u6362\u8017\u65f6":41,"\u8f93\u5165\u68af\u5ea6":42,"\u8f93
\u51fa\u6570\u636e\u548c\u8f93\u51fa\u68af\u5ea6":42,"\u8f93\u51fa\u6570\u636e\u548c\u8f93\u51fa\u68af\u5ea6\u7684\u8f6c\u6362":42,"\u8fbe\u5230\u5bb9\u707e\u7684\u76ee\u7684":10,"\u8fc7\u7a0b\u4e2d\u6240\u6709\u65f6\u95f4\u6b65":41,"\u8fd1\u671f\u76ee\u6807":42,"\u8fd4\u56de\u7b2c\u4e8c\u6b65":63,"\u8fd8\u662f\u4ece":11,"\u8fd9\u4e00\u5c42\u8fdb\u884c\u5c01\u88c5":46,"\u8fd9\u4e00\u6570\u636e\u683c\u5f0f\u7684\u8f6c\u6362\u64cd\u4f5c":41,"\u8fd9\u4e00\u6982\u5ff5\u4e0d\u518d\u7410\u788e":46,"\u8fd9\u4e09\u4e2a\u5206\u652f":63,"\u8fd9\u4e2a\u51fd\u6570\u672c\u8eab\u4f1a\u5728\u8ba1\u7b97\u524d\u5c06\u539f\u6570\u636e\u8f6c\u6362\u4e3a\u66f4\u9002\u5408\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u5185\u90e8\u683c\u5f0f":41,"\u8fd9\u4e2a\u52a8\u6001\u5e93\u7684\u8fde\u63a5\u53c2\u6570\u4e0epaddle\u7684\u5176\u4ed6\u4e8c\u8fdb\u5236":46,"\u8fd9\u4e2a\u53c2\u6570\u4e5f\u4e0d\u4f1a\u4e00\u5e76\u5220\u9664":46,"\u8fd9\u4e2a\u5934\u6587\u4ef6\u4e0d\u5047\u8bbe\u5176\u4ed6\u6587\u4ef6\u7684\u5f15\u7528\u987a\u5e8f":46,"\u8fd9\u4e2a\u63a5\u53e3\u9700\u8981\u505a\u5230":45,"\u8fd9\u4e2a\u6587\u4ef6\u5177\u6709\u72ec\u7279\u7684\u8bed\u6cd5":45,"\u8fd9\u4e2a\u76ee\u5f55\u4e2d\u9664\u4e86":46,"\u8fd9\u4e2a\u7ed3\u6784\u4f53\u4e2d\u7684\u53e6\u4e00\u4e2a\u9879\u76ee\u662f":46,"\u8fd9\u4e2a\u7ed3\u6784\u4f53\u5305\u542b\u4e24\u4e2a\u9879\u76ee":46,"\u8fd9\u4e2a\u9009\u62e9":[41,42],"\u8fd9\u4e2a\u9759\u6001\u5e93\u5305\u542b\u4e86paddle\u7684\u5168\u90e8\u7b26\u53f7":46,"\u8fd9\u4e2ainstance\u53ef\u4ee5\u662f\u5355\u4e2a\u503c":11,"\u8fd9\u4e9b\u4f1a\u5728":[41,42],"\u8fd9\u4e9b\u51fd\u6570\u4f1a\u6839\u636e\u8f93\u5165\u53c2\u6570\u91cd\u65b0\u8bbe\u7f6e\u5185\u90e8\u548c\u5916\u90e8\u5b58\u50a8":42,"\u8fd9\u4e9b\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1\u901a\u5e38\u4f1a\u628a\u6570\u636e\u5207\u5272\u6210\u591a\u4e2a\u5206\u7247\u5206\u5e03\u5f0f\u7684\u5b58\u50a8\u5728\u591a\u4e2a\u8282\u70b9\u4e4b\u4e0a":11,"\u8fd9\u4e9b\u955c\u50cf\u4e5f\u53ef\u4ee5\u4ece":63,"\u8fd9\u5bf9\u4e8e\u
901a\u5e38\u7684java\u7684\u5f00\u53d1\u8005\u6765\u8bf4":45,"\u8fd9\u662f\u56e0\u4e3a":45,"\u8fd9\u6837":46,"\u8fd9\u6837\u4e0b\u4e00\u4e2acpu":42,"\u8fd9\u6837\u4fdd\u8bc1":63,"\u8fd9\u6837\u5c31\u53ef\u4ee5\u5728\u4e91\u7aef\u6267\u884c\u591a\u79cd\u6570\u636e\u7c7b\u8ba1\u7b97\u4efb\u52a1":11,"\u8fd9\u6837\u5df2\u7ecf\u4f20\u8f93\u6210\u529f\u7684\u90e8\u5206\u5c31\u4e0d\u7528\u91cd\u65b0\u4f20\u8f93\u4e86":27,"\u8fd9\u6837\u5e26\u6765\u7684\u597d\u5904\u5c31\u662f\u4e0d\u9700\u8981\u4e00\u76f4\u6e05\u7a7amemori":42,"\u8fd9\u6837\u65e2\u4f7f\u5f97\u6700\u7ec8\u4fdd\u5b58\u7684\u53c2\u6570\u683c\u5f0f\u4e0epaddlepaddle\u4e00\u81f4":42,"\u8fd9\u90fd\u9700\u8981\u8fd9\u4e2a\u63a5\u53e3\u6309\u7167\u7ea6\u5b9a\u4fd7\u6210\u7684\u89c4\u5219\u6765\u6ce8\u91ca\u5b8c\u5907":45,"\u8fd9\u91cc":42,"\u8fd9\u91cc\u7684dockerimage\u4f5c\u4e3a\u7f16\u8bd1\u73af\u5883\u4ee5\u652f\u6301\u66f4\u591a\u7684linux":63,"\u8fd9\u91cc\u9009\u62e90":63,"\u8fd9\u91cc\u9700\u8981\u7528\u6237\u989d\u5916\u6ce8\u610f":10,"\u8fdb\u4e00\u6b65\u4f18\u5316":42,"\u8fdb\u5165":63,"\u8fdb\u800c\u8fdb\u884c\u4ee3\u7801\u8bc4\u5ba1":63,"\u9009\u62e9\u662f\u5426\u7f16\u8bd1mkl":42,"\u9009\u62e9\u9700\u8981\u53d1\u5e03\u7684\u7248\u672c":63,"\u900f\u4f20\u7528\u6237\u8eab\u4efd\u7684\u529e\u6cd5":27,"\u901a\u5e38":46,"\u901a\u5e38\u5305\u542b\u4e00\u4e2acpu\u7248\u672c\u548c\u4e00\u4e2agpu\u7248\u672c":63,"\u901a\u5e38\u6307\u5c06\u4e00\u4e2a\u6574\u4f53\u62c6\u5206\u6210\u591a\u4efd\u7684\u5176\u4e2d\u7684\u4e00\u4efd":10,"\u901a\u8fc7\u4f7f\u7528\u8fd9\u4e9bapi":41,"\u901a\u8fc7\u6a21\u578b\u63a8\u65adapi\u7684\u5b9e\u73b0\u4f5c\u4e3a\u4e00\u4e2a\u6837\u4f8b":46,"\u903b\u8f91\u5212\u4e0a\u6587\u4ef6\u5206\u5757\u7684\u5355\u4f4d":27,"\u9075\u5faa\u4ee5\u4e0b\u6d41\u7a0b":63,"\u90a3\u4e48":46,"\u90a3\u4e48\u5bf9\u5e94\u7684\u5185\u90e8\u5b58\u50a8\u4e5f\u4f1a\u4e0e\u5b83\u4eec\u5171\u4eab\u5185\u5b58":42,"\u90a3\u4e48\u5c31\u4f1a\u4f7f":42,"\u90fd\u4e0d\u4f1a\u60f3\u8981\u77e5\u9053next":42,"\u90fd\u6
62f\u4e94\u4f4d\u7684\u6570\u5b57":11,"\u90fd\u662f\u4ee5ext\u5f00\u5934":42,"\u90fd\u662fabi\u8c03\u7528\u6807\u51c6\u7684":45,"\u90fd\u7ee7\u627f\u4e8epaddlepaddle\u7684\u57fa\u7c7b":41,"\u914d\u7f6e\u7684\u65b9\u6cd5\u53c2\u8003":27,"\u91ca\u653e\u5bf9paramters\u5185\u5b58\u7684\u9501\u5b9a":10,"\u91cc\u6240\u6709\u7684\u7b26\u53f7\u90fd\u5199\u5165\u81ea\u5df1\u7684\u7a0b\u5e8f\u7684\u4e8c\u8fdb\u5236\u6587\u4ef6\u91cc":45,"\u91cc\u9009\u62e9\u9700\u8981\u53d1\u5e03\u7684\u5206\u652f":63,"\u91cc\u9762\u6dfb\u52a0":42,"\u91cd\u5199\u7236\u7c7blayer\u7684":42,"\u91cd\u547d\u540d\u6210":45,"\u94fe\u63a5\u5230\u81ea\u5df1\u7684\u7a0b\u5e8f\u91cc":45,"\u9519\u8bef\u5904\u7406":45,"\u9519\u8bef\u5904\u7406\u65b9\u5f0f\u662f\u8fd4\u56de\u503c":45,"\u9519\u8bef\u5904\u7406\u7684\u65b9\u5f0f\u4e5f\u4e0d\u5c3d\u76f8\u540c":45,"\u9664\u6784\u9020\u67d0\u79cd\u7c7b\u578b\u7684\u51fd\u6570":46,"\u96c6\u6210\u5230":41,"\u96c6\u6210\u5230paddlepaddl":42,"\u9700\u8981":11,"\u9700\u8981\u4fee\u6539build":63,"\u9700\u8981\u53ef\u4ee5\u8de8\u5e73\u53f0\u6267\u884c":27,"\u9700\u8981\u5728cmake\u7684\u65f6\u5019":46,"\u9700\u8981\u5c06bugfix\u7684\u5206\u652f\u540c\u65f6merge\u5230":63,"\u9700\u8981\u5f15\u7528":46,"\u9700\u8981\u6709\u7a33\u5b9a\u7684\u5bfc\u51fa\u7b26\u53f7":45,"\u9700\u8981\u6ce8\u610f\u7684\u662f":[42,63],"\u9700\u8981\u7d2f\u52a0\u4e0d\u540clayer\u4f20\u8fc7\u6765\u7684\u68af\u5ea6":42,"\u9700\u8981\u88ab\u66b4\u9732\u5230\u5176\u4ed6\u8bed\u8a00":46,"\u9700\u8981\u91cd\u547d\u540dwheel\u5305\u4e2dplatform\u76f8\u5173\u7684\u540e\u7f00":63,"\u9ed8\u8ba4256k":27,"\u9ed8\u8ba4\u8bbe\u7f6e\u4e3a":41,"abstract":[18,19,26,30,53,62,64,74,105],"api\u4e2d\u4f7f\u7528":45,"api\u5bfc\u51fa\u7684\u52a8\u6001\u5e93":46,"api\u5bfc\u51fa\u7684\u9759\u6001\u5e93":46,"api\u63a5\u53d7\u7684\u7c7b\u578b\u5168\u662f":46,"api\u63a5\u53e3":27,"api\u63a5\u53e3\u7684\u53c2\u6570\u8f6c\u53d1\u7ed9":46,"api\u65f6":46,"api\u65f6\u6240\u552f\u4e00\u9700\u8981\u5f15\u5165\u7684\u5934\u658
7\u4ef6":46,"api\u662f\u591a\u8bed\u8a00api\u7684\u57fa\u7840\u90e8\u5206":46,"api\u66b4\u9732\u7684\u7c7b\u578b":46,"api\u751f\u6210\u7684\u4e8c\u8fdb\u5236\u6587\u4ef6\u4f1a\u88ab\u5b89\u88c5\u5230":46,"api\u7684\u5b9e\u4f8b":46,"api\u7684\u5b9e\u73b0\u7ec6\u8282":46,"api\u7684\u63a5\u53e3":46,"api\u7684\u65f6\u5019\u63a8\u8350paddle\u4e0d\u5d4c\u5165python\u89e3\u91ca\u5668":46,"api\u7684\u7f16\u8bd1\u9009\u9879\u9ed8\u8ba4\u5173\u95ed":46,"api\u76ee\u5f55\u7ed3\u6784\u5982\u4e0a\u56fe\u8868\u6240\u793a":46,"api\u83b7\u5f97\u4e86\u795e\u7ecf\u7f51\u7edc\u7684\u53c2\u6570\u5b9e\u4f8b":46,"apis\u505a\u4e86\u5c01\u88c5":41,"block\u6784\u6210\u4e00\u4e2amodel":10,"book\u4e2d\u6240\u6709\u7ae0\u8282\u529f\u80fd\u7684\u6b63\u786e\u6027":63,"boolean":[26,28,36,45,69],"break":[8,67,69,70,71],"bugfix\u5206\u652f\u4e5f\u662f\u5728\u5f00\u53d1\u8005\u81ea\u5df1\u7684fork\u7248\u672c\u5e93\u7ef4\u62a4":63,"bugfix\u5206\u652f\u9700\u8981\u5206\u522b\u7ed9\u4e3b\u7248\u672c\u5e93\u7684":63,"byte":[27,44],"c99\u662f\u76ee\u524dc\u6700\u5e7f\u6cdb\u7684\u4f7f\u7528\u6807\u51c6":45,"c\u6709\u6807\u51c6\u7684abi":45,"c\u8bed\u8a00\u662f\u6709\u5bfc\u51fa\u7b26\u53f7\u7684\u6807\u51c6\u7684":45,"case":[12,18,20,21,26,30,40,46,53,57,59,60,69,74,75,93,97,106,110,116,121],"char":14,"ci\u73af\u5883\u4f7f\u7528":63,"ci\u7f16\u8bd1wheel\u5b8c\u6210\u540e\u4f1a\u81ea\u52a8\u5c06docker\u955c\u50cfpush\u5230dockerhub":63,"class":[4,7,18,19,20,21,24,25,29,30,31,32,34,35,37,40,45,49,50,55,56,60,61,62,64,65,66,68,70,75,76,77,104,111],"compute\u51fd\u6570":41,"const":[7,12,14,19,29,31,37,38,39,54,55,57,61,64,66,68,70,74,75,76,77],"core\u4e2d\u7684\u6a21\u578b\u8fd8\u5728\u4f7f\u7528\u8fd9\u4e2a\u53c2\u6570":46,"core\u4e2d\u8fd9\u4e00\u7c7b\u578b\u63a5\u53e3\u7684\u667a\u80fd\u6307\u9488":46,"core\u662f\u5426\u8fd8\u5728\u4f7f\u7528\u8fd9\u4e2a\u5b9e\u4f8b":46,"core\u6982\u5ff5":46,"data\u5230\u5206\u5e03\u5f0f\u5b58\u50a8\u8865\u5145\u8bad\u7ec3\u6570\u636e":11,"default":[0,1,3,4,7,8,18,20,24,3
3,37,44,47,48,57,58,64,65,66,69,71,72,75,76,92,95,97,99,105,107,109,118,119,120,121],"device\u5c31\u80fd\u62ff\u5230\u6b63\u786e\u7684\u6570\u636e":42,"dnn\u4e09\u8005\u5173\u7cfb\u5982\u4e0b\u8868":42,"dnn\u4e2d\u7684":42,"dnn\u4e2d\u7684\u6392\u5217\u65b9\u5f0f\u4e0d\u6b62\u8fd9\u4e00\u79cd":42,"dnn\u4f1a\u4f5c\u4e3a\u7b2c\u4e09\u65b9\u5e93\u96c6\u6210\u8fdbpaddlepaddl":42,"dnn\u4f1a\u7528\u5230":42,"dnn\u5171\u540c\u4f7f\u7528":42,"dnn\u524d\u540e\u7684cnn\u7f51\u7edc\u6027\u80fd":42,"dnn\u5728\u53d1\u5e03":42,"dnn\u5b9e\u73b0":42,"dnn\u5e0c\u671b\u7684\u683c\u5f0f":42,"dnn\u6570\u636e\u7684\u4e0d\u540c\u683c\u5f0f\u4ee5\u53ca\u76f8\u4e92\u4e4b\u95f4\u7684\u8f6c\u6362":42,"dnn\u7684":42,"dnn\u7684\u5e93\u76ee\u524d\u53ea\u6709\u52a8\u6001\u5e93":42,"dnn\u7684\u6027\u80fd":42,"dnn\u7684\u60c5\u51b5\u4e0b":42,"dnn\u7684\u64cd\u4f5c\u90fd\u662f\u76f4\u63a5\u8986\u76d6\u7684\u5f62\u5f0f":42,"dnn\u7684\u6d4b\u8bd5":42,"dnn\u7684\u73af\u5883\u4e0b":42,"dnn\u7684\u76f8\u5173\u529f\u80fd":42,"dnn\u7684\u7ed3\u679c":42,"dnn\u7684\u9ad8\u6027\u80fd\u683c\u5f0f\u4e0epaddlepaddle\u539f\u6709\u7684":42,"dnn\u7684layer":42,"dnn\u7684layers\u90fd\u4f1a\u7ee7\u627f\u4e8e":42,"enum":[12,14,20,47,55,56,65,66,71,76],"export":[1,30,35,78,92],"final":[5,6,21,35,48,49,67,70,74,75],"float":[24,29,37,39,66,68,74,75,76,77,107,110],"function":[4,6,7,9,13,14,15,17,18,19,20,21,24,25,29,31,34,37,43,48,49,53,54,55,56,57,59,60,61,62,64,66,70,74,75,76,77,92,93,105,109,110,116,121],"golang\u53ef\u4ee5\u4f7f\u7528":45,"golang\u7684":45,"gpu\u7b49":63,"h\u5e76\u4e0d\u56f0\u96be":45,"images\u6570\u636e\u96c6\u4e0a\u4f20\u5230\u4e91\u7aef\u7684":11,"import":[3,4,7,8,18,20,23,31,33,35,36,43,48,49,56,64,67,75,87,92,93,97,110,116,120],"ingress\u9700\u8981\u628apfsclient\u7684\u8eab\u4efd\u4fe1\u606f\u4f20\u7ed9pfsserv":27,"instance\u4e0e\u751f\u6210\u6570\u636e\u96c6\u65f6":11,"instance\u5305\u6db5\u4e24\u4e2a\u503c":11,"instance\u662f\u4e00\u6a21\u4e00\u6837\u7684":11,"int":[6,7,12,13,14,17,18,20,22,3
6,37,41,42,43,45,46,55,56,58,59,65,66,68,70,74,76,77,92,107],"interface\u6587\u4ef6\u7684\u5199\u6cd5\u975e\u5e38":45,"layer\u65f6":42,"layer\u7684\u540e\u9762\u63a5\u6709cpu":42,"list\u4f5c\u4e3a\u68c0\u67e5\u5217\u8868":63,"long":[20,110],"mkl\u5e93\u7684":41,"mklml\u4ee5\u53camkl":42,"mklml\u53ef\u4ee5\u4e0emkl":42,"mklml\u7684\u5e93\u76ee\u524d\u90fd\u662f\u52a8\u6001\u5e93":42,"mode\u4e0b\u7684\u7ed3\u679c":41,"model\u505a\u5206\u652f\u7ba1\u7406":63,"ndarray\u7c7b\u578b\u7684\u503c\u548c\u6574\u578b\u7684\u503c":11,"new":[0,5,6,7,8,9,12,13,14,15,16,19,20,21,24,29,30,40,41,43,47,49,53,58,59,60,62,66,67,70,72,87,97,99,121],"note\u7684\u4e66\u5199":63,"null":[35,74,105],"op\u7684\u4fe1\u606f":42,"openmp\u7528\u4e8e\u63d0\u9ad8mklml\u7684\u6027\u80fd":42,"org\u76ee\u524d\u9075\u5faa":63,"packed\u4f18\u5316\u540elayer\u7684\u6d4b\u8bd5":41,"packed\u76f8\u5173\u529f\u80fd":41,"paddle\u4e00\u4e2a\u52a8\u6001\u5e93\u53ef\u4ee5\u5728\u4efb\u4f55linux\u7cfb\u7edf\u4e0a\u8fd0\u884c":45,"paddle\u5185\u5d4c\u7684python\u89e3\u91ca\u5668\u548c\u5916\u90e8\u4f7f\u7528\u7684python\u5982\u679c\u7248\u672c\u4e0d\u540c":45,"paddle\u5185\u90e8\u7684\u7c7b\u4e3ac":45,"paddle\u7684\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0\u5305\u62ec\u4e00\u4e0b\u51e0\u4e2a\u65b9\u9762":45,"paddle\u7684\u7c7b\u578b\u5168\u90e8\u9000\u5316\u6210":46,"paddle\u7684\u94fe\u63a5\u65b9\u5f0f\u6bd4\u8f83\u590d\u6742":45,"paddle\u7684c":46,"paddle\u8bad\u7ec3\u4efb\u52a1":11,"paddle\u8def\u5f84\u4e0b":46,"paddle\u9700\u8981\u4e00\u4e2a\u591a\u8bed\u8a00\u63a5\u53e3":45,"paddle\u9700\u8981\u66b4\u9732\u7684api\u5f88\u591a":46,"paddle\u9759\u6001\u5e93\u94fe\u63a5\u590d\u6742":45,"paddle_\u7c7b\u578b\u540d":46,"paddle_\u7c7b\u578b\u540d_\u51fd\u6570\u540d":46,"paddlepaddle\u4e2d\u7684cudnn\u90e8\u5206\u4f7f\u7528\u7684\u4e5f\u662f":42,"paddlepaddle\u4f7f\u7528git":63,"paddlepaddle\u5f00\u53d1\u8fc7\u7a0b\u4f7f\u7528":63,"paddlepaddle\u63d0\u4f9b\u4e13\u7528\u7684":11,"paddlepaddle\u6bcf\u6b21\u53d1\u65b0\u7
684\u7248\u672c":63,"paddlepaddle\u6bcf\u6b21\u53d1\u7248\u672c\u9996\u5148\u8981\u4fdd\u8bc1paddlepaddl":63,"paddlepaddle\u7684\u4e3b\u7248\u672c\u5e93\u9075\u5faa":63,"paddlepaddle\u7684activation\u4f1a\u76f4\u63a5\u4f7f\u7528":42,"patch\u53f7":63,"patch\u53f7\u52a0\u4e00":63,"pfsclient\u9700\u8981\u548cingress\u4e4b\u95f4\u505a\u53cc\u5411\u9a8c\u8bc1":27,"pfsclient\u9700\u8981\u5728\u4f20\u8f93\u5b8c\u6bd5\u6700\u540e\u4e00\u4e2achunk\u7684\u65f6\u5019\u68c0\u67e5destination\u6587\u4ef6\u7684md5\u503c\u662f\u5426\u548csource\u6587\u4ef6\u4e00\u81f4":27,"pfsserver\u63d0\u4f9brest":27,"public":[7,19,29,32,37,55,61,64,66,67,68,70,74,75,76,77,92,97,99],"py\u4e2d":63,"pypi\u4e0a\u7684package\u540d\u79f0\u4e3apaddlepaddle\u548cpaddlepaddl":63,"pypi\u4e0d\u652f\u6301\u8986\u76d6\u4e0a\u4f20":63,"reader\u7684\u4f7f\u7528\u65b9\u5f0f\u90fd\u662f\u4e00\u81f4\u7684":11,"reader\u8f93\u51fa\u7684data":11,"resnet\u7684mkl":42,"return":[4,5,6,7,11,12,14,17,18,19,20,25,29,31,32,33,35,37,38,40,43,48,49,50,55,56,57,61,64,66,68,70,74,75,76,77,93,97,116],"rnn\u90e8\u5206\u4e2d":41,"s3\u4e4b\u7c7b\u7684\u5206\u5e03\u5f0f\u5b58\u50a8\u4e4b\u4e0a":11,"server\u4e4b\u4e0a":10,"server\u4e4b\u95f4\u7684\u7f51\u7edc\u5e26\u5bbd":10,"server\u4f1a\u6682\u505c\u53c2\u6570\u66f4\u65b0\u5e76\u7b49\u5f85":10,"server\u4f1a\u83b7\u53d6parameters\u5185\u5b58\u7684":10,"server\u5185\u5b58\u4e2d\u7684\u6a21\u578b\u6570\u636e\u7684\u5b8c\u6574\u955c\u50cf":10,"server\u540c\u6b65\u7684\u4fdd\u5b58\u4e00\u4e2a\u7279\u5b9a\u65f6\u95f4\u70b9\u7684\u5168\u5c40\u68c0\u67e5\u70b9":10,"server\u5728\u96c6\u7fa4\u4e2d\u542f\u52a8\u540e":10,"server\u6545\u969c\u540e\u88abkubernetes\u91cd\u65b0\u542f\u52a8":10,"server\u6b64\u65f6\u8fd8\u9700\u8981\u901a\u8fc7\u7f51\u7edc\u8bbf\u95ee\u5206\u5e03\u5f0f\u5b58\u50a8\u4ee5\u4fdd\u5b58\u5feb\u7167":10,"server\u751f\u6210\u4e00\u4e2auuid":10,"server\u7684\u5355\u70b9\u6216\u591a\u70b9\u540c\u65f6\u6545\u969c":10,"server\u7684\u6570\u636e\u5feb\u7167":10,"server\u7684\u6
8c0\u67e5\u70b9\u5404\u81ea\u72ec\u7acb\u4fdd\u5b58":10,"server\u7b2c\u4e00\u6b21\u542f\u52a8\u6216\u4efb\u610f\u65f6\u95f4paramet":10,"short":[29,33,58,64,67,70,75],"static":[14,46,64,66,97,119,121],"super":[58,74],"swig\u652f\u6301\u7684\u8bed\u8a00\u6216\u8005\u89e3\u91ca\u5668\u6709\u5c40\u9650":45,"swig\u66b4\u9732\u7684\u63a5\u53e3\u4fdd\u7559\u4e86c":45,"swig\u751f\u6210\u7684\u4ee3\u7801\u4e0d\u80fd\u4fdd\u8bc1\u591a\u8bed\u8a00\u4ee3\u7801\u98ce\u683c\u7684\u4e00\u81f4\u6027":45,"swig\u76f4\u63a5\u8bfb\u53d6c":45,"swig\u9700\u8981\u5199\u4e00\u4e2ainterface\u6587\u4ef6":45,"switch":[7,46,97],"tag\u4e3a":63,"tag\u53ef\u4ee5\u662flatest\u6216latest":63,"tag\u7684\u66f4\u65b0\u65f6\u95f4\u662f\u5426\u5728\u4e0a\u8ff0\u7f16\u8bd1wheel\u5305\u5b8c\u6210\u540e\u662f\u5426\u6700\u65b0":63,"throw":97,"true":[4,6,7,12,30,36,41,50,52,56,57,58,59,63,66,70,74,92,97,105,107,116],"try":[0,1,3,8,9,12,13,14,30,35,40,43,59,64,67,93,110,119],"type\u5b57\u6bb5\u5747\u4e0d\u5c3d\u76f8\u540c":46,"var":[6,7,18,19,20,24,31,32,34,36,39,50,52,56,57,58,60,64,69,70,78],"void":[7,12,14,19,26,29,31,32,37,39,43,44,45,46,56,57,65,66,68,74,75,76,77],"wheel\u5305":63,"while":[7,16,19,20,30,35,38,40,49,53,54,59,62,64,68,75,77,105,116],AGE:[97,99],AWS:[11,96,101,102],Added:67,And:[12,16,17,26,33,35,47,51,52,55,59,64,68,97,107,116],But:[3,5,32,38,47,55,64,72,94,121],For:[0,1,4,6,7,13,14,15,17,18,19,21,24,25,30,31,32,34,37,39,40,44,47,48,49,53,54,55,56,57,58,59,60,61,62,65,66,67,68,71,72,74,75,76,77,87,92,93,104,105,107,109,110,116,118,120,121],IDE:[0,119],IDs:[16,20,49],IPs:92,IRs:21,Into:97,Its:[31,65,75,97,116],K8s:121,NOT:[58,75],Not:[0,3,4,9,40,67,121],OPs:[21,23,93],One:[5,16,39,44,47,52,64,67,74,105,116],Ops:[60,62,66,75],PFS:27,PRs:78,QoS:99,Such:[37,58,67,70],TLS:[4,27,97],That:[1,19,51,105,107],The:[0,1,3,4,5,6,8,9,13,15,16,17,19,20,21,23,24,25,28,29,31,35,38,39,40,43,44,46,48,49,51,52,53,55,56,57,58,59,62,64,65,66,67,68,70,71,72,74,75,76,77,78,93,94,95,96,97,99,103,105,107,109,110,1
11,116,118],Their:9,Then:[1,3,18,21,32,37,40,51,55,57,74,93,94,97,99,100,109,110,116],There:[4,7,8,9,14,16,17,20,21,28,29,30,35,40,47,48,49,52,53,54,55,58,62,64,65,68,75,97,110,118,119],These:[0,6,7,19,24,29,34,50,62,65,66,67,71,96,107],Use:[0,4,22,28,59,60,67,74,92,94,97,105,106,109,110,120],Used:[60,68],Uses:[40,119],Using:[0,9,30,53,59,60,62,64,77,99],VMs:0,VPS:97,WITH:120,With:[18,19,24,30,51,56,67,70],YES:17,Yes:[0,1,42],___fc_layer_0__:97,__align__:29,__cuda_align__:29,__device__:29,__doc__:66,__file__:17,__forceinline__:29,__fp16:29,__global__:29,__hadd:29,__half:29,__half_raw:29,__impl__:66,__init__:[24,25,33,40,50,58,70,74,109],__main__:33,__metaclass__:75,__name__:33,__rnn_step__:116,__va_args__:61,__x:29,_addup_repetitive_outputs_:6,_append_backward_ops_:[6,24],_append_backward_vars_:6,_binari:8,_create_global_var:58,_def:40,_dtype:35,_filer:43,_filter:43,_fwd:43,_fwd_pd:43,_input:43,_librari:8,_live_in:40,_live_out:40,_loss:33,_op:[35,75],_output:43,_presucessor:40,_program:40,_remove_no_grad_branch_:6,_reorder_input:43,_source_language_embed:116,_src_input:43,_sucessor:40,_target_language_embed:116,_test:8,_update_op:25,_use:40,_value_index:35,a75:29,a_op:75,a_prev:67,aaaaa:11,aaaaaaaaaaaaa:97,abi:118,abil:33,abl:[4,6,19,20,21,37,50,55,58,121],about:[1,3,5,7,8,17,23,28,31,40,48,59,64,66,67,68,75,87,97,104,105,109,110,118,120],abov:[0,1,2,4,5,6,7,8,9,13,18,20,21,29,30,31,32,34,39,43,48,49,50,51,53,55,56,58,66,67,69,70,72,75,76,77,87,93,97,99,109,110,118,121],abs:[5,33],abs_numerical_grad:5,absolut:[1,5,118,120],acc:21,acceler:[1,10,42,51,53,107],accept:[4,60,118],access:[3,4,8,13,16,17,18,20,21,58,92,116],accessmod:97,accessor:58,accord:[5,6,14,21,23,34,49,60,70,75,92,104,105,107],accordingli:74,account:[60,72,121],accoust:67,accrodingli:12,accumul:[9,14,25,51,52,53,67],accur:[5,16,55],accuraci:[25,67,74],achiev:[19,23,51,52,53,67,68,110],ack:105,acquir:30,across:[6,21,48,67],act1:35,act2:35,act:[7,19,21,35,49,58,70,87,93,111,116],act_output:66,act_typ:3
5,action:[20,97],activ:[8,35,40,49,52,55,58,62,66,74,77,78,87,105,116],actual:[12,24,30,33,35,39,43,47,53,66,68,77],actual_layout:43,adagrad:[53,65,92],adagradoptim:50,adam:[4,14,21,33],adapt:[39,55],add:[0,5,6,7,8,12,16,19,20,21,23,25,29,32,36,38,50,52,53,57,58,60,62,64,68,69,72,74,75,77,107,110,111,120],add_activ:58,add_bia:58,add_depend:8,add_execut:8,add_input:[48,74],add_memori:48,add_output:48,add_scalar:[7,49,56],add_sum:58,add_test:[8,74],add_two:[7,48],add_unittest_without_exec:74,addattr:[52,66,75],addbia:74,addcom:[66,75],added:[7,23,24,29,47,51,53,62,72,74,75,76],adding:[62,72],addinput:[52,66,75],addit:[6,20,51,55,60,62,70,71,75,77,93],addition:48,addmemori:43,addop:[32,77],addoutput:[52,75],addprimit:43,addprimitivedesc:43,addr:9,address:[1,9,14,18,21,93,95,100,103,105,110,121],addrow:74,addtyp:66,adjust:[6,24],admin:121,administr:[0,16,121],adopt:[29,33],advanc:[5,105,110,116],advantag:[5,29,30,53,59,103],adversari:[33,59],advic:110,affect:7,affili:49,afford:13,aforement:8,after:[0,1,6,7,8,13,14,16,21,22,23,24,26,28,29,40,43,51,54,55,58,67,72,74,75,76,77,95,97,99,103,105,107,109,116,118,119],aftern:55,again:[4,5,9,53,110],against:97,aggreg:[25,51,97],ago:8,agre:72,agreement:72,ahead:67,aid:110,alexnet_pass1:107,alexnet_pass2:107,algo:43,algorithm:[6,13,24,39,40,43,49,53,62,67,116],all:[0,1,4,6,7,8,9,12,14,16,17,18,19,20,21,22,24,26,28,30,33,34,35,38,40,43,44,46,47,48,49,50,51,52,53,55,56,58,60,66,67,68,74,77,92,93,94,95,97,99,104,105,107,110,111,116,118,119,120,121],all_output_nam:6,alloc:[14,17,40,43,52,68,74,77,107,111],allow:[4,14,18,19,21,24,30,52,53,62,72,74,97,105,110],allow_only_one_model_on_one_gpu:[104,105,107],allreduc:[51,52],almost:[0,95],alpha:[8,62],alreadi:[1,8,9,30,43,58,64,95,97,105,110,120],alreali:104,also:[0,1,3,4,6,7,8,12,15,19,20,21,29,30,32,33,34,35,38,40,47,48,49,52,53,54,55,56,57,58,59,62,64,66,67,68,70,71,74,75,78,93,94,99,103,110,116,118,121],altern:[75,109],although:[6,51],altogeth:121,alwai:[8,20,44,65,97,105],amazon:[97,9
9],amazonaw:97,amazonec2fullaccess:97,amazonelasticfilesystemfullaccess:97,amazonroute53domainsfullaccess:97,amazonroute53fullaccess:97,amazons3fullaccess:97,amazonvpcfullaccess:97,ambigu:[59,67],amd64:97,amd:47,amend:72,amodei:67,among:97,amort:51,amount:110,analys:55,analysi:[55,109,110],analyz:40,ancestor:[56,58],andd:97,andrew:40,android:120,android_abi:118,android_api:118,android_arm_neon:118,android_native_api_level:118,android_standalone_toolchain:118,android_toolchain:118,ani:[0,4,8,9,14,16,17,18,20,21,26,29,30,37,39,40,44,49,51,52,53,58,59,61,62,67,75,77,93,97,100,110,120],annoi:95,announc:29,anoth:[0,4,6,7,17,19,20,30,31,39,43,49,58,64,66,68,97,105],anroid_arm_mod:118,ans:97,answer:[18,30,72,97],anymor:51,anyth:[49,59,97],anytim:33,anywai:[109,118],apach:42,apart:71,api:[0,3,4,6,8,14,15,17,18,19,25,27,32,33,35,48,54,55,60,63,70,71,74,75,92,93,96,97,109,110,114,118,119,120,121],api_shar:8,api_test:8,api_trainer_config_helpers_lay:116,apiserv:97,apivers:[97,99],app:119,appar:6,appear:[18,30,34,68],appel:40,append:[6,24,25,49,58,59,67,72,74,92,116],append_backward:[6,50,52,109],append_clip_op:24,append_op:[24,38,58],append_oper:58,append_optim:52,appl:119,appleyard:110,appli:[33,34,51,52,55,64,74,116],applic:[18,20,29,30,31,34,58,60,72,75,96,97,99,109,110,121],applyl1:12,appoint:75,appreci:[67,72],approach:[21,22,23,51,53,54,62,67,118,120,121],approxim:53,apt:[1,109],arbitrari:[21,44,77],arch:118,archetectur:67,architectur:[0,29,67,94,118,119],archiv:[45,46],area:33,arg:[6,35,50,66,75,104],argu:57,argument:[0,6,7,12,13,21,50,54,57,58,72,74,94,105,106,116],arithmet:29,arm64:[118,119],arm64_standalone_toolchain:118,arm:[29,118,119,120],arm_standalone_toolchain:118,armeabi:118,armv7:[29,119],armv8:29,arn:97,around:[16,40,58,97,121],arrai:[14,18,20,34,49,56,58,59,60,70,75],arrang:70,array_to_lod_tensor:40,arrow:33,articl:[0,31,34,93,99,120],artifact:[3,63,97],artifici:40,arxiv:[33,67],as_step_input:7,asap:76,asduplic:52,asgd:53,ask:[6,9,16],asr:67,assgin:40,assig
n:[6,13,18,22,24,29,31,51,67,77,92,93,97,103,105,121],assigne:67,assignmemt:40,associ:[54,61,77],assum:[1,7,21,43,93,107,116],assumpt:21,ast:18,astyp:[59,75],asyc:9,async:[9,23,52,104],async_count:105,async_lagged_grad_discard_ratio:[92,105],async_lagged_ratio_default:[104,105],async_lagged_ratio_min:[104,105],asynchron:[9,20,51,55,94,105],atom:22,attr:[7,18,35,38,43,56,57,58,66,75,116],attr_map:66,attrdesc:56,attribu:43,attribut:[6,7,23,24,38,56,58,60,64,66,70,74,75],attributemap:75,attrproto:66,attrtyp:[56,66,75],attrvalu:66,auc:[25,104],aucvalidationlay:105,audio:67,augment:67,authent:97,author:[27,67,97],auto:[0,7,12,22,31,39,43,45,57,60,64,70,72,74,75,77,110],autom:[96,97],automat:[0,4,6,14,21,23,24,32,50,60,66,67,72,74,75,93,95,97,104,105,109,116,119],avail:[3,9,14,23,29,30,40,97,121],averag:[13,105],average_test_period:[104,105],avg:[93,110],avg_cost:[21,93,111],avg_loss_valu:93,avoid:[0,5,7,9,20,21,38,43,51,52,53,54,110],avx2:0,avx:[0,1],awai:30,await:99,awar:[4,18,20,25,31,48,58,97,109],awk:100,awni:67,aws:27,aws_account_id:97,awsaccountid:97,awskeymanagementservicepowerus:97,axiom:20,b363:99,b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465:99,ba5f:97,back:[1,6,9,21,29,33,53,75],background:[52,62,67,92],backpropag:[5,6],backward:[5,7,12,14,24,33,41,42,50,52,53,54,57,61,62,69,74,105,107,111,116],backward_first:116,backward_op:5,backwardactiv:74,bacth:19,baidu:[30,67,99],bake:21,balanc:[23,51,97,105],bandwidth:[29,51],bare:[93,99,121],barrier:[94,105],barrierstatset:110,basci:35,base:[4,13,19,24,25,29,30,37,43,47,50,51,53,55,60,61,62,68,70,74,75,97,105,109,110,111,116,118,119],baseerrorclipattr:24,baselin:67,basematrix:74,bash:[0,1,97,99,103],basic:[21,35,43,55,56,60,61,67,70,74,86],batch:[4,7,9,11,12,19,20,21,25,26,30,33,36,47,48,49,51,53,67,70,72,74,93,97,99,105],batch_id:33,batch_im:33,batch_label:33,batch_norm:[33,67],batch_read:[11,19,59],batch_siz:[21,33,41,49,93],batch_szi:33,batch_z:33,batchnorm:[33,67],batchread:19,batchsiz:74,bazel:8
,bbbbb:11,bcm2708:120,bdist_wheel:63,beacus:35,beam:[105,116],beam_gen:116,beam_search:[49,116],beam_siz:[49,104,105,107,116],becaus:[0,4,5,7,8,9,14,29,39,49,54,58,59,62,64,65,69,70,71,74,77,93,97,107,109,116,118],becom:[22,23,64,68,110],been:[6,8,13,20,30,72],befor:[0,1,6,9,16,20,24,28,31,34,47,53,54,55,59,62,72,75,76,97,109,118,119,121],begin:[6,12,14,19,25,28,34,49,51,74,92],beginiter:4,beginn:116,beginpass:4,begintrain:4,behavior:[20,110],behind:[30,70,93],being:[6,16,24,30,57,59,77,109],belong:[19,21,64],below:[0,3,7,9,14,21,23,29,30,44,54,59,62,70,71,74,92,94,97,103,110,116,118,119],benchmark:[44,67],benefit:[16,17,49],besid:[3,21,40,47,51],best:[0,1,8,43,105],besteffort:99,beta:33,better:[8,30,39,40,43,49,97,119,121],between:[1,6,8,9,14,21,23,29,30,43,46,51,54,61,64,75,77,97,114],bia:[49,58,74,116],bias:74,bias_attr:[58,116],biases_:74,biasparameter_:74,biassiz:74,bidi:99,bidirect:[67,116],big:[18,23,40,110,121],bigger:9,bilinearfwdbwd:110,bin:[0,1,92,97,99,103],binari:[0,2,3,8,17,21,29,31,33,44,97,109,110],bind:[18,20,29,32,64,68],bit:29,bitcod:119,bla:[1,119],black:[33,119],blank:[97,119],blob:0,block0:[40,52],block1:[40,52],block2:[40,52],block3:52,block:[6,10,12,14,18,20,21,22,23,24,25,26,30,37,40,47,48,50,68,71,74,75,77,105,110],block_expand:67,block_id:[18,26],blockdesc:[7,34,52,58,60],blockdescbind:37,blockingcount:22,blueprint:49,book:[60,67,78,111,116],book_distribut:93,bool:[7,19,29,36,38,41,42,43,57,58,65,66,70,71,74,76,105,107],boost:[47,67,68,76],boot:[116,121],boot_lay:116,boot_stat:70,bootstrapp:121,borrow:[33,70],bos_id:116,both:[4,7,8,9,16,19,20,21,23,29,30,33,37,40,47,49,51,55,57,65,67,68,74,75,77,78,93,94,97,110,116,119],bottl:51,bottleneck:[55,110],bottom:67,bound:40,boundari:21,box:[33,110],brace:[7,34],brain:16,branch:[3,4,7,8,21,30,36,56,63,69,72,75,78],breadth:105,break_if:70,brief:[8,14,29,68,77],briefli:110,bring:[30,40,77],broadcast:[9,51,60,121],broken:72,browser:[1,78,97,109],bsd:[0,20,51],bsp:20,bucket_nam:97,buddy_alloc:72,buf:1
2,buf_siz:[21,93],buffer:[12,20,43,44,53,59,64,105,111],buffer_s:20,buffered_read:59,bug:[72,97],build:[1,3,8,17,21,34,35,40,42,53,62,63,66,67,72,75,81,87,89,93,97,101,102,103,105,109,117],build_android:118,build_model:33,builder:72,buildtool:63,built:[0,3,8,18,21,29,31,40,47,51,66,67,70,109,118,120,121],bulk:20,bunch:[44,103,110],button:[72,78,97],c11:45,c703c041:72,c99:46,c99e:97,cach:[0,29],cacul:[25,92],caff:[7,30],caffe2:[7,18,20,30],calcul:[1,5,6,9,14,22,25,29,40,74,94,105,107,110,116],calcut:40,calendar:55,call:[0,4,5,6,7,12,13,14,15,17,18,20,21,24,31,33,34,40,48,49,50,52,55,58,60,61,64,66,68,70,74,75,76,77,87,93,97,103,105,109,110,116],callback:[24,74],caller:[5,97,109],can:[0,1,2,3,4,5,6,7,8,9,12,13,16,17,18,19,20,21,23,24,26,29,30,31,32,33,34,35,37,38,39,40,43,47,48,49,50,51,52,53,55,56,57,58,59,60,61,62,66,68,70,71,72,74,75,76,77,78,87,92,93,94,95,96,97,99,100,103,104,105,107,109,110,116,118,119,120,121],cancel:16,candid:[49,67],cannot:[19,39,60,64,70,74,75],cantain:35,capabl:[29,54,60],capac:[62,97],capi:[0,45],capi_prvi:46,caption:49,captur:95,card:[51,93,103],care:[1,17,40,59,67,68,104,105,121],carefulli:[67,105],carpedm20:33,cast:[29,39],cast_to_op_attr:66,cat:[1,100],categor:75,categori:9,categoryfil:99,caus:[3,9,28,75],caution:[97,99],cbla:41,cc_:8,cc_binari:8,cc_test:8,cclient:15,cduadevicecontext:[47,68],cento:[0,3,87,121],central:[62,93],ceph:[11,99],cephf:[11,17,27],cer:67,certain:[19,38,47,50,55,64,68,76,104],certif:[4,27,97],cffi:45,cfg:[40,99],cgo:45,ch1:20,ch2:20,chain:[6,34,74],challeng:[5,9,30,36,68],chan:20,chanc:[4,29,74],chang:[0,8,13,17,21,30,43,54,56,59,61,63,64,67,72,74,75,77,92,93,97,105,110,116,118],changes:43,channel:[18,71,110],chapter:[48,49,67,93],chapter_data:48,chapter_out:48,charact:[20,67],characterist:107,check:[0,3,6,7,8,24,43,57,60,69,72,75,78,97,103,105,107],check_attr:66,check_eq:74,check_grad:[5,75],check_l:74,check_output:75,check_sparse_distribution_batch:[104,105],check_sparse_distribution_in_pserv:[104,105],check_
sparse_distribution_ratio:[104,105],check_sparse_distribution_unbalance_degre:[104,105],check_styl:72,checker:[5,60],checkgrad:105,checkgrad_ep:105,checkmark:121,checkout:72,checkpoint:[23,57],checksum:27,child:7,china:[1,118],chines:78,chip:30,chmod:97,choic:[1,8,30],choos:[0,1,2,20,38,76,105,118],chosen:[33,47],chunk:[13,27],circl:34,circular:20,circumst:68,claim:97,claimnam:97,clang:[29,45,72,118],clariti:49,classic:[40,67],classif:[34,107],classifi:33,claster:97,clean:[0,7,8,26,54,60,72],clear:[8,39,49,54,64,120],clearer:[54,58],clearli:64,cli:97,click:[3,72,78,97,109,110],client:[12,15,60],clip:105,clip_op:24,clip_op_desc:24,clipe:52,clone:[0,72,78,109,118,120],close:[20,59,72],close_channel:20,cloud:[8,9,17,27,28,60,121],cludform:97,cluster:[4,7,9,14,21,67,81,92,94,99,103,104,105],cluster_test_fil:92,cluster_train:95,cluster_train_fil:92,cluster_train_v2:[95,96,100],cm469:97,cmake:[0,46,72,74,75,78,93,109,110,118,119],cmake_build_typ:[109,118,119,120],cmake_c:119,cmake_install_prefix:118,cmake_system_nam:[118,119,120],cmakelist:[8,41,42,74],cmatrix:[45,46],cmd:99,cname:97,cnn:99,coars:32,code:[0,1,3,4,6,8,16,19,20,21,23,26,29,32,33,34,38,44,47,50,52,53,54,55,57,59,60,61,62,66,70,73,74,75,76,77,78,87,92,93,97,99,110,116],codebas:[60,72],collabor:9,collect:55,collectbia:74,column:[34,59,74,109],com:[0,1,8,33,63,72,78,97,99,109,111,118,120,121],combin:[40,50,60,64],come:[21,25,40,56,67,70,71],comma:[14,93,105],command:[0,1,3,8,12,17,28,72,74,75,78,87,93,94,95,97,99,100,101,102,103,108,109,110,118,119,120],commandlin:110,comment:[8,35,66,67,72,75],commit:[8,72,92],common:[11,62,68,74,104,116],commonli:[28,62,107,109,110,116,120],commun:[9,14,15,20,21,23,51,72,74,92,93,97],compani:30,compar:[0,5,8,18,60,74,75],comparis:114,comparison:[8,30,76],compat:[29,32,51],compil:[8,21,30,35,37,40,47,51,61,65,66,71,74,78,103],complaint:8,complet:[2,6,7,9,13,14,24,27,34,44,47,60,74,75,77,93,97,99,109,121],complex:[16,20,40,49,60,110,116],complic:[21,32,39,59,70],compon:[20,21,3
5,67,70,71,74,76],compos:[4,20,32,35,48,58,60],composit:32,compress:13,compris:6,compromis:0,comput:[0,1,4,5,9,20,21,23,26,29,30,31,35,39,40,44,47,50,51,52,53,55,61,64,67,68,71,72,74,75,76,93,96,97,103,107,109,110,111,116,118,119,120],computation:116,computationgraph:35,con:51,concat:[33,116],concaten:[33,48,70],concentr:60,concept:[4,18,19,20,30,32,33,35,43,48,49,53,54,56,64,70,71,86,116],conceptu:[20,26,30,33,35],concern:[4,20,25,119],concis:[33,70],conclud:75,concret:[60,68,75],concurr:[9,16,23,55,94],concurrentremoteparameterupdat:105,cond:[7,30,36,56],condit:[13,21,30,36,43,67,69,99,116],condtion:33,conduct:110,conf:95,conf_paddle_gradient_num:97,conf_paddle_n:97,conf_paddle_port:97,conf_paddle_ports_num:97,conf_paddle_ports_num_spars:97,config:[11,28,49,74,97,99,104,105,121],config_:[12,105],config_arg:[104,105,107],config_lay:74,config_len:14,config_pars:[41,42,74],config_proto:14,configmap:21,configur:[0,6,12,14,16,17,21,23,30,35,38,39,58,67,68,72,74,75,77,81,87,92,93,105,110,114,118,120,121],confirm:28,conflict:[64,72],confus:[33,38],congest:105,connect:[17,18,21,23,67,74,92,93,97,99,103,121],consid:[0,6,57,68,107,110,121],consider:[47,67],consist:[13,19,20,31,44,56,59,60,61,66,67,71,75],consol:[97,110],consolid:[7,78],constant:[35,37,38,47,74,75],constraint:64,construct:[4,26,35,40,48,58,60,64,66,76,116],constructbackwardgraph:34,constructoptimizationgraph:34,constructor:[24,29,55,58,60,64,66,74,75],consum:[9,109],consumpt:40,contact:16,contain:[0,2,3,4,6,7,13,26,33,35,43,44,47,54,55,58,60,61,64,65,66,67,70,71,75,92,93,97,100,103,116,119],container:96,containerport:97,content:[14,28,44,49,78,99],content_dir:78,content_len:14,context:[24,43,64,65,68,75,77,111,116],contin:97,continu:[6,9,44,67,94,105,118],contrib:62,contribut:[62,67,73],contributor:[60,72],control:[7,18,20,69,92,97,99,105,121],controlflowgraph:40,conv2d:[33,76],conv:[33,39,43],conv_fwd:43,conv_pool_2:21,conveni:[4,6,35,50,66,67],convent:[6,14,72,75],converg:95,convers:[29,30],convert:[11,21,
22,23,29,30,31,43,59,61,67,93],convlut:67,convolut:[33,47,58,68],convolution_algorithm_opt:43,cool:72,cooper:67,coordin:[9,14],copi:[0,4,13,16,28,34,48,49,51,53,70,72,92,97,100],copy_from:24,copyvariablewithtensor:39,core:[0,6,35,38,46,53,54,70,105,111],coreo:[97,121],corner:60,corpu:67,correct:[5,6,29,51,74,75,76,97],correctli:[6,29,33,74],corresond:29,correspoind:4,correspond:[4,6,7,8,24,29,35,36,43,47,48,49,58,60,61,62,66,68,69,74,75,76,77,109,119],correspondingli:119,corss_entropi:4,cortex:29,cos:66,cosin:66,cosineop:66,cosineopproto:66,cosineopprotomak:66,cost:[4,6,21,34,39,50,51,56,57,93,105,111],cost_np:57,could:[0,4,5,13,18,20,21,22,23,29,30,31,48,50,53,54,56,58,59,61,69,76,95,97,109,110,118],count:[9,17,25,57,59,67,92,93,99,105,107,110],counter:[9,13,22,34],cours:[0,17,47],cover:[30,67,77],cp27:3,cp27m:[3,63],cp27mu:[3,63],cpp:[5,12,32,41,42,45,46,54,60,71,74,76,110],cprofil:109,cprofilev:109,cpu:[0,1,5,17,29,38,39,47,52,53,54,55,60,62,63,68,75,76,77,99,105,109,110,111],cpu_avx_mkl:[1,3],cpu_avx_openbla:[3,87],cpu_kernel:38,cpu_noavx_openbla:3,cpu_ns_:55,cpu_per_pserv:21,cpu_per_train:21,cpudevicecontext:[47,68,75,76],cpuelapsedu:55,cpuengin:42,cpuinfo:1,cpuplac:[21,38,39,43,47,52,68,75,76,77,93,111],cpusparsematrix:46,crash:[9,95,105,110],creat:[1,4,5,7,9,14,18,19,22,24,25,26,27,28,29,30,32,33,34,43,47,48,50,51,53,54,58,61,62,67,72,74,75,78,87,92,93,100,105,118,121],create_backward_pass:50,create_bias_paramet:74,create_block:58,create_doc_str:66,create_input_paramet:74,create_local_scop:26,create_oper:32,create_optimization_pass:50,create_paramet:58,create_python_ops_creatation_funct:66,create_rnn:7,create_rnn_op:48,create_tmp_var:58,create_tmp_vari:58,create_var:58,create_whileloop:70,creategradientoper:61,creatememori:43,createop:66,createoper:7,createprimitivedesc:43,createstack:97,createvari:7,creation:[32,97],creationd:97,creator:[11,60,61],creator_:61,credenti:28,crf:[39,68],critic:[33,109],crlf:72,crop:68,crop_grad:68,cropgradkernel:68,cropkernel:68
,cross:[58,75],cross_compil:120,cross_entropi:[4,21,33,39,40,52],crt:27,csc:74,csr:74,ctc_error_evalu:67,ctest:[0,75],ctor:58,ctrl:[0,95],ctx:[39,43,75,77],cubla:[47,76],cublas_handle_:68,cublashandle_t:68,cuda7:[3,87],cuda8:[0,1,3],cuda:[1,3,8,31,47,55,60,68,75,76,93,103,105,110],cuda_context:31,cuda_dir:[104,105],cuda_fp16:29,cuda_so:1,cudaconfigurecal:110,cudadevicecontext:[31,47,68,75],cudadevicegetattribut:110,cudaelapsedu:55,cudaevent_t:55,cudaeventcr:110,cudaeventcreatewithflag:110,cudafre:110,cudagetdevic:110,cudagetdevicecount:110,cudagetdeviceproperti:110,cudagetlasterror:110,cudahostalloc:110,cudalaunch:110,cudamalloc:110,cudamemcpi:110,cudaplac:[39,47,68,76],cudaprofilerstart:110,cudaprofilerstop:110,cudaruntimegetvers:110,cudasetdevic:110,cudasetupargu:110,cudastream_t:68,cudastreamcr:110,cudastreamcreatewithflag:110,cudastreamsynchron:110,cudeviceget:110,cudevicegetattribut:110,cudevicegetcount:110,cudevicegetnam:110,cudevicetotalmem:110,cudnn:[8,38,39,43,47,68,76,105],cudnn_conv_workspace_limit_in_mb:[104,105],cudnn_dir:[104,105],cudnn_kernel:38,cudnnconvopkernel:76,cudnnv5:0,cudrivergetvers:110,cuinit:110,cumtim:109,cur_mem:49,curl:97,curli:[7,34],current:[0,1,3,6,7,8,9,12,14,18,23,25,30,38,39,47,48,49,53,54,55,58,64,70,74,78,92,93,95,97,105,116,119],current_block:[56,58],current_endpoint:93,current_oper:56,current_word:116,curv:4,custom:[4,17,29,33,49,53,60,67,74,97],custom_batch_read:59,cut:70,cxx:119,cxx_compil:[118,119,120],cxx_flag:119,cxxabi_1:3,cycl:9,cython:45,d3e0:97,d_b0:33,d_b1:33,d_b2:33,d_block:33,d_f:33,d_g:33,d_h0:33,d_h0_bn:33,d_h0_relu:33,d_h1:33,d_h1_bn:33,d_h1_relu:33,d_h2:33,d_loss:33,d_loss_fak:33,d_loss_real:33,d_optim:33,d_step:33,d_t:33,d_w0:33,d_w1:33,d_w2:33,daili:72,dandroid_abi:118,dandroid_arm_mod:118,dandroid_arm_neon:118,dandroid_standalone_toolchain:118,dangl:0,dario:67,darwin:97,dash:33,dat:11,data:[4,5,7,11,12,13,20,23,25,27,29,30,33,34,35,37,38,40,43,44,47,48,49,50,51,52,53,54,56,58,60,62,64,65,66,67,68,70,71,74,75,
77,87,89,92,93,94,100,101,104,105,107,110,111,116],data_grad:52,data_i:33,data_lay:12,data_layout_:39,data_read:59,data_reader_creator_random_imag:59,data_shar:70,data_typ:[39,44,65,67,71,76,87,116],data_type_:[38,39,47],data_x:33,datacent:[11,28],datacenter1:11,datacenter2:11,datacenter_1:11,datacenter_2:11,datacenter_nam:11,datafeed:[93,111],dataflow:35,dataflow_analysi:40,datalayout:39,dataparallel:21,dataprovider_convert:67,dataset:[11,17,21,53,59,67,87,93,94,105,109,116],datatransform:39,datatyp:[38,39,43,65,67],date:92,dcgan:33,dcmake_install_prefix:[118,119,120],dcmake_system_nam:[118,119,120],dcuda_arch_nam:0,dcudnn_root:0,ddim:[19,47,68,77],dead:9,deal:[6,121],debug:[1,5,6,21,28,30,58,72,109],debug_str:35,decai:52,decayr:12,decent:13,decid:[4,16,33,44,53,61,62,65],declar:[7,33,48,71],decod:[67,116],decoder_boot:116,decoder_dim:49,decoder_group_nam:116,decoder_input:[49,116],decoder_mem:[49,116],decoder_s:116,decoder_st:116,deconv:33,decor:[19,74],decrement:22,decrementcount:22,decrypt:97,deduc:60,deep:[1,6,16,20,26,33,34,40,42,55,60,62,67,68,93,110,119],deeper:[1,31],deepspeech2:41,def:[4,5,6,11,17,24,25,32,33,35,38,40,48,49,50,58,59,66,70,74,75,116],def_block:33,defalut:[105,107],default_block:33,default_devic:107,default_main_program:[93,111],default_param_attr:58,default_st:70,default_start_up_program:52,default_startup_program:[93,111],default_valu:107,defaultdict:40,defaultinfervartyp:37,defect:54,defer:16,defin:[4,6,7,8,9,16,18,19,22,23,24,29,30,31,32,33,35,38,40,47,48,51,56,58,59,60,64,66,68,70,71,74,77,92,93,95,105,109,111,116],definit:[1,6,7,9,13,21,26,31,38,52,56,61,66,70,75,109,111],definiton:32,deivc:76,delai:[53,68,77,105],delet:[17,27,72],deletestack:97,deliv:121,delta:5,demand:[9,68],demo:[60,95,99,101],demolish:99,demonstr:[77,116],denot:[75,107],dens:[14,15,65,67,74,97],dense_vector:87,densescann:67,dep:8,depart:67,depend:[1,7,8,9,17,19,20,21,23,35,51,57,65,75,92,94,107,118,119,120,121],dependent_var:57,deploi:[95,107,121],deploy:[35,44,60,
95,96,97,119,121],deprec:67,depth:[7,30,67],dequeu:23,deriv:[4,19,21,24,36,50,118],desc:[7,24,43,44,58,66,70],desc_:7,descend:70,descent:[9,53,94],descproto:44,describ:[4,6,7,8,13,18,21,26,31,38,39,43,44,48,49,54,56,58,60,65,66,71,74,75,76,77,97,99],describestack:97,describestackev:97,describestackresourc:97,descripotor:43,descript:[0,3,7,8,37,39,42,44,47,61,65,67,71,72,75,92,97,106],descriptor:[20,39,43],deseri:[44,54],deserializ:60,desgin:34,design:[6,12,19,38,40,45,53,55,62,75,121],desir:[9,21,53,97,99],destin:[14,28],destroi:[7,26],destruct:64,destructor:[55,74],detail:[0,5,6,13,17,21,23,28,30,33,35,39,40,43,44,47,48,55,58,62,64,68,70,71,74,75,76,77,78,87,93,95,97,99,106,107,109,110,116,120,121],detect:[0,37,72,118],determin:[7,21,40,47,60,74],dev:[0,1,60,109,118,121],dev_ctx:[7,43,55],devel:63,develop:[0,1,3,6,8,30,37,54,55,58,61,63,67,72,77,78,92,104,105,109,111,117,119,120],deverlop:105,devic:[1,5,18,21,25,29,35,39,42,43,47,51,54,55,60,75,77,92,105,111,119],device_:55,device_context:[43,75],devicecontext:[7,47,55,75,76],deviceid:[42,107],deviceid_:42,deviceplac:68,devid:105,devot:67,devtools2:0,dhcp:121,diagnos:95,diagram:[48,94],diamond:33,dict:[6,58,92,100],dict_siz:[12,49],dictionari:[4,5,58,107],did:[1,54],diff_mat:5,differ:[1,5,6,7,8,9,14,16,19,21,22,23,24,25,26,29,30,33,35,36,39,40,43,47,49,51,53,55,57,61,64,67,69,70,71,74,75,76,77,92,94,95,97,99,105,109,116,119],differenti:32,difficult:[0,5,30],dig:[1,97,110],digit:92,digraph:35,dilat:43,dim0:75,dim1:75,dim:[12,43,44,48,60,65,68,71,74,75,77],dim_:[68,77],dimens:[33,60,65,67,68,70,74,75,77,107],dimension:[74,77,116],dios_arch:119,dios_enable_bitcod:119,dios_platform:119,dios_use_veclib_for_bla:119,dir:118,direcit:67,direct:[30,40,53,67,109],directli:[0,2,8,15,17,19,21,29,38,39,54,66,70,95,99,118],director:75,directori:[0,1,8,11,16,27,28,68,77,78,92,95,99,100,105,110,118,119,120],disabl:55,disadvantag:[53,58],discard:[9,13,49,92,105],discov:9,discoveri:97,discrep:110,discrim:33,discuss:[4,7,13,14,15,21,4
3,67],disk:[0,44,99],dispatch:[21,54,95,96,105],displai:[17,28,72],dist:[0,63],dist_train:[4,17],distinguish:[8,95],distribut:[3,7,13,14,15,16,18,20,25,31,51,60,67,71,92,95,99,101,102,108,121],distribute_test:[104,105],distributedli:[21,74],distributetranspil:93,disucss:4,divid:[6,25,66,71,104,109],diy_beam_search_prob_so:[104,105],django:78,dnn:[0,43,67],dns:97,do_forward_backward:59,doc:[0,35,48,70,75,77,78,92],doc_cn:78,docker:[0,2,63,72,78,97,101,102,103,121],docker_build:4,docker_clust:[95,100],docker_push:4,dockerfil:[118,120],dockerhub:1,document:[0,5,19,21,27,34,48,49,55,60,67,72,73,75,76,77,93,103,107,119],doe:[0,3,9,13,14,16,17,18,19,21,23,26,29,35,40,48,54,58,60,61,62,74,75,77,110,111],doesn:[0,1,4,7,18,20,59,72,99,109,110,118,119],doing:[12,16,21,34,52,110],domain:97,don:[0,1,4,8,32,34,40,59,67,72,75,78,97],done:[6,8,9,13,14,21,22,37,40,44,53,61,62,67,72,97,109,110],dot:[75,105],dot_period:[105,107],doubl:[0,21,29,34,39,55,75,76,105],down:[67,110],download:[0,1,3,8,9,12,16,27,92,94,119,121],dozen:8,draw:49,drive:64,driver:[1,93,103],drop:[19,49],dropout:74,drpi_arm_neon:120,drpi_toolchain:120,drwxr:99,ds2:67,dst:[14,43],dst_primitive_desc:43,dtoh:110,dtype:[20,21,35,58,93,111],due:[13,16,33,40,49,58,109],dummi:13,dump:44,duplic:[23,52],durat:[13,110],dure:[6,7,9,13,16,17,25,30,40,51,53,55,58,60,67,71,74,75,97,104,105,121],duse_eigen_for_bla:118,dwith_c_api:[46,118,119,120],dwith_distribut:93,dwith_doc:93,dwith_gpu:[0,93,120],dwith_profil:110,dwith_python:[46,93,120],dwith_swig_pi:[46,93,118,119,120],dwith_test:[0,75,119],dwith_tim:110,dynam:[0,14,46,48,58,59,105,110],dynamic_cast:74,dynamic_recurrent_op:70,e2e:121,each:[5,6,8,9,12,13,14,16,17,18,19,20,21,24,25,26,31,34,37,39,40,43,47,48,49,51,52,54,55,57,58,59,60,61,64,65,66,67,68,69,70,71,74,76,92,93,94,95,97,103,105,107,109,116,121],eager:30,earli:[29,31,72,75],eas:[37,75],easi:[5,6,49,53,59,60,62,72,74,95],easier:[0,4,23,29,30,59,69,70,72,74],easili:[4,33,51,55,59,61,64,68],echo:1,edg:40,edit:[0,1,20,
97],editor:[0,58],edu:[97,99],eeoi3ezpr86c:97,effect:[0,97,105,119],effici:[0,21,44,59,67,68,74,116],effort:[21,67],efs:97,efs_dns_nam:97,efsvol:97,egd:40,eigen:[29,47,53,60,62,68,75,118,119],eigen_device_:68,eigen_test:77,eigen_use_gpu:75,eigenmatrix:77,eigenscalar:77,eigentensor:77,eigenvector:77,either:[2,4,21,33,36,37,48,53,62,110,119],elabor:67,elb:97,elbapis:97,electr:40,electron:99,element:[5,13,20,23,35,49,60,75,77],element_typ:[14,76],elementari:60,elif:[4,66,69],els:[0,1,4,12,17,20,21,23,24,30,33,36,37,38,40,64,66,69,74,75],elsewher:55,emac:0,email:72,emailweixu:8,emb1:12,emb2:12,emb:99,embed:[4,7,12,23,37,49,65,70,92,116],embedding_lay:12,embedding_nam:116,embedding_s:116,emphas:110,emplace_back:74,emploi:[6,24,66,116],empti:[6,9,19,20,49,75],emul:29,enabl:[7,8,13,18,23,24,35,55,72,92,97,105,110,119],enable_grad_shar:[104,105],enable_parallel_vector:105,enc_proj:116,enc_vec:116,encapsul:14,encod:[13,49,116],encoded_proj:116,encoded_sequ:116,encoded_vector:116,encoder_ctx:49,encoder_ctx_expand:49,encoder_ctx_proj:49,encoder_dim:49,encoder_out_seq:49,encoder_s:116,encount:12,encourag:[21,26],encrypt:97,encrypt_decrypt:97,end2end:121,end:[3,6,7,21,24,31,35,40,49,54,55,59,64,67,69,72,105,116],end_pass:4,endian:44,endif:[47,55],enditer:4,endpass:4,endpoint:[11,97],endtrain:4,enforc:[71,119],engin:[17,42,43,67,110],english:[67,78],enjoi:1,enough:[0,6,7,38,40,47],enqueu:23,ensur:[0,1,3,9,43,51,64,74,119],enter:[7,26],enterpris:60,entir:[14,16,75],entiti:[7,64],entranc:26,entri:[0,13,17,37,72,74,97,118],entropi:58,entry_point:17,enumer:47,env:[78,97,109],environ:[0,3,4,19,21,72,92,96,97,99,104,105,109,110,119],environmenterror:92,eos_id:116,epoch:33,epol:20,equal:[9,70,75,76,105],equat:[40,75],equip:116,equival:[4,7,18,20,24,30,36,66,121],eras:19,erlang:20,error:[4,5,13,28,29,30,43,64,67,74,75,95,97,105],error_clip:24,error_clip_callback:24,errorclipbyvalu:24,especi:[0,42],essenc:[4,6],essenti:[4,26,29,77],establish:18,estim:[4,23,53],eta:99,etc:[0,7,20,21,25,43,
51,53,59,64,67,76,96,97,104,107,121],etcd:[9,13,14,16],etcd_addr:14,eth0:[92,97],etyp:20,eval:[7,25,33,60],eval_program:25,eval_result:25,evalu:[16,35,57,67,110,111],even:[0,4,29,51,58,59,72,105,110,119],evenli:[14,97],event:99,event_:55,event_block:55,event_handl:4,eventkind:55,eventlist:55,eventu:[21,70],everi:[4,9,13,14,16,24,25,34,35,37,39,40,43,47,48,51,58,64,66,72,74,75,77,92,94,100,103,105,111,116],everyon:72,everyth:[21,23,33,118],everywher:0,evid:54,evolv:30,exactli:[19,97],exampl:[0,7,17,19,21,23,25,28,30,31,32,33,34,35,37,39,40,43,47,48,49,54,55,56,58,59,60,61,62,65,68,69,70,72,74,75,76,77,92,93,96,97,99,104,105,107,109,110,111,116,120],except:[16,18,30,34,55,67,70,107],excess:40,exchang:54,exe:[21,93,111],exec:105,execut:[8,9,13,17,18,19,20,21,25,26,31,33,35,40,43,51,52,55,61,71,74,97,109,110],executioncontext:[39,43,75,76,77],executor:[18,21,25,29,30,31,33,39,50,52,56,58,71,93,109,111],exist:[0,3,4,7,9,19,28,30,49,58,59,61,66,68,70,74,77,97,105,119],exit:[14,28,93,99,105],expand:[1,49,74],expect:[39,110],expected_desc:43,expected_kernel_kei:39,experi:[44,67,107],experienc:72,expert:8,expir:9,explain:[9,18,30,32,34,72,76,93,94,109],explan:[17,18,21,39,64],explicit:[19,55,70,74,76],explicitli:[4,21,26,75,77,119],explod:24,explor:[49,62],expos:[6,15,20,43,44,68,70,97],express:[4,23,25,35,40,75,97],extend:[53,70],extens:[16,23,49,75,118],extent:46,extern:[8,42,45,46,60,67],external_librari:8,extingrad_:42,extinval_:42,extoutgrad_:42,extoutval_:42,extra:[21,62,68,121],extraattr:107,extract:[30,54,67,75,97],extrem:[18,30,110],f120da72:99,f7e3:97,fa0wx:99,fabric:96,face:[8,62],fact:[18,30,51,56,58],factori:45,fail:[9,13,49,75,99,105,107],failur:[9,14,75],fake:33,fake_imag:59,faked_imag:33,fall:[29,57],falloc:27,fals:[5,6,7,30,36,38,41,48,56,57,59,65,71,74,75,87,92,99,105,107,116],false_block:[7,36,56],false_label:59,false_neg:25,false_posit:25,false_read:59,familiar:20,faq:117,far:[24,70],fashion:21,fast:[13,30,110],faster:[1,9,30,52,110,116,118],fastest:30,fa
stli:72,fat:119,father:6,fault:[0,13,60],favorit:0,fbd1f2bb71f4:99,fc1:[35,74,107],fc1_bia:35,fc1_weight:35,fc2:[35,107],fc3:[35,107],fc4:107,fc8a365:97,fc8a:97,fc_grad:52,fc_layer:[58,66,74,107],fc_op:66,fc_out:7,fc_output:66,fc_without_b:7,fclayer:74,fcop:32,feasibl:53,featur:[6,21,29,35,51,52,55,67,72,105],feed:[4,21,34,48,62,93,111],feed_dict:33,feed_list:[93,111],feed_minibatch:71,feeder:[21,93,111],feel:72,fetch:[9,12,19,21,57,74,111,116],fetch_list:[21,58,71,93,111],fetch_op:57,few:[0,8,9,20,21,40,53,59,65,67],fewer:[20,58],fft:67,field:[7,35,37,44,57,58,61,65,66,97,110],fifth:34,figur:[4,8,21,23,33,42,48,55,58,67,74,110,116],file:[0,1,3,4,6,8,9,11,13,14,16,17,19,20,27,28,30,31,35,44,46,59,60,67,68,71,72,74,75,76,77,87,92,93,95,100,105,111,116,118,119,120,121],filelist:67,filenam:[11,58,109],fileoffset:27,filesystem:[16,17,21,27,97],fill:[9,13,47,58,97],fill_zero_grad:60,fill_zeros_like_op:6,filter:[24,43],find:[0,3,7,9,16,20,29,35,39,43,49,64,93,100,110,118,119],find_var:5,findmemori:43,findop:7,findprimit:43,findprimitivedesc:43,findvar:[7,64],fine:[13,32],fingerprint:97,finish:[0,9,13,16,17,26,40,51,66,92,94,95,97,99],finit:74,finnal:1,first:[0,4,6,7,9,13,16,17,18,21,26,28,30,33,34,35,43,48,49,52,56,57,58,60,65,66,67,68,69,70,72,74,75,77,93,97,105,107,110,116,121],first_seq:116,firstli:76,firstseen:99,fit:[29,38,40,44,49,60,93],fit_a_lin:93,five:[56,110],fix:[21,40,45,58,67,72,109],flag:[41,42,55,72,75,78,105],flatten0:35,flatten:[35,56,58,77],flexibl:[4,14,21,30,34,38,48,49,53,59,68,70,116],flist:92,fliud:18,float16:20,float16_t:29,float32:[21,29,32,33,58,59,75,93,111],float_to_half_rn:29,flow:[7,18,20,48,55,63,69],fluid:[6,19,21,23,26,39,47,52,55,58,68,69,76,109],fluid_cuda_create_tensor:31,fluid_cuda_mult:31,fluid_cuda_read:31,fly:6,fmt:20,fnt03:97,focu:[20,35,109,110],focus:75,folder:[8,11,17,28,93,97],follow:[0,1,3,4,5,6,7,8,9,13,17,18,19,20,21,23,26,29,30,31,32,33,34,35,36,37,39,40,43,47,48,49,51,52,53,55,56,57,58,59,60,61,62,64,65,66,67,68,69,70,71,
72,74,75,77,78,87,93,97,99,100,101,102,107,109,110,111,116,118,119,120,121],footprint:31,forbid:4,forc:[39,51,58],force_cpu:38,force_cudnn:38,force_load:45,forest:7,forev:20,forget:4,fork:72,form:[3,19,20,25,110],formal:39,format:[5,13,19,21,29,30,47,49,67,70,72,74,75,77,87,92,97,105],former:[4,8,30,40,53],formula:[5,40],forth:33,forward:[5,6,7,12,14,24,30,33,41,42,43,44,50,54,56,59,60,61,62,65,74,107,116],forward_infer:43,forward_list:55,forward_op:5,forward_train:43,forwardactiv:74,found:[29,56,62,64,76,93,96,116,120],four:[20,25,30,34,43,47],foward:57,fp16:[29,60,71],fp32:[39,47,60,71],fp64:[47,71],fpga:[47,111],fpgadevicecontext:68,fpgaengin:42,fpgaplac:[47,68],frame:[26,60,67,70],framework:[4,6,7,20,24,25,29,30,35,47,51,52,53,55,56,60,62,64,66,68,72,74,75,96,109,111,119],free:[31,68,72,121],freememoryop:31,frequenc:[67,110],frequent:[13,59,60,62,68,95,118,119],fresh:16,friend:64,friendli:33,from:[1,3,5,6,7,8,9,11,12,13,14,18,19,20,21,23,24,25,28,29,30,32,33,34,35,36,38,39,40,43,48,49,50,51,52,54,56,58,59,60,61,64,67,68,69,70,71,72,74,75,76,77,92,93,94,96,97,99,105,107,109,110,116,118,119,121],fromfil:59,front:[35,40],fulfil:110,full:[9,16,20,48,51,53,74,76,116,121],full_matrix_project:116,fulli:[21,23,67,74,93,110,121],fullsiz:12,fullyconnect:[35,58],fullyconnectedlay:74,func:[13,18,31,61],functor:[32,35],fundament:[19,20,23,29,60],further:[19,66,121],futur:[16,21,29,40,48,60,118],fvs:66,fwd_desc:43,fwd_op:61,fwd_primit:43,fwd_primitive_desc:43,fwd_var:24,g_b0:33,g_b1:33,g_b2:33,g_block:33,g_command_config_arg:[41,42],g_h0:33,g_h0_bn:33,g_h0_relu:33,g_h1:33,g_h1_bn:33,g_h1_relu:33,g_h2:33,g_im:33,g_loss:33,g_optim:33,g_program:58,g_state:55,g_step:33,g_w0:33,g_w1:33,g_w2:33,gan:4,gangliao:8,gap:105,gatedrecurrentlay:41,gather:[6,40,51,54,74,75],gaussian_normal_random:33,gcc:[0,29,31,45,60,109,118,120],gcc_3:3,gcreators_:66,gemm:41,gemmconvkernel:76,gendrated_id:49,gener:[0,4,5,6,7,8,9,11,13,14,16,18,21,30,32,37,40,43,47,51,53,56,57,58,59,60,61,65,66,67,68,69,70
,72,75,97,100,105,107,110,118,120],generated_id:49,generated_scor:49,generatedinput:116,get:[0,1,3,5,6,7,8,9,13,14,16,17,19,27,30,33,35,38,39,40,41,42,43,47,48,49,55,58,60,61,64,66,70,72,74,75,76,92,95,97,100,109,110,116,117],get_all_op_proto:66,get_block:58,get_config_arg:107,get_data:99,get_data_from_prefetch_queu:52,get_dim:5,get_float_el:5,get_grad_op_desc:6,get_input_lay:74,get_numeric_gradi:5,get_numerical_gradi:5,get_output:5,get_plac:52,get_program:40,get_pserver_program:93,get_startup_program:93,get_support:3,get_symbol:35,get_tensor:5,get_trainer_program:93,get_vari:7,get_worker_addr:18,getactualkerneltyp:38,getattr:24,getbatchs:74,getdeviceid:76,geteigendevic:77,getengin:43,getenv:[4,17,92],getexpectedkerneltyp:[38,39,43],getinfervartyp:37,getinput:74,getinputgrad:74,getinputvalu:74,getkerneltyp:29,getkerneltypeforvar:39,getlibrari:43,getmat:12,getoptconfig:12,getoutputgrad:74,getoutputvalu:74,getparam:12,getparameterconfig:12,getparameterptr:74,getparameterspars:12,getparametersremot:12,getplac:[43,68,75,76,77],getsiz:74,gettask:13,gettempl:97,gettensor:39,gettranspos:74,getw:74,getweight:74,getwgrad:74,git:[0,63,72,78,118,120],github:[0,8,33,47,63,72,78,109,111,118,120],give:[0,9,39,48,58,60,72,74,97,110],given:[6,14,16,20,23,24,30,32,33,49,59,62,70,74,105],glibc:[3,118,120],glibc_2:3,glibcxx_3:3,glide:8,global:[0,4,7,8,9,31,35,38,39,54,55,60,64,66,68,97,105,110],global_block:58,globalstat:110,globalstatinfo:110,glog:72,glog_v:72,glog_vmodul:72,gnueabihf:120,go_librari:8,go_test:8,goal:[19,20,23,29,34,51,60,67,110],gob:13,godep:8,godoc:45,goe:[9,30,36,64,111],going:[6,32,53,103,109,121],golang:8,good:[20,33,53,58,59,62,93,109,110,121],googl:[4,55,60,72,96,109,118],googleapi:97,googlenet:42,goroutin:[18,20],got:[38,64],gpg2:97,gpg:97,gprotos_:66,gpu:[0,3,5,17,20,25,29,39,40,47,51,52,53,54,55,60,62,63,68,76,77,87,92,93,103,108,111,121],gpu_id:[105,107],gpu_per_train:21,gpu_plac:52,gpudevic:68,gpugpu_id:104,gpukernel:60,grab:9,grad:[5,6,14,24,42,52,58,65,1
05],grad_info_map:6,grad_n:24,grad_nam:24,grad_op:24,grad_op_class:60,grad_op_desc:24,grad_op_maker_:61,grad_op_typ:[60,61],grad_op_type_:61,grad_s_block:6,grad_share_block_num:[104,105],grad_to_var:[6,24],grad_var_nam:5,gradient:[9,13,20,22,24,34,37,50,51,52,53,54,58,60,65,75,92,93,94,105,109],gradient_flat:5,gradient_machin:46,gradientmachin:[46,54],gradientmachine_:12,gradopdescmak:[37,61],gradopdescmakerbas:61,gradopmak:61,gradual:110,grain:32,gram:67,grant:97,graph:[6,7,8,9,18,20,21,22,23,25,30,33,48,51,53,56,77],great:[23,67,121],greater:[24,53,92],greaterthan:66,greedi:67,green:[18,33],grep:[1,100],groudtruth:116,group:[13,35,43,68,75,114,121],group_input1:116,group_input2:116,group_input:116,grow:72,grpc:121,gru:[49,67,116],gru_decod:116,gru_decoder_with_attent:116,gru_out:49,gru_step:[49,116],grumemori:116,gserver:[41,42,74],gsizex:110,gtx:40,guarante:[43,58,74],guard:12,guest:[0,3],gui:[109,110],guid:[27,40,60,72,74,93,97,99,110,116,119],gzip:[13,99],h0_bn:33,h1_grad:52,h2_grad:52,h_prev:7,hadoop:4,half:[29,97],half_to_float:29,hand:[40,60,67,68,77,93,94],handi:8,handl:[4,6,17,18,21,35,40,43,47,54,59,64,68,70,76,111],handler:7,hannun:67,happen:[13,66],hard:[0,21,30,49,67,70,97],hardwar:[0,30,31,68,76,110],has:[0,4,5,6,7,8,9,13,14,16,19,20,21,23,24,25,29,30,33,35,39,40,44,47,49,51,55,56,60,65,66,68,72,74,97,99,110,111,116,119,121],has_kei:[6,24],has_var_recurs:6,hasdependentvar:57,hash:[47,51],hasn:30,hasnext:19,have:[0,1,4,5,6,7,8,9,13,14,16,17,19,20,21,23,24,26,29,30,31,32,33,34,38,39,40,43,44,47,48,49,51,52,53,54,55,56,58,59,60,61,64,65,67,68,71,72,74,75,76,92,97,105,107,110,116,119,120,121],haven:[0,30],hdf:11,head:[72,75,92,100],header:[14,44,46,60,68,74,76,118,119,120],headip:100,heard:0,heavi:95,height:[7,45,59,74,75],height_:65,held:9,hello:4,help:[0,7,28,30,35,43,49,59,60,70,72,95,109],helper:[21,43,61,70,74],henc:[21,53,58,61,62,64],here:[0,1,3,4,8,9,15,20,23,24,26,28,30,34,35,43,47,48,59,62,66,71,72,75,76,78,92,93,95,97,99,104,107,116,119,120,121
],heterogen:[21,23,55],heurist:[23,49,105],hidden:[50,58,97,116],hidden_out:7,hierarch:[56,58,60,114,116],hierarchi:60,high:[29,51,67,68,74,96,121],higher:[32,48,70,72],highest:7,highli:[67,70,107,116],him:4,hint:[38,109],histor:[32,76],hl_get_sync_flag:74,hold:[4,6,9,13,15,19,20,29,33,35,37,39,40,64,66,68,77,93,97],holder_:[68,77],home:[1,11,28,97,99,100,109],honor:13,host:[8,17,55,97,99,118,119,120],host_c:[118,119,120],hostfil:100,hostnam:97,hostpath:99,hostport:97,hour:0,hourli:72,hous:87,how:[4,7,9,13,18,19,20,21,26,28,30,32,35,38,39,43,48,49,54,55,62,66,76,93,94,97,99,105,109,116,117,120],howev:[5,6,16,20,21,26,30,39,40,47,53,54,58,59,61,62,65,66,67,68,97,104,105,116],howto:92,hpp:[29,45],htod:110,http:[0,1,8,17,33,63,72,78,97,99,109,111,118,120,121],hub:63,huge:53,human:[55,67],hundr:76,hyper:[33,74],hyperparamet:62,i1117:110,i386:119,iOS:120,iamfullaccess:97,iamusersshkei:97,icc:31,icml:67,id_rsa:100,idea:[8,20,30,31,53,59,62,93,109],ideal:[21,39],ident:[61,75,97],identifi:[36,47,74],ids:[49,74],idx:[13,19,33,40,74],ies:28,if_els:69,if_else_op:6,ifdef:[47,55],ifels:[7,56],ifelseop:56,ignor:105,iii:67,illustr:[9,14,20,21,32,48,74,110,116],im_siz:33,imag:[0,4,21,30,33,34,49,50,56,59,67,72,97,101,102,107,121],image_a:59,image_b:59,image_conv_lay:67,image_fil:59,image_lay:59,image_nam:4,image_path:59,image_reader_cr:59,imagenet:11,imagepullpolici:97,images_reader_cr:59,imagin:34,imgsiz:110,imgsizei:110,imgsizex:110,imikolov:92,immedi:[0,40,43,53,62,97],immutable_paramet:4,imper:18,imperfect:60,implement:[7,13,14,15,16,17,18,20,21,23,30,32,35,36,37,39,40,43,45,46,47,49,52,54,57,64,66,67,68,69,70,116],implemet:12,impli:8,implicitli:18,imposs:[19,49,121],impractic:39,improv:[22,23,40,60,67,97,109,110],inarg:12,inbound:97,includ:[0,3,4,7,8,14,17,19,20,29,30,33,35,40,45,46,48,49,55,56,58,60,66,74,75,96,97,99,105,109,110,116,118,119,120],inclus:49,incom:[18,38],increas:[9,13,29,94,105],increment:[20,25,34,40,105],incupd:74,inde:20,independ:[5,6,14,22,52,64,68,121],ind
ex:[5,6,7,9,13,18,56,58,70,76,97],indic:[6,7,14,26,33,48,56,61,65,68,70,95,97,118],indice_map:70,indices_map:70,individu:[9,51,97],industri:[9,44,121],ineffici:[39,54],infer:[4,6,7,9,25,30,36,37,38,39,40,41,45,47,57,58,60,65,67,87,119,120],infer_shap:58,infer_var_type_:37,inferenc:119,inferer:67,inferfac:37,inferior:16,infernec:120,infershap:[7,58,60,75,77],infershapecontext:[75,77],infervartypefn:37,info:[29,48,74,95,121],inform:[7,17,28,35,38,40,43,44,47,48,51,58,62,64,65,72,74,75,77,97,105,109,110,118],infrastructur:[30,97],ingor:105,ingrad_:42,ingredi:[20,67],inherit:[7,19,50,60,68,75],ininst:4,init:[7,22,33,42,48,49,52,74,87,92,97,107],init_attr:58,init_model_path:[104,105,107],initi:[6,8,13,18,21,22,23,25,34,48,51,53,58,62,66,70,74,75,87,105,111,116],initialize_op_attr:58,initrd:121,inlin:[68,76,77,97],inner:74,input0:77,input1:77,input:[5,6,7,12,16,18,19,21,22,23,24,25,29,30,31,32,33,34,35,37,38,39,40,42,43,47,48,49,53,54,57,58,59,60,61,64,66,67,68,70,72,74,75,76,77,87,89,93,100,107,111,114,116],input_data:74,input_data_target:74,input_hassub_sequence_data:74,input_index:74,input_label:74,input_lay:74,input_nam:4,input_seg:70,input_sequence_data:74,input_sequence_label:74,input_sparse_float_value_data:74,input_sparse_non_value_data:74,input_t:74,input_to_check:5,input_valu:5,input_var:[5,58],inputbuff:12,inputdef:74,inputgradi:61,inputlayers_:74,inputs_to_check:5,inputsizechang:43,insert:[6,24,31,51,57,60,61,72],insid:[0,1,6,9,21,23,24,25,38,43,54,55,59,60,61,71,97],inspir:55,instal:[0,1,17,42,63,72,78,81,89,99,103,109,117],install_android:118,instanc:[5,7,9,11,15,18,19,21,22,24,26,31,36,43,48,49,53,58,60,61,74,77,93,105,110,116],instance_ip:97,instanti:[9,26,111],instead:[0,5,6,8,12,17,18,20,21,29,30,34,35,67,72],instrins:29,instruct:[1,7,34,110,118],int16:71,int32:[47,56,70,71,105],int64:[21,27,39,47,65,71],int64_t:55,int8:47,integ:[13,17,18,20,29,45,49,74],integer_value_sequ:[49,67,116],integr:[0,121],intel:[30,47,68,76],intellig:40,inteloptimizedpaddl:42,
intend:0,intens:67,inter:21,interact:[1,21,97],interchang:[34,60],interconnect:51,interest:[18,29,51,110],interfac:[0,7,13,17,19,28,35,51,54,60,61,67,68,75,77,97,119,121],intermedi:[0,21,28,31,33,40,50,67,118,120],intern:[29,67,94,95,97,109],internel:42,internet:[8,9,121],interpret:[0,26,30,31,71,110],inth:77,intrins:[18,26,29,120],introduc:[7,9,19,33,41,44,62,64,66,75,92,94,96,99,109],introductori:0,intuit:[16,60],inval_:42,invalid:[59,64],invent:30,invoc:[8,32,60],invok:[6,19,21,24,39,54,58,60,61,66,72,97,110],involv:[49,75],ios:119,ios_arch:119,ios_deployment_target:119,ios_development_root:119,ios_enable_bitcod:119,ios_platform:119,ios_sdk_root:119,ios_use_veclib_for_bla:119,ipad:119,iphon:119,ips:97,ipt:[58,66,116],ipx:121,ipython:4,is_async:92,is_cpu_plac:43,is_mkldnn_librari:43,is_seq:116,is_target:57,is_tensor:66,is_test:43,is_traget:57,isinst:24,ismkldnnkernel:43,isn:110,isspars:74,issu:[0,1,3,8,33,67,72,110],istag:63,item:[16,29,59,87,121],iter:[4,9,21,30,31,40,43,53,55,59,67,70],iter_multiple_input_and_param:58,its:[3,4,6,7,9,13,18,19,20,23,24,25,30,31,33,34,35,37,39,40,44,48,49,51,53,54,57,58,60,61,64,65,66,68,74,75,76,77,94,97,105,110],itself:[6,9,16,19,31,43,53,64],ivs:66,java:[7,45,56,60],jeremi:110,job:[1,6,16,18,21,24,60,92,93,94,96,104,105,107],job_desc:21,job_dispatch_packag:95,job_nam:[17,97],job_namespac:97,job_path:97,job_workspac:95,jobdesc:21,jobnam:21,jobpath:97,jobport0:97,jobport1:97,jobport2:97,jobport3:97,jobserv:17,join:9,journei:1,jpg:19,json:[35,67,97,99],juditski:53,jupyt:[1,17],just:[3,8,13,14,18,21,30,31,33,37,43,53,54,58,59,60,61,62,64,65,72,95,97,107,118],jx4xr:97,jypyt:4,k8s:[18,121],k8s_data:97,k8s_job:4,k8s_token:4,k8s_train:97,k8s_user:4,kafka:11,kcpu:55,kcuda:55,kcudnn:76,kdisabl:55,kebilinearinterpbw:110,kebilinearinterpfw:110,keep:[0,9,20,30,31,34,49,53,58,64,66,72,121],kei:[0,5,6,7,9,11,13,27,29,38,43,60,61,66,67,70,72,75,110],kenlm:67,kept:[40,58],kera:62,kernel0:76,kernel1:76,kernel:[5,20,29,31,38,39,42,53,55,62,65,67,6
8,75,77,110],kernel_hint:38,kernel_type_for_var:39,kerneltyp:[38,43],key1:105,key2:105,key_pair_nam:97,keyid:97,keymetadata:97,keypair:97,keyserv:97,keystat:97,keyusag:97,keyword:[58,69],kforcecpu:38,kill:[9,97],kind:[1,4,5,9,15,19,21,24,31,34,38,39,43,50,51,55,68,71,76,97,99],kind_:55,kinput:52,kmark:55,kmkldnn:76,kms:97,knchw8c:47,knchw:47,knhwc:47,know:[4,13,18,19,40,44,72,74,92,97,109,110,118],knowledg:67,known:[6,7,20,30,32,48],koutput:52,kparallelblock:52,kparallelscop:52,kparamet:52,kplace:52,kplain:76,kpoprang:55,kpushrang:55,kqueue:20,kselectedrow:65,kstate:55,kube_cluster_tl:4,kube_ctrl_start_job:4,kube_get_workers_addr:18,kube_list_containers_in_job_and_return_current_containers_rank:4,kubeconfig:97,kubectl:[95,99,100],kuberent:[9,97],kubernet:[4,9,18,21,60,96,101,102,121],kubernetes_service_host:4,kusecudnn:38,kusemkldnn:38,kwarg:[25,35,58,66],l1_regularization_op:62,l2_regularization_op:62,l2regular:92,l93:12,label:[21,25,30,33,34,35,39,50,52,56,59,67,93,99,111],label_fil:59,label_lay:59,label_path:59,lag:105,lambda:[18,24],lan:103,languag:[18,20,30,34,40,55,60,64,67,69,107],larg:[21,23,24,40,44,53,67,72],larger:[40,69],larger_than:[7,36,56],last:[0,6,24,40,48,55,56,105,116],last_seq:49,lastseen:99,latenc:[29,67,95,97],later:[0,3,8,60,62,67,68,75,77,97],latest:[0,1,3,7,8,9,16,63,78,99,118,119],latter:[53,70,109],launch:[43,97,105],launcher:4,layer:[6,7,12,18,20,21,23,30,33,34,36,50,52,53,56,59,60,62,66,67,68,70,87,93,104,105,111,114,116],layer_0:74,layer_attr:[107,116],layer_help:38,layer_num:107,layer_typ:[41,42],layerbas:74,layerconfig:74,layergradutil:74,layerhelp:[38,58],layermap:74,layout:[39,43],layout_:[38,47],layouttyp:38,lazi:[53,62],lead:[19,40,47,110],leaki:33,learing_r:50,learn:[0,1,4,6,14,16,20,21,23,26,33,34,40,42,49,51,53,55,59,60,62,68,74,75,78,93,110,116,119],learning_r:[14,21,92,93,111],leas:9,least:[3,9,118],leav:[7,97],lectur:40,left:[7,77,119],legaci:1,legal:66,len:[14,18,27,30,58,74,87],length:[14,29,41,44,48,49,60,67,70,99,105,116
],leran:40,less:[4,24,93,121],less_equ:69,less_than:[4,40],let02:99,let:[4,7,16,18,20,31,32,34,38,39,43,47,48,49,50,61,68,75,93,97,109],level:[29,32,35,44,48,49,55,68,70,71,72,95,105,118],lgtest:8,lgtest_main:8,lib64:[1,105],lib:[0,1,46,92,109,118,119,120],libapi:8,libari:46,libc:3,libcuda:1,libgcc_:3,libgoogl:109,libiomp5:42,libmkldnn:42,libmklml_intel:42,libnvidia:1,libpaddl:[45,46,60,109],libpaddle_capi:46,libpaddle_gserv:46,libpaddle_math:46,libpython2:0,librari:[0,3,8,15,20,21,39,42,43,46,51,67,75,76,92,93,103,105,119,120],library_:47,library_typ:76,library_type_:39,librarydevicecontext:47,librarytyp:[39,76],libstdc:3,licens:[42,51,72],life:9,lifecycl:[55,121],lifetim:[3,64],lightweight:32,like:[0,3,6,7,8,9,12,17,18,20,26,30,31,32,33,34,35,37,39,43,47,51,52,53,58,59,60,61,62,64,65,67,69,70,71,72,92,96,97,104,107,109,110,111,116,118,119,120,121],limit:[30,40,44,49,60,62,105,110],linaro:120,line:[0,8,12,17,20,28,34,53,56,58,60,62,72,93,94,95,97,107,108,109,110],linear:[49,87],lineno:109,link1:29,link2:29,link:[3,8,27,28,64,75,97,121],linux:[0,1,3,20,27,72,97,103,120],linux_x86_64:[3,63],lipo:119,list:[0,4,6,7,8,13,17,18,26,28,30,33,47,50,52,54,55,58,61,64,70,74,75,87,92,93,97,105,107,109,116,120],listdir:92,listen:[9,18,21,92,93,105],listen_and_do:18,listenanddo:18,littl:[14,38,44,105],live:[75,111],live_in:40,live_out:40,load:[0,4,9,21,33,51,58,75,97,105],load_missing_parameter_strategi:[104,105,107],load_mnist:33,load_persist:93,loadsave_parameters_in_pserv:[12,104,105],local:[0,1,5,7,9,15,16,20,34,40,48,52,56,58,60,72,81,95,99,104,105,109],local_scop:5,local_w1_grad:52,local_w2_grad:52,localhost:[1,78],localpath:28,locat:[8,30,47,55,68,70,74,92,116,120],lock:[8,9,13,14],lod:[20,44,48,65,70,71],lod_desc:65,lod_expand:49,lod_level:[58,65,71],lod_rank_t:71,lod_tensor:[48,65,71],lod_tensor_arrai:71,lodtenosr:19,lodtensor:[19,20,37,44,60,71],lodtensordesc:[44,65],log:[3,13,21,28,33,74,92,95,97,99,100,105,120],log_barrier_abstract:105,log_barrier_lowest_nod:[104,105
],log_barrier_show_log:[104,105],log_clip:[104,105],log_error_clip:[104,105],log_period:[99,105,107],log_period_serv:[104,105],logic:[16,21,23,24,33,37,50,51,54,64,70,75],login:[3,100],logit:[33,39],longer:[9,21,40],look:[7,17,18,20,30,31,34,52,53,58,61,62,67,71,92,93,97,99,104,111],lookahead:67,lookup:[37,49,111],lookup_t:40,loop:[5,7,19,30,40,55,59,64],loop_var:70,loss:[6,21,33,35,50,52,53,62,67,74],loss_gard:52,lot:[21,47,49,53,58,62,68,92,104,121],low:[50,51,67,68,70],low_rnn:48,lower:[29,48,49,72,95],lower_level_rnn:48,lowest:105,lpaddle_capi_shar:46,lpaddle_capi_whol:46,lrelu:33,lstm:[99,116],lstmemori:116,lstmlayer:41,luckili:40,mac:[0,46,72,118],machin:[0,3,21,23,30,33,40,42,51,53,62,74,97,99,100,104,105,107,120,121],machine_transl:116,maco:[0,3,87,119],macro:[32,47,61,75],made:[9,14,30,116],mai:[0,1,5,7,21,25,29,31,38,39,40,43,51,55,59,60,64,67,71,77,78,92,97,110,120],main:[3,18,20,24,30,31,35,51,56,60,94,97,109],main_program:[6,25,52],mainli:[15,40,47,68,75,105],mainlin:3,maintain:[7,13,53,58,60,97],majel:8,major:[21,29,39,118],make:[0,4,6,7,8,9,13,14,16,20,21,22,29,30,34,48,49,52,53,54,58,59,60,62,67,70,72,74,75,76,92,93,94,97,109,110,118,119,120,121],make_chan:20,make_channel:20,make_ddim:77,make_function_oper:32,make_vari:66,maker:[60,61],malloc:[68,74],man:27,manag:[3,9,14,15,18,20,21,28,55,64,68,78,96],mandarin:67,mandatori:119,mani:[0,6,8,13,18,20,30,33,38,39,40,49,54,55,58,60,61,64,65,66,69,70,105],manili:35,manipul:[30,58,61,95,119],manner:[53,62,67,68],mantain:40,manual:[21,50,53,61,95,118,119,121],manufactur:30,manylinux1:3,manylinux1_x86_64:[3,63],manylinux:63,map:[4,7,13,24,43,47,58,61,64,66,68,70,105,121],map_fn:70,mapreduc:[4,92],mark:[6,23,33,34,48,49,55,64,109,116,121],marker:55,market:29,master:[4,16,60,63,105,120],mastermind:8,mat:[45,46],mat_cache_row:12,mat_norm:12,mat_normal_shar:12,mat_sparse_row:12,mat_sparse_row_auto_grow:12,mat_sparse_row_id:12,mat_sparse_row_prefetch:12,mat_sparse_row_prefetch_full_s:12,mat_value_shar:12,match:[3,
8,29,69,110],matchbox:121,math:[42,45,60,72,74,75,110],mathemat:62,matmul:[7,35,48,70,75],matric:[74,116],matrix:[12,45,46,74,75,104,107,119],matrixptr:74,matrixtyp:46,mattyp:12,matur:96,max:[5,22,24,40,58,105,107,110],max_diff:5,max_length:[49,116],max_relative_error:[5,75],maxim:24,maximum:[7,14,52,75,105,110,116],maxoutfunctor:68,mayb:[7,43,75],md5:10,mean:[0,1,6,8,21,22,24,35,39,49,52,57,59,64,67,75,93,97,105,107,109,110,111,116,121],meant:70,measur:[25,110],mechan:[6,15,19,25,43,58,61,76,97,116],mem:[7,17,49],mem_per_pserv:21,mem_per_train:21,member:[4,24,34,35,47,54,58,64,75],memcpi:[54,110],memori:[0,6,7,12,13,17,29,31,39,42,43,44,47,49,53,55,60,72,74,77,99,105,107,110,111,116],memory_optim:40,memory_threshold_on_load_data:105,memoryalloc:68,memorydesc:43,mention:[0,6,8,13,21,23,30,48,51,53,55],merg:[14,16,22,25,42,48,51,52,54,72,78,105,119],messag:[7,18,20,26,30,31,34,44,55,56,57,58,60,61,65,71,72,99,105],metaclass:75,metadata:[27,97,99],metal:[93,121],metaphor:34,metaplotlib:4,method:[0,1,3,5,7,16,18,19,21,22,24,29,33,34,35,38,39,50,51,58,59,60,64,65,70,74,75,77,78,105,107,109,110],methodolog:53,metric:[25,55],microarchitectur:29,might:[0,7,8,18,20,30,40,56,67,72,74,97,109,118],mileag:110,million:107,min:[22,24,58,97,107,110],min_block:7,min_count:23,min_desc:7,mind:109,mini:[7,9,20,25,26,30,36,48],mini_batch:59,minibatch:[7,25,34,36,56],minim:[7,21,23,24,30,33,50,60,93,105,111,118,119,120],minimum:[67,93,119],minimun:105,minsizerel:[118,119,120],minu:61,minus_grad:61,minusgradop:61,minusop:61,minusopgradmak:61,minusopprotoandcheckermak:61,minut:[0,1,9,16,97],mirror:[1,8,118],mislead:14,miss:[33,105],mistak:30,misus:76,mit:97,mix:[52,55,70,116],mixtur:109,mkdir:[28,78,97,100],mkl:[0,1,39,43,60,68,76],mkl_packed_:41,mkldnn:[39,42,47],mkldnn_:42,mkldnnactiv:42,mkldnnbase:42,mkldnnlayer:42,mkldnnmatrix:42,mkldnnstream:42,mkldnntester:42,mklml:42,mklpack:41,mklpackedgatedrecurrentlay:41,mklpackedgemm:41,mklpackedlstmlay:41,mklpackedrecurrentlay:41,mlp:35,mnist:
[11,21,33,34,56,59,60,109],mnist_random_image_batch_read:59,mnist_train:59,mnist_train_batch_read:59,mobil:[29,30,40,60,78],mod:92,mode:[29,41,51,52,72,105],model:[6,7,9,10,18,21,23,24,25,34,39,40,41,50,51,53,60,62,67,70,72,74,78,81,87,93,94,97,105,108],model_list:[105,107],model_path:107,modelparallel:21,modern:40,modif:[2,67],modifi:[21,29,35,62,74,75,92,95,97,116],modul:[21,32,33,49,67,70,75,109],modular:49,moment:109,momentum:[64,76],momentumop:109,mon:99,monitor:[20,55],month:8,more:[0,1,4,5,6,8,9,13,16,17,19,20,21,23,28,29,30,31,32,34,38,40,43,47,48,49,50,55,58,59,60,62,67,68,69,70,74,75,76,77,78,87,92,93,94,99,107,109,110,111,116,120,121],most:[3,4,6,8,16,20,21,31,34,35,47,49,53,55,59,62,67,68,69,74,104,109,110,111,116,121],mostli:[29,121],motiv:60,mount:[0,1,17,92,97,99],mountpath:[97,99],move:[1,9,13,19,28,30,53,97,110,121],movement:110,movidiu:30,mpi:[20,51,100],mpirun:100,mse:[30,34,50,56],much:[9,30,43,50,59,62,70,110],mul:[32,40,58,74,75],mul_grad:75,mul_op:75,mul_result:58,mulgradkernel:75,mulkernel:75,mulop:[32,75],mulopgrad:75,mulopmak:75,mult:[18,31],multi:[25,39,51,54,74,93,95,104,105,109,121],multigradientmachin:54,multipl:[4,5,13,14,16,18,20,21,23,25,30,31,32,38,39,51,52,55,60,67,71,74,75,92,94,97,105,107,109,116],multiple_input:58,multiple_param_attr:58,multipli:[18,74],multithread:52,must:[0,6,14,19,24,40,43,44,47,55,57,58,59,60,66,71,74,75,77,92,93,97,105,107,116,118,120],mutabl:[68,77],mutable_data:[43,68,75,77],mutex:20,mutuable_data:[68,77],mxnet:[7,18,20,30],my_cluster_nam:97,my_external_dns_nam:97,my_lib:92,my_net:52,myerrorclip:24,mypaddl:99,naiv:18,name:[1,3,4,5,6,7,9,11,12,14,17,18,19,21,25,29,32,35,38,42,43,44,46,47,49,55,56,58,60,63,65,66,70,71,74,75,76,87,92,93,99,101,102,105,107,110,111,116,119,121],name_:55,name_prefix:11,namespac:[7,36,45,58,74,75,99],nativ:[29,72],natur:[13,16,23,49,70,107],navig:78,ncall:109,nccl1:51,nccl2:51,ncclinit:51,nchw8:39,nchw8c:39,nchw:[42,47],ndarrai:11,ndk:118,nearest:29,nearli:[5,19],necess:70,neces
sari:[6,7,14,16,24,25,40,44,49,54,58,66,70,74,100],necessarili:[18,74],neck:51,need:[0,1,2,3,4,5,6,8,12,13,14,16,17,19,20,21,23,24,25,28,30,31,32,33,38,40,43,47,49,50,51,52,53,54,55,57,58,60,61,62,64,65,66,67,68,70,71,74,75,76,77,78,87,92,93,97,99,103,104,105,107,116,118,119,120,121],neighberhood:51,neon:[29,118,120],nervana:30,nessesari:67,nest:[6,7,55,56,71],net:[0,7,33,48,64],netop:[7,60],network:[4,5,6,7,9,12,21,23,25,33,35,39,40,41,42,48,50,53,55,58,59,62,64,66,67,68,71,74,75,77,87,92,93,94,105,110,121],network_config:107,networkadministr:97,neural:[4,6,7,9,21,35,39,40,41,42,48,53,62,64,68,71,77,87,93,94,105,110],neuralnetwork:54,neuron:74,never:[40,59,64,97,99],new_block_idx:58,new_op_and_kernel:76,new_op_desc:24,new_scop:39,new_stat:48,newblock:58,newbuff:43,newer:118,newest:14,newli:[29,119,121],newop:7,newopdesc:58,newprogram:58,newscop:39,newvardesc:58,next:[6,9,15,19,20,24,49,51,70,74,75,97,99,105,109,110,116],nextlay:42,nfs4:97,nfs:97,nfsver:97,nic:[92,104,105],nil:[13,20],nnz:74,no_grad_dict:6,no_grad_set:[5,6,75],no_gradi:6,node1ip:100,node2ip:100,node3ip:100,node:[8,16,18,21,23,35,40,49,51,60,74,92,93,95,96,97,99,100,103,105,121],node_0:97,node_1:97,node_2:97,node_id:92,nodeattr:35,nodeentri:35,nodefil:95,nodesep:35,nohup:92,nois:[9,33,94],noisi:33,non:[9,20,29,30,65,74,75,97,105],none:[4,5,6,7,20,24,25,33,35,36,48,49,50,56,58,66,70,93,111,116],noneedtran:43,nonlinear:74,nor:[0,18],norm:[33,47],normal:[52,53,67,74,99,103,105,116],notat:40,note:[0,1,4,6,7,12,13,17,39,40,44,47,51,59,60,68,75,77,78,92,93,97,105,107,110,119],notebook:[1,17],notest_dist_fit_a_lin:93,noteworthi:30,noth:[0,38,58,64,105],notic:[24,30,51,61,72,74,116],notif:72,notimplementederror:24,notin:39,notingradi:75,notion:70,notori:5,now:[6,8,9,19,20,23,33,44,47,53,60,61,62,64,93,97,105,119],nproc:0,nullptr:[43,55,61,64,74],num:[92,105],num_class:35,num_gradient_serv:[92,104,105],num_hidden:35,num_input:72,num_parameter_serv:4,num_pass:[99,104,105,107],num_pserv:21,num_row:65,num_shard:
11,num_step:70,num_train:21,number:[0,7,9,11,23,25,40,53,55,59,60,66,70,74,92,93,96,97,105,109],numdevices_:107,numer:75,numeric_grad:5,numerical_grad:5,numlogicaldevices_:107,numpi:[0,11,29,33,58,59,75],numreal:12,numsampl:110,numtimeout:13,nv_:8,nv_librari:8,nv_test:8,nvcc:[8,29,31],nvidia:[1,29,47,51,68,105,110],nvlink:51,nvprof:55,object:[4,12,19,21,24,25,33,35,40,45,50,55,58,60,62,64,77,110],observ:[74,110],obtain:[16,19,53,68],obvious:[8,47,109],occup:40,occupi:[29,55],occur:40,occurr:7,oct:99,off:[0,1,46,93,103,118,119,120,121],offer:[7,60,66],offici:[8,72,78,97,118],offlin:[9,11,121],offset:12,often:[12,35,40,47,72,92,109],ograd:74,old:[5,14,16,49,60,105],older:[30,118],omega:62,omit:0,omp_num_thread:109,ompi_comm_world_rank:92,onc:[9,13,18,21,23,25,30,34,53,69,72,74,78,97],one:[0,1,4,5,6,7,9,12,13,14,16,17,18,19,20,21,24,25,26,29,30,31,32,33,35,37,38,39,43,44,47,48,49,50,51,53,54,56,57,58,59,60,61,64,65,67,68,69,70,72,74,75,76,93,95,97,99,100,105,107,111,121],onehotcrossentropyopkernel:75,ones:[32,33,60,72],onli:[0,2,3,4,5,6,8,12,13,14,15,16,17,18,19,21,23,24,25,26,28,29,30,33,34,39,40,43,48,49,50,51,54,55,58,60,65,66,67,68,69,70,71,74,75,76,77,78,93,97,99,104,105,107,110,116,119,121],onlin:[9,11,40,59],only_cpu:5,onnx:30,onto:[21,23,97,100],op1:[39,40],op1_2_op2:39,op1_to_op2:39,op2:[39,40],op3:40,op_:75,op_check:75,op_class:[60,66],op_desc:[24,37,57],op_info:111,op_kei:43,op_maker_class:[60,66],op_proto:66,op_registri:111,op_siz:24,op_test:75,op_typ:[60,75,76],op_unique_kei:43,opattrcheck:75,opcreat:66,opdesc:[7,24,34,56,57,58,60,61,66,71],opdescbind:[37,61],opdescbuild:7,opeartor:52,open:[4,11,30,33,42,59,72,96,97,109],openbla:[0,1,118],openmp:109,openmpi:96,opensourc:51,oper:[5,7,18,20,21,22,23,25,26,29,30,31,33,34,35,37,38,39,48,49,50,51,52,55,57,62,64,67,68,71,72,74,76,77,97,105,110,111,116,118],operand:29,operat:52,operator_grad:5,operator_list:55,operatorbas:[7,32,60,61,66,75],operatorwithkernel:[39,75],opinfo:[37,60,61],opinfomak:37,opinfomap:61,op
kernel:77,opkernelbas:76,opkernelkei:60,opkerneltyp:[39,47,76],opmak:66,opproto:75,opprotoandcheckermak:[61,75],opprotomak:[66,75],opregist:66,opregistri:66,ops:[5,6,7,8,18,19,31,34,35,52,53,56,57,58,60,68,75,121],ops_:7,ops_test:8,opt:[0,4,50,57,66],opt_op_list:50,optest:75,optestmeta:75,optim:[5,6,21,22,23,31,33,51,53,54,56,60,62,65,67,74,92,93,94,109,110,111,118,120],optimis:50,optimize_op:93,optimize_op_attr:58,optimizer_op:93,option:[4,8,21,33,38,44,56,57,58,60,65,66,67,71,74,92,93,95,107,109,118,119,121],optmization_op_list:50,opts_np:57,optyp:[37,66],opwithkernel:65,order:[0,6,34,44,55,59,62,70,74,94,97,99,105,109,121],oregon:97,org:[11,27,33],organ:89,orient:66,origin:[5,29,33,64,70,72,77],other:[0,7,9,14,18,28,29,30,31,37,39,40,43,47,48,53,57,62,64,66,67,68,71,72,93,94,97,99,107,109,111,116,118,119,120,121],otherwis:[4,6,9,14,16,33,37,43,59,67,72,95,107,116],our:[0,3,4,6,8,19,20,21,23,33,37,40,47,51,53,64,70,72,74,93,97,99,109,116,118],out:[4,7,8,13,16,19,21,24,30,35,39,40,43,48,49,58,75,77,87,97,99,100,105,109,110,116],out_dir:97,out_mem:116,outgrad_:42,outlin:106,output:[0,4,5,6,7,11,16,18,19,23,24,28,31,32,33,34,35,36,37,39,40,43,44,48,49,52,53,56,57,58,59,60,61,64,65,66,68,70,72,74,75,76,77,89,92,99,105,107,109,110,116,118],output_:[42,74],output_all_step:48,output_arg_nam:24,output_lay:87,output_mem:116,output_nam:5,output_num:48,output_path:11,output_seg:70,outputbuff:12,outputgradi:61,outsid:[21,64],outupt:70,outv:74,outval_:42,over:[4,30,40,51,52,53,70,72,74,110],overal:[33,53,55,72,121],overfit:62,overhead:110,overlap:74,overload:[29,38],overrid:[7,9,28,43,68,74,75,77],overview:[13,14,15,68],overwhelm:72,overwrit:[28,92],own:[0,6,14,16,24,26,35,37,50,51,53,62,66,75,92,95,97,118,119],owner:[0,72],pack:[70,118],packag:[0,1,13,17,18,32,42,63,72,75,97,109],pad:[43,67],paddl:[0,1,3,4,7,8,9,11,17,19,21,28,31,32,33,36,41,42,43,44,45,46,48,49,54,56,60,62,63,66,67,68,70,72,74,75,76,78,87,92,93,95,97,99,100,103,105,107,109,110,111,116,118,121],paddle_begin_i
nit_param:14,paddle_dir:75,paddle_element_typ:14,paddle_element_type_float32:14,paddle_element_type_float64:14,paddle_element_type_int32:14,paddle_element_type_int64:14,paddle_element_type_uint32:14,paddle_element_type_uint64:14,paddle_enforc:[7,19,43],paddle_enforce_eq:[75,77],paddle_error:[45,46],paddle_exampl:17,paddle_finish_init_param:14,paddle_get_param:14,paddle_gradi:14,paddle_init_num_gradient_serv:92,paddle_init_param:14,paddle_init_port:92,paddle_init_ports_num:92,paddle_init_ports_num_for_spars:92,paddle_init_pserv:92,paddle_init_trainer_count:92,paddle_init_trainer_id:92,paddle_init_use_gpu:92,paddle_job:17,paddle_manylinux_devel:0,paddle_matrix:[45,46],paddle_matrix_cr:46,paddle_matrix_get_shap:45,paddle_matrix_shap:45,paddle_new_etcd_pserver_cli:14,paddle_new_pserver_cli:14,paddle_on_cloud:17,paddle_output:99,paddle_paramet:14,paddle_pserver2:95,paddle_pserver_cli:14,paddle_pserver_client_releas:14,paddle_save_model:14,paddle_send_grad:14,paddle_train:[46,63,95],paddle_with_cuda:55,paddle_with_mkldnn:47,paddlepaddl:[2,8,9,11,14,15,16,17,18,21,27,28,32,33,34,36,38,44,48,49,50,54,55,58,59,60,64,70,71,72,74,75,76,87,92,94,95,96,100,101,102,103,109,110,116,121],paddlepaddle_gpu:3,paddlepaddlebook:1,paddlepaddlehub:[1,118],page:[72,97],pair:[6,7,21,34,50,55,60],pakcag:8,panic:20,paper:[33,67],para:12,paradigm:[18,26,60],paragraph:48,paragraph_data:48,paragraph_out:48,parallel:[0,18,20,21,23,39,51,52,55,60,92,94,96,97,99,105,107,110],parallel_do_grad:52,parallel_do_op:52,parallel_for:18,parallel_nn:[104,105],paralleldo:[22,52],parallelfor:18,paralleliz:67,param:[5,7,14,52,54,58,68,77],param_attr:[12,58,116],param_config_proto:14,paramattr:116,paramet:[0,5,6,7,8,10,12,16,18,21,22,24,26,28,30,31,33,34,35,37,44,48,50,51,56,59,64,66,67,70,72,74,75,76,77,81,87,94,95,107,108,111,119],parameter_block_s:[104,105],parameter_block_size_for_spars:[104,105],parameter_list:[6,50],parameter_nam:4,parameter_serv:4,parameter_valu:12,parameterattribut:12,parameterclient_:12
,parametermap:74,parametermutex_:12,parameters_:74,parameters_and_grad:50,parameterserver2:12,parameterset:4,parameterupdat:54,parameterupdater_:12,params_grad:[50,93],paramt:[92,97],paraspars:74,parent:[7,18,56,58,60,74],parent_:[7,64],parent_block:52,parent_idx:58,parenthes:60,pars:[0,8,21,35,97,107],part:[6,7,16,21,30,43,44,56,58,67,68,74,92,93,94,109,110,116,121],parti:[110,118,119,120],particip:75,particular:[34,39,44,60,67,110],partit:[9,11,21,23,60,94,97],pass:[6,7,9,20,24,25,30,33,40,44,50,52,53,54,57,58,59,60,62,64,67,69,70,72,74,92,93,95,97,99,105,110],pass_id:[21,93],pass_idx:59,pass_num:93,passtyp:74,password:100,past:[1,4,87,97],patch:27,path:[0,1,9,13,14,17,40,49,59,67,92,97,99,105,107,118,119,120],path_to_paddlepaddle_working_directori:78,pattern:[9,45,53,62,97],paus:[9,16],pcie:51,peer:51,pem:[4,11,97],pend:[9,13],peopl:0,pep425tag:3,pep8:72,per:[9,14,51,53,59,62,75,105],percal:109,perf_test:109,perfectli:67,perfom:[105,107],perform:[0,5,14,20,21,25,29,30,33,39,40,51,54,55,59,60,62,67,68,74,75,93,96,104,108,116,118,119,120],perftool:[55,109],period:[9,16,105],permiss:97,persist:[26,65,67,71,97],persistentvolum:97,persistentvolumeclaim:97,person:[4,38],perspect:[60,110],perturb:[5,74],pex:121,pfs:[11,28],pfsclient:11,pfspath:28,pgp:97,phase:[43,49,51,53,59,61,67,121],philosophi:[53,62],photo:33,physic:[119,121],pick:[97,119],pickl:[92,100],pictur:51,piec:[18,55,77,93],pil:92,pillow:17,ping:72,pip:[0,2,63,72,78,87,109],pipelin:[25,67],pivot:43,pixel:21,place:[6,7,9,16,21,23,26,38,39,43,51,52,60,74,77,93,110,111],place_:[38,39,47,68],place_list:71,place_typ:76,placehold:[33,68,77],placement:23,plain:[17,44,46,47],plan:[9,18,43,60,67,74,118],platform:[3,7,31,39,43,47,55,68,72,75,76,77,92,96,97,111,118,119,120],pleas:[0,1,3,4,9,13,14,15,18,20,31,35,47,48,58,59,60,67,68,71,72,74,75,77,78,93,97,103,109,116,118,119,120],plot:4,plug:[51,53],pne:75,pnpairvalidationlay:105,pnpairvalidationpredict_fil:104,pod:[11,17,18,71,97,99],pod_nam:97,podtyp:71,point:[0,7,9
,17,20,29,40,43,51,68,72,75,77,109,110,118,121],pointer:[7,14,35,40,47,58,60,64,68,77],polici:97,poll:20,pollut:16,polyak:53,ponit:35,pool3:74,pool:[22,40,67],pop:[7,26],popul:14,popular:[8,33,35,55],port:[8,18,92,93,97,99,104,105,109],port_num:104,portabl:35,portal:78,ports_num:[92,105],ports_num_for_spars:[12,92,104,105,107],pose:9,possibl:[4,7,13,20,23,40,58,62,71,110],post:[0,17,27],postpon:62,potenti:[29,110],power:[29,40,51,67,77,121],ppo_workspac:78,pprof:109,practic:[74,116],pre:[0,4,14,38,40,72,97,99,118,120],pre_activ:58,pre_bia:58,pre_stat:[48,70],preambl:58,precis:[0,25,29,53],precompil:26,pred:[35,40],predecessor:40,predetermin:105,predict:[21,52,62,81,87,105,116],predict_fil:105,predict_output_dir:[104,105],prefer:[30,38],prefetch:[12,52,74],prefix:[9,11,49,67,97],pregel:20,pregrad:74,prepand:58,prepar:[5,17,54,67,94,101,116],prepend:58,prepend_oper:58,preprocess:[67,70,99],present:[4,6,7,55,70],preserv:28,prev_batch_st:[104,105],prevent:[4,9,13,16,24,62,109],preview:[60,78],previou:[6,9,23,28,48,49,74,97,105,109],previous:99,previous_memori:7,price:[60,87],prim:43,primari:[30,34],primarili:[53,62],primer:72,primit:[29,42,43,51,52,70],primitive_desc:43,primitivedesc:43,principl:[4,8,47],print:[3,4,20,21,30,35,58,69,87,100,105,109],print_graphviz:35,printallstatu:110,println:20,printstatu:110,prior:52,prioriti:60,privat:[7,46,55,58,64,65,66,68,70,72,77],privileg:[0,97],pro:51,prob:87,probabl:[1,49,67,72,116],problem:[0,3,4,5,8,16,19,30,33,34,53,60,62],proc:1,proce:[1,9,59,97],procedur:[7,44,77,119],process:[0,4,6,7,11,12,13,16,18,19,20,21,25,26,30,31,35,39,40,42,44,51,52,62,66,72,95,96,97,99,105,107,109,116],processor:[29,110],produc:[9,30,35,59],product:[17,30,74,97,99],productgraph:99,prof:109,profil:[28,55,67],profilerst:55,proflier:[55,110],program:[4,6,11,14,16,21,23,26,34,36,40,50,51,52,55,59,60,64,69,71,95,105,109,110],programdesc:[18,21,26,30,40,44,52,57,58,61,71],programm:[21,30,58],progress:[9,13,105],project:[17,46,67,72,74,75,116],promis:49,
prompt:[28,30,93],prone:4,pronunc:67,prop_kind:43,propag:[6,30,53,75,105,107],proper:[38,103],properli:[0,38,93],properti:[35,62,105],propos:[7,22,23,49,50,51,53,70],proprietari:42,protect:[19,29,66,74,75],proto:[20,38,44,47,56,60,66,71,75],proto_:66,protobuf:[7,17,18,21,26,30,31,34,35,40,44,56,58,60,61,66],protoc:[118,120],protocol:[105,111,121],provi:92,provid:[1,4,7,14,17,18,25,26,29,30,33,35,37,38,47,51,53,55,58,62,66,67,68,69,70,77,87,92,93,95,96,97,109,110,118,119,121],providermemory_threshold_on_load_data:104,provis:[97,121],prune:7,ps_desir:9,pserver:[12,14,15,17,60,92,93,97,104,105],pserver_addr:14,pserver_cpu:17,pserver_endpoint:93,pserver_id:10,pserver_mem:17,pserver_num_thread:[12,104,105],pserver_prog:93,pserver_startup:93,pserverstart_pserv:104,pseudo:[4,6,17,61,70],pseudocod:70,psize:74,ptr:[46,68],pub:100,publish:118,pull:[8,60,63,72,118],purpos:[9,21,23,38,110],push:[7,26,30,55,63,72],push_back:74,put:[8,9,12,23,40,43,58,68,74,93,99,118],pvc:97,pwd:[0,1,78,118],pxe:121,pybind:[7,20,29],pypi:3,python2:109,python3:3,python:[0,1,3,4,7,15,19,20,25,26,30,32,33,34,35,38,45,49,52,54,55,60,63,68,70,72,78,87,92,93,100,111,116],pytorch:[30,55],qualcomm:29,queri:97,question:[4,18,23,66,93,97],queue:[20,23],quick:[35,86,105],quick_start:[17,97,99,101],quick_start_data:99,quickli:[49,58,60],quickstart:99,quit:[49,110],r14b:118,rais:[24,35,92],rajathkmp:33,ran:[23,110],rand:[33,105,107,110],random:[11,20,33,47,54,58,59,75,94,105],random_imag:11,randomli:16,randomnumberse:104,rang:[11,18,21,29,33,40,55,59,66,72,93,105,107],rank0:51,rank1:51,rank:[4,70,77,97],rankdir:35,rapid:61,raspberry_pi:120,raspberrypi:120,raspbian:120,rate:[14,67,74],rather:[6,17,33,70,97],ratio:105,raw:44,rdma:105,rdma_tcp:[104,105],reach:[9,40,51,110],read:[0,1,4,6,9,11,18,19,20,21,23,30,31,59,60,67,70,78,94,97,103,116,118,121],read_from_arrai:40,read_from_realistic_imag:4,read_from_rng:4,read_lock:10,read_minibatch:30,read_mnist_imag:4,read_ranking_model_data:4,readabl:[55,60,109],reader:[
11,21,29,33,34,56,67,71,92,93,109],reader_cr:11,reader_creator_bool:59,reader_creator_random_imag:59,reader_creator_random_image_and_label:59,readi:[9,20,97,99,121],readlockguard:12,readm:46,readnext:19,readwritebuffer_:12,readwritemani:97,real:[12,33,59,92],realist:4,realiti:67,realiz:[7,48],realli:[30,62],reaon:76,reason:[4,5,9,20,30,72,93,99],receiv:[9,17,20,21,23,48,93],recent:[40,53],reciev:105,recognit:67,recommend:[0,1,2,4,72,74,78,95,103,105,116,118],recompil:110,record:[13,43,55,66,97],recordev:55,recordio:[4,11,13,19],recov:[9,70],recover:60,recoveri:13,recurr:[41,48,64,67,114],recurrent_group:[67,116],recurrent_op:70,recurrentgradientmachin:[46,49,70],recurrentlay:[41,105],recurs:[6,7,8,28,40,60],recv:[18,21,23,51,97],recvparametertyp:12,red:[33,109],reduc:[1,23,29,51,60,72,95,105,107,109],reduce_by_kei:60,reduce_mean:33,refactor:[21,23,34,49,53,54,58,62,70],refer:[0,1,7,9,13,14,15,18,29,35,43,47,48,51,56,58,60,62,64,68,70,71,74,75,77,93,99,116,118,119],referenc:13,reflect:13,reformat:72,refrain:75,reg:66,regard:[19,121],region:[64,110],regist:[20,39,40,47,61,68,74,110],register_gpu_profil:110,register_lay:74,register_op:[32,60,61,66,75],register_op_cpu_kernel:[68,75],register_op_cuda_kernel:[68,75],register_op_kernel:76,register_op_without_gradi:[60,75],register_oper:[37,61],register_tim:12,register_timer_info:110,registerop:66,registr:[75,76,111],registri:[17,37,68,99,121],regular:[6,74,92,97],reinit:19,reiniti:[19,43],reinstal:0,rel:[5,16,62,75,118],relat:[9,16,17,29,39,47,52,55,64,72,99,109,119,120,121],relationship:[61,68],releas:[63,67,97,118,119,120],relev:75,reli:[5,18,49,50,53,62,75,109],reliabl:[9,62],relu1:35,relu2:35,relu:[33,35,40,74],relwithdebinfo:109,remain:70,rememb:72,remot:[8,12,21,60,72,74,97,105,107],remoteparameterupdat:[12,15,105],remov:[6,21,28,30,49,72,105,118,119],removing_docker_contain:0,renam:[3,6,28,29],reorder:43,reorder_primit:43,repeat:[7,34,56,57,65,66,71,109],repeatedli:[34,40],replac:[8,13,37,53,61,67],repli:72,replic:2
1,replicaset:17,repo:[8,72,78,120],report:[13,29,30,55,110],reportdataset:13,repositori:[78,118],repres:[6,7,13,18,21,23,24,30,35,44,47,49,52,53,58,60,62,65,68,70,71,74,97,116],represent:[14,21,31,33,34,40,47,49,65],reproduc:0,request:[8,9,12,16,18,60,63,72,97,99,121],requir:[3,4,6,9,14,16,17,19,21,23,24,28,29,35,40,42,48,52,53,55,56,57,60,62,65,66,67,71,72,74,75,78,92,97,99,118,120,121],requisit:40,rerun:75,research:[21,30],reserv:28,reserveoutput:74,reset:[9,25],reset_program:25,resetingrad:42,resetinvalu:42,resetoutgrad:42,resetoutvalu:42,resetxxx:42,reshap:[5,59,77],resid:0,residu:20,resiz:[12,68,75,77],resolv:[8,72,99],resourc:[21,26,51,55,68,76,97],respect:[5,19,24,29,33,48,74,105,116],respons:[12,20,21,25,33,51,53,54,62,97,99],rest:[7,17,27,31,39,121],restart:[9,14,97,99,121],restartpolici:[97,99],restor:[5,53],restrict:[62,64,105,109],result:[5,6,13,20,25,33,34,35,40,44,49,50,51,54,75,77,97,105,109,110,111],resum:16,retain:77,retran:97,retriev:[7,49,64,74,99,109],retriv:92,reuqest:63,reus:[7,16,49,59,60,74,75],rev:0,revamp:21,reveal:[4,109],revers:[6,116],review:[18,99],reviews_electronics_5:99,rewrit:[8,20,75],rid:[19,30],right:[5,6,7,8,17,25,40,60,62,72,76],ring:51,risk:6,rkt:[0,17],rmsprop:53,rmspropoptim:53,rnn:[7,30,33,49,58,60,64,67,104,108],rnn_bias_attr:116,rnn_layer_attr:116,rnn_out:116,rnn_output:70,rnn_use_batch:[41,104,105],rnnalgorithm:49,rnnstep:70,roadmap:[67,70],rocmplac:47,role:[4,13,14,21,51,93,97],rollback:58,root:[0,6,51,97,99,119],roughli:67,round:[29,51],routin:[29,42,51],row:[12,20,74],rows_:65,rpc:13,rpcserver:13,rpi:120,rpi_arm_neon:120,rpi_toolchain:120,rsize:97,rtk:121,rule:[6,21,24,30,34,74,97],run:[2,3,4,5,6,7,8,9,17,18,19,20,21,22,23,25,29,30,31,32,33,34,35,39,40,43,47,48,50,51,52,53,55,56,57,58,60,63,64,65,67,68,69,72,74,76,77,78,87,92,93,94,95,96,97,100,101,102,103,105,109,110,118,119,120,121],run_test:0,runinitfunct:110,runnabl:23,running_on_cloud:17,runserv:78,runtim:[1,7,18,20,21,37,48,60,71,76,95,118],runtime_table_:7,s_bl
ock:6,s_recurrent_group:116,safe:17,sai:[19,31,34,36,40,59,105,107],said:30,sake:74,same:[0,4,5,13,14,16,18,19,20,21,32,33,35,38,39,40,48,49,51,58,60,61,64,67,70,75,76,77,93,95,97,107,116],sampl:[1,25,33,58,66,92,95,105,107],sampler:33,satifi:40,satisfi:[3,8,43,65,97],save:[0,9,11,13,14,17,18,21,34,35,40,44,53,65,71,92,97,99,105,107],save_dir:[99,105,107],save_only_on:[104,105],save_persist:93,saving_period:[104,105],saving_period_by_batch:[104,105,107],scalabl:60,scalar:[6,7,36,69,70],scale:[21,23,53,61,66,67,75,96],scaleop:75,scaleopmak:[60,75],scan:[6,13,40,60],scatter:[6,51],scenario:[49,104],scene:104,schdule:97,schedul:[13,17,23,97],scheme:[12,62,75],scienc:40,scope:[5,18,22,26,31,39,52,111],score:49,scorer:67,scp:100,script:[0,51,75,92,95,96,97,100,118],sdk:119,search:[0,9,64,105,116],second:[4,18,28,30,33,35,48,49,56,57,59,64,66,75,95],secret:97,section:[6,23,30,58,72,74,94,97,109,116],see:[4,6,9,18,20,23,29,30,58,67,72,75,77,93,97,109,110],seed:[105,110],seem:[3,8,20,29,30,67],seen:[62,75],segment:[48,70,77],sel:20,select:[49,97],selected_generation_scor:49,selected_id:49,selected_row:[65,71],selected_rows_desc:65,selected_scor:49,selectedrow:[37,71],selector:99,self:[5,24,25,33,35,40,41,42,44,50,58,70,74,75],self_addr:18,semant:[4,49,63],semaphor:20,semat:4,send:[9,14,18,21,23,38,51,60,66,72,93,94,97,105],send_back_parameter_typ:12,sendbackparameterspars:12,sendbackparametertyp:12,sendparameterrequest:12,sendparameterrespons:12,sens:[53,62,72,109],sent:[4,14,18,21,60,66,71,99],sentenc:[30,48,49,70,116],sentence_input:70,separ:[14,21,32,53,61,62,92,93,105],seper:[52,70],seq_len:70,sequenc:[6,7,18,26,30,34,41,50,56,67,70,72,74,114],sequenti:[7,18,20,116],seri:[3,19],serial:[7,13,44,52,54,60,71],serializ:[60,71],serv:[1,21,29,60,70,92,97,110],server:[0,4,8,12,15,16,21,31,51,60,74,94,95,103,104,121],server_endpoint:93,serverless:9,servic:[92,109,121],sess:[33,35,50],session:[35,50,57,110],set:[0,4,6,9,17,19,33,37,40,43,47,48,49,55,57,58,60,61,64,67,68,70,72,74
,75,77,78,81,92,93,95,97,99,104,105,107,108,109,110,116,119,120],set_active_typ:74,set_attr:24,set_drop_r:74,set_float_el:5,set_input:24,set_output:24,set_shap:19,set_siz:74,set_typ:[24,74],setdatatyp:65,setdefault:75,setp:97,setq:0,settup:74,setup:[21,53,63,74,75,121],seven:67,sever:[0,5,12,21,23,33,48,49,51,54,55,58,65,68,70,95,96,97,107],sexstant:121,sgd:[4,9,17,23,52,53,54,65,93,94,111],sgd_optim:[93,111],sgdasync_count:104,shall:[6,8],shape:[5,6,7,19,21,33,36,47,48,56,58,60,65,67,68,75,77,93,111],shapes_:19,shard:[9,10,11,12,13,14,16,21,23,94,97],share:[0,8,19,33,46,54,58,60,62,67,68,70,75,105,110],shared_librari:8,shared_ptr:[43,45,46,64,68,77],shell:[1,97],shoul:5,should:[4,5,6,7,14,17,20,21,24,25,29,31,32,33,37,38,39,43,47,48,49,50,53,54,55,56,59,60,61,62,65,66,67,69,70,71,75,76,78,87,93,95,97,116,118],should_be_fals:4,should_be_tru:4,show:[0,3,6,7,9,19,20,28,30,36,40,44,48,51,53,56,69,70,77,94,97,99,105],show_check_sparse_distribution_log:[104,105],show_layer_stat:[104,105],show_parameter_stats_period:[99,104,105,107],shown:[4,21,25,51,52,55,67,74,77,97,110,116],shrink:74,shrunk:24,shuffl:[19,21,93],shuffleread:19,sid:97,side:[21,25,40,54,77,94],sig:97,sigint:95,sigmod:66,sigmod_op:66,sigmod_output:66,sigmoid:[7,66,70,74],sign:[27,44,97],signal:95,signatur:97,signific:[67,110],silent:92,similar:[7,18,20,21,23,26,30,39,49,53,55,59,60,62,67,68,70,75,97,109,121],similarli:[30,40,75],simpl:[18,23,29,31,34,35,40,48,53,56,62,64,66,67,70,93,105,110],simple_attent:116,simple_gru:116,simple_rnn:116,simpler:54,simplest:97,simpli:[1,4,14,21,87,110,116],simplifi:[4,49,58,66,67,74,99],simul:[30,119],simultan:97,sinc:[9,13,15,16,20,21,22,23,30,37,40,43,47,53,58,59,61,62,70,77,93,97,110,119,121],sincer:72,singl:[6,9,19,21,23,25,29,38,51,52,60,64,67,74,87,94,99,109],singleton:[18,22],sit:21,site:[8,97,109],situat:[6,39,57],size:[1,9,11,12,14,20,21,29,33,40,44,49,52,53,58,59,65,66,67,68,70,74,75,77,87,93,105,111,116,118,119,120],size_in_byt:43,size_t:[12,19,68,70,74],sizeof
:7,skip:[6,59,72,95,97],slice:18,slide:9,slight:30,slightli:33,slow:110,slowli:[0,109],small:[5,18,31,33,42,49,72,74,105],small_messag:[104,105],smaller:[5,9,29,49,72],smart:64,snap:99,snapdragon:29,snapshot:[10,16,97],snippet:[20,32,50,74,97,110,116],sock:17,sock_recv_buf_s:[104,105],sock_send_buf_s:[104,105],socket:105,softmax:[4,7,21,23,30,35,36,49,52,56,74,116],softmax_grad:52,softmaxoutput:35,softwar:[29,55,110,121],solid:33,solut:[51,121],solv:[4,6,19,40,60],some:[0,4,6,7,8,12,13,14,16,17,19,20,21,23,24,29,31,32,33,34,38,39,40,43,47,48,49,50,56,57,58,59,60,61,64,68,70,72,74,75,76,77,92,97,104,105,107,110,118,119,120,121],some_c_api_funct:46,some_inst:46,some_op:[37,48,70],some_python_class:45,somecppclass:45,somegotyp:45,someth:[0,6,12,58,72,109],sometim:[0,55,59,110],somewhat:14,somewher:64,soon:9,sophist:74,sort:[70,97,105,109],sort_by_length:70,sortagrad:67,sourc:[5,8,28,30,33,42,44,46,49,59,60,96,97,99,109,116,119],source_dict_dim:[49,116],source_dict_s:49,source_language_word:[49,116],space:[0,23,29,58,62,67,110,116],span:55,spars:[12,20,74,77,92,97,105],sparse_remot:12,sparse_upd:12,sparseparam:74,sparseprefetchrowcpumatrix:74,speak:116,spec:[97,99],specfii:105,special:[6,14,21,29,31,37,47,49,50,75],specialvartypeinfer:37,specif:[6,8,9,19,21,24,28,31,49,60,64,68,75,93,107,118],specifi:[1,4,5,12,13,14,17,18,20,21,22,24,25,26,28,33,44,55,58,64,66,70,72,74,77,78,97,105,109,116,118,119],spectrogram:67,speech:67,speed:[0,29,44,51,53,121],speedup:55,sphinx:[45,78],split:[16,18,19,22,30,36,49,60,70,92,94,97,107],split_count:[92,97],spread:6,squar:35,square_error_cost:[93,111],srand:105,src:[8,43,92],src_backward:116,src_embed:[49,116],src_forward:116,src_primitive_desc:43,src_word_id:[49,116],src_word_vec:49,ssh:[97,99,100,120],ssh_server:95,sstabl:4,stabil:[5,40,75],stabl:[63,97],stack:[26,60,70,97],stage:[8,15,22,33,40,43,67,71,95,118],stale:9,stamp:110,standalon:118,standard:[0,3,20,30,60,62,67,109],stanford:[5,99],star:8,start:[0,1,3,6,8,9,12,13,14,16,17,20
,21,22,49,51,54,55,95,100,105,109,110,116,117],start_mpi_train:100,start_op_idx:6,start_pass:[104,105],start_program:52,start_pserv:105,startup:[9,17,30,97],stat:[105,110],state:[7,9,25,26,48,49,55,64,67,70,76,99,105,116],statem:40,statement:[20,30,34,40,74,97],static_cast:[43,77],staticinput:116,statist:[25,55,105],statset:110,statu:[17,49,97,99,110],status:99,std:[8,12,19,35,37,38,43,45,46,55,57,60,61,64,66,68,74,75,77,105],stdbuf:92,stderr:95,stdout:95,step:[0,1,5,7,9,14,20,21,23,25,30,33,34,41,49,52,53,54,58,60,66,67,70,72,74,87,94,97,99,100,109,110,116,118,120,121],step_gradi:6,step_id:70,step_input:70,step_net:7,step_output:70,step_scop:[60,71],stepnet:[7,48,60,64],still:[3,6,13,16,21,30,40,61,77],stirng:58,stmt1482205552000:97,stmt1482205746000:97,stochast:[9,13,16,53,94],stop:[0,58,95,99,105],stop_gradi:58,storag:[27,29,92,97,99],store:[5,7,8,12,26,35,37,44,47,49,54,56,58,60,61,62,64,70,74,75,77,78,92,94,97,99,100,105,119,120],str:[6,17,70,107],straight:[56,59,65],straightforward:43,strategi:[9,58,105],stream:[21,43,55,68,76],stream_:68,strict:[59,94],stride:[43,47,67],string:[6,7,13,28,35,38,44,55,56,57,58,60,61,64,65,66,71,74,75,97,105],strip:109,strongli:103,struct:[13,14,27,29,37,38,39,46,47,55,61,66,76],structur:[6,7,13,30,33,44,49,56,58,60,65,95,97],sts:97,stuff:72,style:[0,60,66],sub:[4,6,16,18,23,33,40,48,51,54,58,74,116,118],sub_block:6,subclass:58,subcommand:28,subgraph:[23,33],submiss:21,submit:[43,60,78,96,97,104,105],subnet0:97,subnet:[4,97],subobjectpath:99,subsequ:51,subset:74,subtract:5,succ:40,succeed:[13,99],success:[14,97,99],successfulcr:99,successfulli:75,successor:105,sucess:40,sucessor:40,sudo:[0,97],suffer:5,suffici:105,suffix:[3,17,92],suggest:[8,72,110],suit:121,suitabl:[65,68,105],sum:[6,7,10,22,37,58,74,116],sum_op:6,summar:[33,55],summari:55,sumopgradmak:61,supercomput:40,suppli:65,support:[0,1,3,5,7,9,16,17,18,20,21,23,30,32,33,39,40,43,44,47,49,53,54,55,57,59,60,61,62,65,67,71,74,75,77,78,87,92,94,96,97,105,110,114,116,118,119,
120,121],support_inplac:40,suppos:[8,18,32,65,74],suppress:28,sure:[0,74,93,97,109],svs:66,swagger:27,swig:[0,15,45,46,118,119],switchop:7,sychron:51,symbol:[3,7,35,46],symbols_ready_:7,symbolt:[7,60],symlink:72,sync:[9,53,62,105],sync_with_cpp:109,syncflag:74,synchron:[9,13,20,43,51,55,94,97,105],syntax:[18,26,30,49,59],sysroot:118,system:[0,1,3,7,8,9,14,16,20,21,23,27,32,33,40,42,67,75,78,92,96,99,109,118],tab:3,tabl:[7,18,30,37,44,65,71,119],tablelookup:65,tablelookupgrad:65,tablelookupop:65,tag:[1,63,103,116],tail:49,take:[0,4,6,7,8,9,16,18,19,20,21,24,26,29,31,33,34,36,37,39,40,43,47,53,56,57,58,59,60,61,68,70,72,74,75,76,92,93,97,99,103,109,110,116],taken:[24,35,40,47,70],talk:[14,31,120],tangl:109,tanh:[33,49,74,116],tar:97,tarbal:97,target:[0,6,7,8,24,26,33,35,50,57,60,75,116,118,119,120],target_block:[6,24],target_dict_dim:116,target_dict_s:49,target_language_word:116,target_link_librari:8,target_word:49,task13:67,task14:67,task:[21,44,49,55,66,93,107,116],task_queu:13,taskentri:13,taskqueu:13,tbd:[15,43,67,79,80,82,83,84,85,88,90,91,98,112,113,115],tcp:[97,105],tear:110,technic:[6,9,93],techniqu:[40,74,109,116],technolog:[0,30],tee:99,tell:[1,9,13,14,49,66,110,118],templat:[32,43,66,68,75,76,77,99,121],tempor:67,temporari:[6,17,26,40,53,58],tempori:40,ten:0,tensor:[5,8,18,20,22,23,29,30,31,33,35,37,38,43,44,47,48,49,52,65,70,71,75,111],tensor_arrai:18,tensor_array_read:70,tensor_array_s:70,tensor_array_stack:70,tensor_array_unstack:70,tensor_array_writ:70,tensor_data:44,tensor_in:39,tensor_s:5,tensor_test:8,tensor_to_check:5,tensorarrai:22,tensorarraydesc:70,tensordesc:[44,65],tensorflow:[7,18,20,21,23,30,33,36,62,70,77],term:[9,61,62,67],termin:99,terminolog:40,tessorarrai:70,test1:11,test:[4,5,8,35,46,53,59,63,77,81,87,92,93,100,104,110,111],test_:75,test_all_data_in_one_period:99,test_check_grad_ingore_i:75,test_check_grad_ingore_x:75,test_check_grad_norm:75,test_check_output:75,test_data_dir:92,test_fcgrad:74,test_gpuprofil:110,test_layergrad:74,test_m
kldnn:42,test_mklpack:41,test_mul_op:75,test_norm:75,test_pass:[104,105,107],test_period:[104,105,107],test_recurrent_op:72,test_sum_op:0,test_wait:[104,105],testa:4,testb:4,testbilinearfwdbwd:110,testcas:75,testconfig:74,testfcgrad:74,testfclay:74,testlayergrad:74,testmodel_list:104,testmulop:75,testq:4,testsave_dir:104,testutil:74,text1:28,text:[4,44,48,55,67,97],tflop:110,tftp:121,tgz:3,than:[0,6,9,17,18,19,24,30,31,32,33,58,60,62,69,70,74,92,93,97,116,118,121],the_step:30,theano:30,thei:[0,4,6,8,9,14,16,18,19,20,23,24,28,30,33,34,38,40,49,50,55,58,60,66,70,72,74,75,77,93,94,97,104,110,116],them:[0,4,5,6,8,9,12,17,19,20,23,24,30,31,32,37,38,39,40,49,58,59,60,61,64,65,66,70,71,72,75,78,93,97,104,105,110],themselv:[6,8],theori:[30,110],therefor:[6,40,53],therein:7,theta:33,theta_d:33,theta_g:33,thi:[0,1,3,4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,29,30,31,32,33,34,35,38,39,40,43,47,48,49,50,51,52,53,54,55,56,58,59,60,61,62,65,66,67,68,70,72,74,75,76,77,78,87,92,93,94,97,99,103,105,107,109,110,111,116,118,119,120,121],thin:37,thing:[21,33,60,68,110],think:[4,8,93],third:[9,35,75,109,110,118,119,120],third_parti:[42,118,119,120],those:[7,8,9,32,34,35,36,56,118],though:[70,121],thought:[8,110],thread:[18,20,22,52,55,74,92,105,107,109,110],thread_count:22,thread_id:55,thread_id_:55,thread_local_rand_use_global_se:[104,105],thread_pool:22,threadid:107,threadloc:110,threadpool:18,three:[5,6,9,20,25,29,30,31,34,43,49,50,54,55,56,59,67,68,105,118],threshold:[9,13,24,72,105],through:[6,8,9,13,15,25,40,50,53,74,75,78,92,110,111,116,119],throughout:26,throughput:[52,92,110],thrust:60,thu:[16,25,35,40,67,74,97],tier:99,time:[0,4,5,8,9,13,16,19,20,21,23,24,30,32,37,40,41,47,48,49,51,55,58,59,60,61,65,66,67,70,71,77,99,105,107,109,110,116,121],timelin:[55,60,110],timeo:97,timeout:[9,13],timestamp:10,timestep:64,tip:[118,119],titan:40,tls:27,tmp:58,todo:[7,9,13,16,49,66,67],togeth:[6,70,93,116],token:[4,67,116],toler:[0,5,75],too:[5,18,20,24,39,43,70],took:121,tool
:[0,3,55,93,96,97,109,116,118,120],toolchain:[109,118,119],toolkit:67,top:[48,49,67,75],top_k:49,top_level_rnn:48,topic:43,topk_generated_scor:49,topk_id:49,topk_scor:49,toplevel:0,topolog:[4,9,21,35,40,44,54],topoloi:35,torch:[7,30],total:[9,23,25,51,55,59,92,96,99,109,110,121],total_pass:59,tottim:109,toward:30,trace:[7,31,33],track:[9,13,35,58],tradit:[7,29,67],traffic:21,train:[6,7,11,13,14,16,18,19,24,25,26,30,31,33,34,40,41,44,51,53,54,55,56,57,58,60,62,65,67,68,71,74,81,96,100,101,102,104,108,110,116,119],train_config_dir:97,train_data:92,train_data_dir:92,train_id:97,train_list:92,train_loop:30,train_read:[21,93],trainabl:[44,58],traindot_period:104,trainer:[4,10,11,12,13,15,21,23,31,41,42,53,54,60,74,93,94,95,105,107],trainer_config:[97,99],trainer_config_help:74,trainer_count:[87,92,97,99,104,105,107],trainer_cpu:17,trainer_cr:17,trainer_gpu:17,trainer_id:[92,97,105],trainer_intern:12,trainer_mem:17,trainer_packag:17,trainer_prog:21,trainerid:16,training_rol:93,trainingjob:21,trainingtest_period:104,trainonebatch:12,tran:[43,74,105],trans_var:39,transact:[9,13],transcript:67,transfer:[40,55],transform:[60,67,74,77,116],translat:40,translation_id:49,translation_scor:49,transpar:[49,95],transpil:[18,93],transport:105,transpos:74,travers:[6,34,40],travi:72,treat:[7,14,40],treatment:[14,29],tree:[7,18,26,30,58,105,111,120],trg_dic_siz:49,trg_embed:[49,116],trick:49,tricki:45,trigger:[16,19,54],trivial:[49,70],true_block:[7,36,56],true_imag:59,true_label:59,true_neg:25,true_posit:25,true_read:59,tune:[67,108,109],tuninglog_barrier_abstract:104,tupl:[6,58,59],turn:[1,58,59],tutori:[0,1,74,75,97,100,101,102,109,110,114,116,119],twice:[23,33,93],twine:63,two:[0,4,6,14,15,16,17,18,19,20,21,25,28,29,30,31,33,34,37,39,40,44,47,49,53,55,56,59,60,61,62,64,65,66,67,70,71,75,77,95,97,107,110,116,118,120],txt:[8,17,28,41,42,74,78,92,97,100],type:[4,6,7,9,12,13,16,17,19,21,27,28,29,31,37,38,39,43,44,45,46,48,49,56,57,58,59,60,61,62,65,66,67,68,71,74,76,77,87,92,94,97,99,10
3,105,107,116,119],type_nam:66,typedef:[14,29,45,46,47,68,76],typeerror:24,typeid:66,typenam:[32,66,68,75,76,77],typic:[21,110,119],ubuntu:[3,63,87,109],ubyt:59,uci_h:[87,93],uid:99,uint16_t:29,uint32:[27,44],uint32_t:55,uint64:[44,45],uint64_t:45,unawar:14,unbalanc:105,unbias:5,unblock:20,unbound:[40,116],unbuff:20,unclear:16,uncreat:6,under:[0,1,8,13,23,39,51,77,78,92,95,96,97],underli:[19,49],understand:[30,58,67,109,110,121],understand_senti:116,undeterminist:110,uni:67,unidirect:67,unifi:[19,26,35,65,72],uniform:[11,33,58,59,105],uniform_random:58,uniniti:6,uninstal:0,uniqu:[4,7,9,16,17,43,47,58,64,75,93,97,103,105],unique_nam:58,unique_name_gener:58,unique_ptr:[61,64,68,74],unit:[0,8,53,55,62,68,77,81,116],unittest:[46,72,75],unittestcheckgrad_ep:104,unix:20,unk:[65,71],unlik:[49,75],unnecessari:[6,67,72],unordered_map:64,unpack:70,unrol:48,unseen:62,unsign:[14,29],unstack:70,unstack_from:70,unsupervis:33,unsupport:75,until:[9,14,20,22,23,30,40,64,93,97],unzip:118,updat:[3,6,9,13,14,21,27,29,33,48,49,50,51,53,54,64,67,70,74,94,105,107,109],update_memori:7,update_op:50,updatecallback:74,updatestack:97,upgrad:[0,3,51],upload:[9,17,20,27,63,94],upon:9,upstream:72,uri:97,url:72,usag:[29,36,40,54,58,69,75,110,119],use:[0,1,3,4,5,7,8,9,15,19,20,21,22,23,26,29,33,35,37,38,39,40,43,47,49,50,51,52,54,55,58,64,65,66,67,70,72,74,75,76,78,87,92,94,97,99,105,107,109,110,116,118,119,120],use_cpu:38,use_cudnn:38,use_eigen_bla:118,use_eigen_for_bla:[118,119],use_gpu:[52,87,92,99,104,105,107],use_mkl_pack:41,use_mkldnn:[38,42],use_old_updat:[12,104,105],use_sparse_remote_updat:12,used:[0,3,4,5,7,8,9,15,16,19,20,21,24,26,29,30,33,35,39,40,48,49,52,53,54,55,58,59,60,62,64,66,68,70,74,75,77,97,104,105,107,109,110,116,118,119,120],useful:[5,29,39,40,58,64,74,107,116,118],usegpu:74,user:[1,4,5,6,7,8,11,13,16,17,18,21,22,23,24,25,26,28,32,33,34,35,37,38,39,43,47,49,50,51,53,55,58,59,60,61,62,64,66,68,70,72,76,78,92,97,104,105,109,118,121],user_nam:11,usercert:11,userkei:11,usernam:1
1,uses:[0,3,5,9,16,18,20,21,29,39,40,47,48,49,54,55,68,71,72,74,77,78,94,97,105,116,118],using:[0,1,2,4,5,6,7,8,9,13,14,16,17,20,21,26,28,29,30,32,33,35,37,40,48,50,53,56,58,59,61,62,64,66,67,68,72,74,75,76,77,78,87,92,97,99,100,103,105,107,110,116,118,120],usr:[0,1,92,97,105],usual:[6,17,40,47,55,56,62,68,72,75,97,105,107,109,110],util:[21,41,42,51,74,75,76,110,116,121],uuid:[10,16],v7a:118,v8a:118,val:6,valid:[59,60,64,75,97,119],valu:[5,6,7,9,18,20,24,25,35,36,40,42,44,48,49,50,53,54,56,60,64,65,66,69,70,74,75,76,93,97,105,107,116,118,119],value1:105,value2:105,value_:65,valueerror:35,values_:70,vanilla:[52,116],var_nam:[6,39],var_recurs:24,vardesc:[7,34,56,58,60,65,71],vardescbuild:7,vari:[97,110],variabl:[4,5,7,18,19,20,21,23,24,25,26,31,33,34,35,36,37,39,47,48,49,50,52,53,56,57,61,62,65,66,67,70,74,75,76,92,96,97,99,109,111,118,119],variablenamemap:75,varialbl:33,variant:[37,47,68,70,76],varibal:6,varibl:35,varienc:70,varient:70,variou:[7,20,29,40,62,118],varproto:66,vars_:[7,64],vartyp:65,vartypeinfer:37,vec2seq:67,veclib:119,vector:[4,7,12,14,19,35,36,43,48,49,55,58,60,61,65,67,69,70,74,77,116],vectorenable_parallel_vector:104,vendor:8,verbos:[28,72],veri:[8,13,18,23,26,30,32,33,40,43,49,54,59,62,64,67,68,70,95,109,110,116],verifi:[7,74,119],version:[0,1,6,8,17,21,24,28,31,33,35,36,44,49,63,67,74,87,93,97,99,103,104,105,109,110,118,119,120],versu:4,via:[2,6,9,47,72,93,97,110,119,121],view:[44,47],vim:1,viriabl:92,virtual:[0,19,24,37,38,61,68,76],virtualenv:0,visibl:[16,64],visit:6,visual:[49,110],vlog:[12,72],vocabulari:67,voila:87,volum:[78,99],volumemount:[97,99],volumn:97,w1_grad:52,w2_grad:52,wai:[0,4,6,14,16,20,26,30,38,40,49,53,58,59,62,70,72,74,107,116],wait:[9,14,20,22,76,92,93,94,105],walk:119,wangkuiyi:8,want:[0,1,4,17,18,20,25,33,38,39,47,53,55,57,59,62,64,68,69,70,72,74,78,92,105,107,109,118,120],warn:28,warp:110,warp_ctc:67,wast:51,watch:9,wbia:97,web:[78,109],websit:78,weight:[41,44,62,74,105,107,116],weightlist:74,weights_:74,weights_primitive
_desc:43,weights_t:74,welcom:[8,67,72],well:[6,17,20,21,23,30,32,33,62,65,67,74,93,97,105],wer:67,were:[8,20,30],west:97,wget:118,wgt:43,what:[0,8,19,30,33,39,49,58,66,75,109,121],whatev:[0,92],wheel:3,when:[0,2,6,7,8,9,12,13,14,17,18,21,23,24,25,26,28,29,30,31,35,49,51,53,54,55,56,58,60,68,70,72,74,75,77,78,92,94,96,97,99,105,107,109,110,116,118,119,121],whenev:[58,67,72],where:[4,6,7,9,16,18,21,30,31,34,47,48,49,53,56,60,62,68,70,74,75,105,107,109,110,111,116],wherea:[7,13,32,36,68,69,71],whether:[0,1,5,6,7,19,26,55,59,65,70,71,74,75,105,119],which:[0,3,4,5,6,7,8,9,11,13,14,16,17,18,19,20,21,22,24,26,29,30,31,32,33,35,37,39,40,43,44,47,48,49,50,51,54,56,57,58,59,60,61,64,65,66,69,70,71,72,74,75,76,77,92,93,94,95,97,105,107,109,110,116,118,119,121],while_grad:40,while_loop:[49,70],while_op:6,whileloop:70,whileop:7,white:67,whl:0,who:[6,32,34,51,58,72],whoever:14,whole:[6,19,33,36,40,45,46,48,51,57,66,67,72,94,97,99,121],wholli:19,whose:[5,6,9,16,24,48,60,61,66,70,116],why:[0,5,46],wide:[3,8,24,33,95,100],width:[12,45,59,74,75],wiki:8,window:[0,1,53,67,118],wirt:35,wise:[23,60,67,77],wish:[0,3,78,92,93],with_avx:[0,1,103,118,119],with_bia:66,with_c_api:[0,118,119,120],with_distribut:93,with_doc:0,with_doubl:[0,74,103],with_dso:0,with_golang:[0,118],with_gpu:[0,103,118,119],with_mkl:[0,41,42,118],with_mkldnn:42,with_mklml:42,with_profil:110,with_python:[0,103,118,119],with_rdma:[103,118,119],with_style_check:[0,72],with_swig_pi:[0,118,119],with_test:[0,75],with_tim:[103,110],within:[13,21,30,67,119],without:[6,9,14,19,20,55,58,59,60,67,75,92,100,109],wloop:70,wmt14:116,won:[94,110],word2vec:[17,92],word:[6,23,34,37,40,48,49,60,66,67,70,76,92,107,116],word_dict:[92,100],word_vector_dim:[49,116],wordcount:67,work:[0,1,4,7,8,9,21,26,29,30,38,50,53,55,58,72,74,78,93,94,97,99,105,109,110,116,121],worker:[23,71,97],workercount:97,workflow:[60,89,97],workspac:[72,92,95,105],world:92,worri:5,worth:111,would:[0,1,7,8,9,16,20,21,22,23,30,32,33,34,43,50,53,54,58,59,65,67,70,72,
93,97,109,118,121],wouldn:[30,34],wrap:[19,30,32,33,51,121],wrapper:[8,19,20,32,51,53,61,70,110],write:[4,9,16,18,19,20,21,23,29,30,31,32,35,37,43,50,52,53,58,59,60,61,68,70,92,96,97],write_lock:10,write_output:52,write_to_arrai:40,writer:[4,58],written:[0,1,6,7,18,23,26,33,44,53,60,61,65,75,77,95,109],wrong:59,wrote:35,wsize:97,x64:[118,120],x86:119,x86_64:[118,119],x_neg:5,x_po:5,xarg:[1,74,100],xcode:119,xcodebuild:119,xeon:76,xgbe0:[92,105],xgbe1:[92,105],xpu:30,xrang:[5,30,33,55,59,74,87],xx_layer:38,xxx:[4,70],xxxx:10,xxxxxxxxx:97,xxxxxxxxxx:97,xxxxxxxxxxxxx:97,xxxxxxxxxxxxxxxxxxx:97,y_dim:33,y_neg:5,y_po:5,y_predict:[87,93,111],yaml:[8,95,97,100,121],yancey1989:17,yapf:72,year:30,yep:[55,109],yet:[30,67,121],yield:[4,11,19,59],you:[0,1,2,3,5,17,21,29,64,72,74,75,76,78,87,92,93,95,97,100,103,105,107,109,110,116,118,119,120,121],your:[0,2,3,4,8,12,17,28,60,72,74,78,92,93,95,97,103,107,110,118,119,120,121],your_access_key_id:97,your_secrete_access_kei:97,your_source_root:46,yourself:0,yuang:30,yuyang:109,z_dim:33,z_size:33,zaist:0,zero:[5,6,9,20,33,49,54,58,65,74,97,105],zip:[58,118],zone:97,zxvf:97},titles:["Build from Sources","Run in Docker Containers","Install and Build","Install using pip","PaddlePaddle Design Doc","Auto Gradient Check Design","Backward Building","Design Doc: Block and Scope","Required CMake Function","Design Doc: Distributed Training","\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9\uff08Checkpointing\uff09","\u8bad\u7ec3\u6570\u636e\u7684\u5b58\u50a8\u548c\u5206\u53d1","Alalysis of large model distributed training in Paddle","Design Doc: Master Server","Design Doc: The Client Library of Parameter Server","Design Doc: Remote Parameter Updater for Cluster Train","Design Doc: Save Model","Submit a Distributed Training Job","Design Doc: Concurrent Programming with Fluid","C++ Data Feeding","Design Doc: CSP in PaddlePaddle Fluid","Design Doc: Distributed Training Architecture","Design Doc: Execute the Program with Multi CPU","Design Doc: Parameter 
Server","Error Clip","Evaluator Design","Executor Design Doc","FileManager\u8bbe\u8ba1\u6587\u6863","PFSClient","Design Doc: float16","Design Doc: PaddlePaddle Fluid","PaddlePaddle Fluid: Towards a Compiled Programming Language","Design Doc: Functions, Operators, and Layers","Design for GAN","Design Doc: Computations as a Graph","Survey on Graph","The IfElse Operator","Design Doc: InferVarType","Problem","Background","Memory Optimization","Intel\u00ae MKL Packed on PaddlePaddle: Design Doc","Intel\u00ae MKL-DNN on PaddlePaddle: Design Doc","Design Doc: Add MKLDNN Kernel in Fluid Operator","Design Doc: Model Format","Paddle\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0","C-API \u6a21\u578b\u63a8\u65ad\u5b9e\u73b0\u6587\u6863","Design Doc: The Keys of Operator Kernel Type","RNNOp design","Design: Sequence Decoder Generating LoDTensors","Optimizer Design","Design Doc: NCCL support in Paddle Fluid","Design Doc: Parallel_Do in PaddlePaddle","Averaging Parameter in PaddlePaddle","Design Doc: The C++ Class Parameters","Introduction","Design Doc: PaddlePaddle Programs","Prune","Design Doc: Python API","Python Data Reader Design Doc","Design Doc: Refactorization Overview","Design Doc: Gradient Operators Registration","Regularization in PaddlePaddle","PaddlePaddle\u53d1\u884c\u89c4\u8303","Design of Scope in Paddle","Design Doc: Selected Rows","Interaction between C++ and Python","DeepSpeech2 on PaddlePaddle: Design Doc","Design Doc: Supporting new Device/Library","Design Doc: Switch","Design for TensorArray","Background","Contribute Code","Development","Write New Layers","How to write a new operator","Add Kernels for a New Device","How to use Eigen in Paddle","Contribute Documentation","Install, Build and Unit test","Cluster Training and Prediction","FAQ","Local Training and Prediction","Model Configuration","Parameter Setting","Basic Concept","GET STARTED","Quick Start","Install and Build","C-API Prediction Library","Input/Output Data Organization","C-API 
Workflow","Command-line arguments","Fluid Distributed Training","Distributed Training","Fabric","Use different clusters","Kubernetes on AWS","Kubernetes Distributed","Kubernetes","OpenMPI","<no title>","<no title>","Preparations","Argument Outline","Detail Description","Set Command-line Parameters","Use Case","HOW TO","Profiling the Python Code","Tune GPU Performance","PaddlePaddle Fluid Source Code Overview","Layers supporting hierarchical sequence as input","API comparision between RNN and hierarchical RNN","RNN Models","Recurrent Group Tutorial","RNN Configuration","PaddlePaddle Documentation","Build PaddlePaddle for Android","Build PaddlePaddle for iOS","Build PaddlePaddle for Raspberry Pi","Cluster bootstrapping tool survey"],titleterms:{"\u4e0a\u4f20\u8bad\u7ec3\u6587\u4ef6":11,"\u4e0d\u4f7f\u7528":45,"\u4e0d\u4f7f\u7528swig\u8fd9\u79cd\u4ee3\u7801\u751f\u6210\u5668":45,"\u4e0d\u5bfc\u51fapaddle\u5185\u90e8\u7684\u7ed3\u6784\u4f53":45,"\u4e0d\u5f15\u7528\u5176\u4ed6\u52a8\u6001\u5e93":45,"\u4ec5\u4ec5\u4f7f\u7528void":45,"\u4ece\u5feb\u7167\u6062\u590d":10,"\u4f7f\u7528\u52a8\u6001\u5e93\u6765\u5206\u53d1paddl":45,"\u4f7f\u7528\u8f6c\u6362\u5e93":11,"\u5177\u4f53\u67d0\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5177\u4f53\u67d0\u79cd\u7c7b\u578b\u7684\u5b9e\u73b0\u6587\u4ef6":46,"\u5206\u5757\u6587\u4ef6\u4f20\u8f93":27,"\u5206\u652f\u89c4\u8303":63,"\u52a0\u901f\u6267\u884c":10,"\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165\u4efb\u4f55\u5176\u4ed6\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u52a8\u6001\u6269\u5bb9":10,"\u539f\u56e0":45,"\u539f\u56e0\u5217\u8868":45,"\u53c2\u8003\u6587\u6863":27,"\u53d1\u5e03docker\u955c\u50cf":63,"\u53d1\u5e03wheel\u5305\u5230pypi":63,"\u540d\u8bcd\u89e3\u91ca":27,"\u57fa\u672c\u8981\u6c42":45,"\u5b9e\u73b0":45,"\u5b9e\u73b0\u65b9\u5f0f":46,"\u5bfc\u51fac":45,"\u5feb\u7167\u4fdd\u5b58\u7684\u8bbe\u8ba1\u5982\u4e0b":10,"\u6307\u9488\u4f5c\u4e3a\u7c7b\u578b\u7684\u53e5\u67c4":45,"\u63a8\u6d4b\u6267\u884c":10,"\u652f\u6301\u752
8\u6237\u81ea\u5b9a\u4e49\u7684\u6570\u636e\u9884\u5904\u7406job":11,"\u6587\u4ef6\u4f20\u8f93\u4f18\u5316":27,"\u6587\u4ef6\u8bbf\u95ee\u65b9\u5f0f":11,"\u6587\u4ef6\u8bbf\u95ee\u7684\u6743\u9650":11,"\u6587\u4ef6\u9884\u5904\u7406":11,"\u66b4\u9732\u63a5\u53e3\u539f\u5219":46,"\u672f\u8bed":10,"\u67b6\u6784\u56fe":27,"\u6846\u67b6\u751f\u6210":27,"\u6982\u5ff5\u89e3\u91ca":11,"\u6a21\u5757":27,"\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9":10,"\u6a21\u578b\u63a8\u65ad\u5b9e\u73b0\u6587\u6863":46,"\u6d41\u7a0b\u4ecb\u7ecd":11,"\u751f\u6210sparse\u6587\u4ef6":27,"\u7528\u6237\u4f7f\u7528\u6d41\u7a0b":27,"\u76ee\u5f55\u7ed3\u6784":46,"\u76ee\u6807":27,"\u793a\u4f8b\u7a0b\u5e8f":11,"\u7b26\u53f7":45,"\u7c7b":45,"\u7f16\u8bd1\u9009\u9879":46,"\u7f29\u5bb9":10,"\u800c\u662f\u624b\u5199\u591a\u8bed\u8a00\u7ed1\u5b9a":45,"\u80cc\u666f":45,"\u8986\u76d6\u4e0d\u4e00\u81f4\u7684\u90e8\u5206":27,"\u8bad\u7ec3\u6570\u636e\u5b58\u50a8":11,"\u8bad\u7ec3\u6570\u636e\u7684\u5b58\u50a8\u548c\u5206\u53d1":11,"\u8f6c\u6362\u5e93":11,"\u8fd9\u4e2a\u52a8\u6001\u5e93\u4f7f\u7528c99\u6807\u51c6\u7684\u5934\u6587\u4ef6\u5bfc\u51fa\u4e00\u4e9b\u51fd\u6570":45,"\u8fdb\u884c\u8bad\u7ec3":11,"abstract":[21,22,23,51,121],"book\u4e2d\u6240\u6709\u7ae0\u8282":63,"case":[6,107],"class":[33,54,58,74],"filemanager\u8bbe\u8ba1\u6587\u6863":27,"final":38,"function":[8,32,33,58],"new":[68,74,75,76],"paddle\u52a8\u6001\u5e93\u4e2d":45,"paddle\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0":45,"paddlepaddle\u53d1\u884c\u89c4\u8303":63,"paddlepaddle\u56de\u5f52\u6d4b\u8bd5\u5217\u8868":63,"return":[58,59],"switch":[43,68,69],"tensor\u5230eigentensor\u7684\u8f6c\u6362":77,AWS:97,DNS:97,E2E:93,EFS:97,For:[8,99],KMS:97,The:[7,14,18,26,30,33,34,36,37,47,50,54,60,61,69,120],Use:[7,56,78,87,96,99,107],Using:[8,14,118],With:17,about:33,absolut:49,access:97,account:97,action:[41,42],activ:42,actor:20,add:[40,43,76,97],address:97,advanc:68,alalysi:12,algorithm:[5,9,21,48,57],all:[64,70],analog:18,analysi:[21,40],android
:118,api:[21,41,42,46,50,53,58,62,66,89,91,113],appendix:[0,121],approach:110,arbitrari:30,architectur:[21,55,116],argument:[28,59,92,104,107,118,120],arrai:5,ask:0,asset:97,associ:[64,97],assumpt:121,async:[92,105],attent:116,attribut:[40,62],auto:5,averag:53,aws:97,background:[5,23,39,41,68,69,70,71,75,76],backward:[6,30,34,60,75],base:[17,49],basic:[40,68,85,121],batch:59,batch_siz:59,beam:[49,67],benchmark:[41,42],benefit:[23,60],between:[4,20,58,60,66,68,113],binari:7,bind:75,bla:0,block:[7,31,33,34,56,58,60],blockdesc:56,book:1,bool:0,bootstrap:121,bottleneck:109,bring:121,bucket:97,build:[0,2,6,33,60,78,79,88,99,118,119,120],built:110,can:64,capi:46,capi_priv:46,challeng:[6,23,57],chang:49,channel:20,check:[5,74,95],checkpoint:[9,10,16],choic:38,choos:[8,97],client:14,clip:24,close:5,cloudform:97,cluster:[15,80,93,95,96,97,100,107,121],cmake:[8,41,42,120],code:[17,31,58,72,109,111],command:[92,106,107],commit:99,common:105,commun:105,compar:121,comparis:[58,113],compat:30,compil:[0,7,29,31,56,60,75,111,118,119,120],complet:30,compos:59,comput:[7,34,43,60,62,77],con:121,concept:[58,60,85,97],concern:42,conclus:[16,35,121],concurr:[18,20],condit:33,config:107,configur:[83,97,116,119],construct:34,contain:[1,99],content:[41,42,46,67,97,110],context:76,contribut:[72,78],control:[40,60],contruct:40,convert:16,convolut:67,copi:52,core:[5,58,97],corner:6,cpu:[22,107],creat:[6,20,59,60,64,97,99],createreaderop:19,creation:[13,53,62],creator:59,credenti:97,cross:[118,119,120],csp:20,ctc:67,cuda:[0,29],cudnn:0,current:[29,61],custom:59,data:[9,19,21,39,59,90,97,99],dataflow:40,dataprovid:105,dataset:[9,13,92],datatyp:47,decod:49,decor:59,decoratedread:19,deep:[7,30],deepspeech2:67,defin:[75,97],definit:71,delet:97,demo:[33,93,97],dens:16,dep:3,depend:[0,3,33,67],deploi:17,deriv:74,describ:[30,50],descript:[28,60,105],design:[4,5,7,9,13,14,15,16,18,20,21,22,23,25,26,29,30,32,33,34,37,41,42,43,44,47,48,49,50,51,52,54,56,58,59,60,61,64,65,67,68,69,70],destroi:[64,97],deta
il:[12,67,105],develop:[60,73],devic:[52,68,76,107],devicecontext:68,dictionari:59,differ:[52,60,68,96,107],directori:97,discrimin:33,discuss:[23,33],dispatch:[9,13],distribut:[4,9,12,17,21,23,93,94,97,98,105],dnn:42,doc:[4,7,9,13,14,15,16,18,20,21,22,23,26,29,30,32,34,37,41,42,43,44,47,51,52,54,56,58,59,60,61,65,67,68,69],docker:[1,17,99,118],document:[78,117],doe:59,down:97,download:[97,99],dure:[49,59],dylib:46,dynam:[9,70],dynet:35,each:3,ec2:97,eigen:77,elast:97,elect:16,els:7,engin:33,enough:5,entri:59,environ:[17,118],equat:74,error:24,evalu:25,event:[4,55],evolut:30,examin:109,exampl:[4,8,18,20,36,46],execut:[7,22,30,56,60],executor:26,explan:5,extern:97,fabric:95,faq:[2,3,81],fault:9,feed:19,file:[7,97,99,109],fileread:19,find:97,float16:29,flow:40,fluid:[18,20,30,31,43,51,93,111],format:[7,9,44],forward:[34,52,75],frame:7,framework:[5,76,77],frequent:0,from:[0,2,4,16,66],functor:68,futur:[30,67],gan:33,gate:116,gener:[31,33,49,109,116,121],get:[86,93,99],give:59,global:[56,58],gpu:[1,105,107,110],grad_op:6,gradient:[5,6,14,42,61,74],graph:[34,35,40,60,62],group:[97,115],gru:105,hand:110,handler:[4,45],happen:16,hardwar:29,have:93,helper:58,hierarch:[112,113],hierarchi:7,high:[50,53,62,66],how:[0,5,12,53,59,60,68,75,77,78,108,110],iOS:119,iam:97,identifi:109,ifels:36,ifelseop:7,imag:[1,17,99,118],implement:[5,6,8,12,22,24,25,29,44,48,51,53,58,59,60,61,62,74,75,76,77],imporv:52,infer:118,infershap:[56,65],infervartyp:37,ingredi:4,ingress:27,initi:[14,33,97,107],input:[52,90,112],insid:64,inspect:97,instal:[2,3,79,87,88,93,97,118,119,120,121],instanc:97,instead:59,integr:[68,97],intel:[41,42],interact:66,interfac:[5,9,14,15,26,50,59,64],intermedi:60,introduc:[49,70,93],introduct:[55,62,93],isn:59,issu:29,job:[9,17,95,97,99,100],kei:[41,47,97],kernel:[43,47,60,76],kill:95,kube:97,kubectl:97,kubernet:[17,97,98,99],languag:[7,31],larg:12,launch:[1,95,100],layer:[4,32,41,42,58,74,107,112],layout:47,learn:[7,30],leval:66,level:[50,53,62,66],libpaddle_capi_shar:46,
libpaddle_capi_whol:46,librari:[14,29,47,60,68,89,118],limit:21,line:[92,106],linux:[95,118],list:[10,59],live:40,load:20,local:[21,64,82,97,107],lod:49,lodtensor:[48,49,70],lodtensordesc:71,log:72,logic:13,look:109,low:[53,62,66],lstm:105,machin:49,macro:60,main:33,make:40,manag:8,map:[59,60],master:[9,13,17,18],math:68,mathemat:5,matrix:[42,105],member:33,memori:[40,48,68],messag:66,method:49,might:33,migrat:60,mileston:60,mini:59,minibatch:20,mix:107,mkl:[41,42],mkldnn:43,mkldnn_helper:43,mkldnndevicecontext:43,model:[4,12,14,16,20,30,33,44,49,83,95,107,114,116],modifi:99,modul:[60,68,77],more:33,motiv:[6,20,26,44,51,57],multi:[22,31],multipl:59,mxnet:35,name:[64,97],nativ:31,nccl:51,necess:58,necessari:60,need:[59,110],nest:48,network:[60,107,116],neural:116,nlp:105,non:93,norm:62,note:5,numer:5,numpi:5,nvprof:110,nvvp:110,object:9,offset:49,onli:[59,64],onto:52,op_mak:60,openmpi:100,oper:[19,32,36,40,43,47,53,56,58,60,61,65,70,75],opinfomap:60,opkernel:[60,68,75,76],opproto:66,ops:62,optim:[9,14,34,40,50,58],option:[0,28],opwithkernel:60,order:28,org:78,organ:90,origin:60,orthogon:64,other:42,outlin:104,output:[90,95,97],overview:[16,24,26,41,42,52,60,64,67,111],pack:[41,49],packag:[3,8],paddl:[12,51,59,64,77],paddlejob:17,paddlepaddl:[0,1,3,4,7,20,30,31,41,42,52,53,56,62,63,67,78,93,97,99,111,117,118,119,120],pair:97,paradigm:30,parallel_do:52,parallel_nn:107,paramet:[4,9,14,15,17,20,23,42,52,53,54,58,62,84,92,93,97,105,106],parameteraverageoptim:53,parent:64,part:34,partit:14,pass:[0,107],path:[16,28],penalti:62,perform:[52,53,105,109,110],persist:13,pfsclient:[27,28],pfsserver:27,pip:3,place:[40,47,68,76],placement:21,point:[41,97],polici:40,pose:[37,61],potenti:38,predict:[80,82,89],prefetch:59,prepar:[92,93,95,97,100,103,119],principl:43,privat:97,pro:121,problem:[25,37,38,39,40,47,50,61],procedur:121,process:[9,14,17,50,60],profil:[109,110],program:[1,7,18,20,22,30,31,56,58,92,93],programdesc:[31,56],project:8,propos:[37,61,62],protobuf:65,protomak:75,pro
vid:59,prune:57,pserver:16,pull:1,python:[5,17,21,41,42,48,50,53,58,59,62,66,71,74,75,109],qualiti:60,question:0,queue:[9,13],quick:87,randomnumb:105,raspberri:120,reader:[4,19,59],readerbas:19,readerhold:19,readi:93,readop:19,realiz:60,recoveri:9,recurr:[115,116],recv:20,refactor:60,refer:[5,21,23,40,41,42,67,110],region:97,regist:[37,60,66,75,76],registr:[60,61],registri:60,regular:[14,62],rel:49,relat:[19,60,70],remark:75,remot:15,remoteexecutor:21,render:97,represent:[7,60],requir:[0,8,33],result:[95,99],retri:13,reus:58,review:72,rnn:[48,70,105,113,114,116],rnnop:[7,48,60],route53:97,row:[65,67],rpc:20,run:[0,1,26,75,99,111],runtim:[3,17],save:16,scale:9,scope:[7,48,60,64],script:[93,99],search:[49,67],secur:97,select:[14,20,65],selectedrow:65,semant:69,send:20,separ:60,sequenc:[49,112,116],server:[9,13,14,17,20,23,92,93,97,105],servic:97,set:[84,106],setup:[97,118],sextant:121,sgd:[92,105],shape:49,share:[4,6,40,64],should:64,shuffl:59,simpl:[49,116],singl:59,slice:93,solut:[37,38,39,40,41,47,57,61],sourc:[0,2,111],spars:[14,15,16,65,107],specifi:107,split:52,stack:7,standard:72,start:[4,86,87,92,97,99],startup:99,statement:25,step:[2,48],storag:62,store:9,strategi:40,style:72,subcommond:28,submit:17,suffici:59,suitabl:8,sulut:43,summar:[4,18],summari:44,support:[29,51,68,70,112],survei:[29,35,62,121],synopsi:28,syntax:20,system:[30,97],tabl:[46,67],task:[9,13,67],tear:97,tecton:121,templat:97,tensor:[60,68,77],tensorarrai:[49,70],tensordesc:71,tensorflow:35,test:[0,41,42,43,72,74,75,79,105,107],theori:5,thi:64,think:33,three:70,time:111,timelin:16,timer:110,tip:110,todo:[10,11,22],togeth:64,toler:9,tool:[8,78,110,121],toolchain:120,topic:68,toward:31,train:[1,4,9,12,15,17,21,50,59,80,82,92,93,94,95,97,99,105,107],trainer:[9,14,16,17,20,92,97],transform:39,translat:49,transpil:[21,22,23,31,40,51],tune:[105,110],ture:30,tutori:115,two:5,type:[0,20,47,75],uniform:70,unit:[41,42,43,72,74,75,79,105],unpack:49,updat:[4,15,16,78,92,93,97],usag:[6,24,48,49,59,77],use
:[12,59,77],user:9,using:3,valu:58,variabl:[6,40,58,60,64,71],vartyp:71,vector:105,verifi:97,version:[3,18,29],volum:97,vpc:97,what:[12,16,110],when:[16,64],whl:3,why:[29,30,53,59,60,70,110],work:67,worker:18,workflow:[72,91],wrapper:74,write:[72,74,75,76,78],www:78,yaml:99,your:[1,76]}}) \ No newline at end of file +Search.setIndex({docnames:["build_and_install/build_from_source_en","build_and_install/docker_install_en","build_and_install/index_en","build_and_install/pip_install_en","dev/contribute_to_paddle_en","dev/index_en","dev/new_layer_en","dev/write_docs_en","faq/build_and_install/index_en","faq/cluster/index_en","faq/index_en","faq/local/index_en","faq/model/index_en","faq/parameter/index_en","getstarted/concepts/use_concepts_en","getstarted/index_en","getstarted/quickstart_en","howto/capi/compile_paddle_lib_en","howto/capi/index_en","howto/capi/organization_of_the_inputs_en","howto/capi/workflow_of_capi_en","howto/cluster/cmd_argument_en","howto/cluster/index_en","howto/cluster/multi_cluster/fabric_en","howto/cluster/multi_cluster/index_en","howto/cluster/multi_cluster/k8s_aws_en","howto/cluster/multi_cluster/k8s_distributed_en","howto/cluster/multi_cluster/k8s_en","howto/cluster/multi_cluster/openmpi_en","howto/cluster/multi_cluster/src/k8s_data/README","howto/cluster/multi_cluster/src/k8s_train/README","howto/cluster/preparations_en","howto/cmd_parameter/arguments_en","howto/cmd_parameter/detail_introduction_en","howto/cmd_parameter/index_en","howto/cmd_parameter/use_case_en","howto/index_en","howto/optimization/gpu_profiling_en","howto/rnn/hierarchical_layer_en","howto/rnn/hrnn_rnn_api_compare_en","howto/rnn/index_en","howto/rnn/recurrent_group_en","howto/rnn/rnn_config_en","index_en"],envversion:50,filenames:["build_and_install/build_from_source_en.rst","build_and_install/docker_install_en.rst","build_and_install/index_en.rst","build_and_install/pip_install_en.rst","dev/contribute_to_paddle_en.md","dev/index_en.rst","dev/new_layer_en.rst","dev/write_do
cs_en.rst","faq/build_and_install/index_en.rst","faq/cluster/index_en.rst","faq/index_en.rst","faq/local/index_en.rst","faq/model/index_en.rst","faq/parameter/index_en.rst","getstarted/concepts/use_concepts_en.rst","getstarted/index_en.rst","getstarted/quickstart_en.rst","howto/capi/compile_paddle_lib_en.md","howto/capi/index_en.rst","howto/capi/organization_of_the_inputs_en.md","howto/capi/workflow_of_capi_en.md","howto/cluster/cmd_argument_en.md","howto/cluster/index_en.rst","howto/cluster/multi_cluster/fabric_en.md","howto/cluster/multi_cluster/index_en.rst","howto/cluster/multi_cluster/k8s_aws_en.md","howto/cluster/multi_cluster/k8s_distributed_en.md","howto/cluster/multi_cluster/k8s_en.md","howto/cluster/multi_cluster/openmpi_en.md","howto/cluster/multi_cluster/src/k8s_data/README.md","howto/cluster/multi_cluster/src/k8s_train/README.md","howto/cluster/preparations_en.md","howto/cmd_parameter/arguments_en.md","howto/cmd_parameter/detail_introduction_en.md","howto/cmd_parameter/index_en.rst","howto/cmd_parameter/use_case_en.md","howto/index_en.rst","howto/optimization/gpu_profiling_en.rst","howto/rnn/hierarchical_layer_en.rst","howto/rnn/hrnn_rnn_api_compare_en.rst","howto/rnn/index_en.rst","howto/rnn/recurrent_group_en.md","howto/rnn/rnn_config_en.rst","index_en.rst"],objects:{},objnames:{},objtypes:{},terms:{"00m":37,"03m":37,"0424m":37,"055ee37d":25,"0630u":37,"06u":37,"0810u":37,"0957m":37,"0_cudnn5":0,"0_cudnn5_avx_mkl":[1,3],"0_cudnn7_avx_mkl":3,"0rc":31,"100gb":37,"100gi":25,"10m":37,"1150u":37,"11e6":27,"124n":37,"13m":27,"1490u":37,"1550u":37,"16u":37,"173n":37,"1770u":37,"18ad":25,"18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750":27,"197u":37,"1gb":37,"210u":37,"215n":37,"228u":37,"2520u":37,"2680u":37,"279n":37,"27m":37,"285m":37,"2863m":37,"28m":37,"2977m":37,"2cbf7385":25,"302n":37,"30u":37,"328n":37,"32u":37,"331n":37,"3320u":37,"365e":25,"36u":37,"3710m":37,"3768m":37,"387u":37,"38u":37,"3920u":37,"39u":37,"4035m":37,"4090u":37,"4
096mb":33,"4279m":37,"43u":37,"448a5b355b84":27,"4560u":37,"4563m":37,"45u":37,"4650u":37,"4726m":37,"473m":27,"4gb":33,"50bd":25,"50gi":25,"514u":37,"525n":37,"526u":37,"536u":37,"5460u":37,"5470u":37,"54u":37,"5690m":37,"573u":37,"578n":37,"5798m":37,"586u":37,"58s":27,"5969m":37,"5_cudnn5_avx_mkl":3,"5_cudnn5_avx_openbla":[3,16],"6080u":37,"6140u":37,"6305m":37,"639u":37,"655u":37,"6780u":37,"6810u":37,"682u":37,"6970u":37,"6ce9":25,"704u":37,"7090u":37,"72u":37,"73u":37,"75u":37,"760u":37,"767u":37,"783n":37,"784u":37,"78m":37,"7kb":27,"8250u":37,"8300u":37,"830n":37,"849m":37,"861u":37,"8661m":37,"892m":37,"901n":37,"90u":37,"918u":37,"9247m":37,"924n":37,"9261m":37,"9330m":37,"94u":37,"9530m":37,"983m":37,"988u":37,"997u":37,"99u":37,"9f18":27,"abstract":[6,33],"case":[6,25,34,37,42],"class":32,"const":6,"default":[0,1,3,4,21,23,25,27,33,35],"export":[1,7,21],"final":6,"float":[6,35,37],"function":[6,21,33,37,42],"import":[3,16,21,25,37,42],"int":[6,21,35],"long":37,"new":[0,4,5,16,25,27],"null":[6,33],"public":[6,21,25,27],"return":[6,25,42],"static":25,"super":6,"switch":25,"throw":25,"true":[6,21,25,33,35,42],"try":[0,1,3,37],"var":7,"void":6,"while":[33,42],AGE:[25,27],AWS:[24,29,30],And:[25,35,42],But:[3,4,22],For:[0,1,4,6,16,21,32,33,35,37,42],IDE:0,IPs:21,Into:25,Its:[25,42],Not:[0,3],One:[6,33,42],PRs:7,QoS:27,TLS:25,That:[1,33,35],The:[0,1,3,4,6,7,22,23,24,25,27,31,33,35,37,42],Then:[1,3,6,22,25,27,28,37,42],There:[25,37],These:[0,24,35],Use:[0,6,21,22,25,33,34,37],Using:[0,27],VMs:0,VPS:25,Yes:[0,1],___fc_layer_0__:25,__init__:6,__rnn_step__:42,_source_language_embed:42,_target_language_embed:42,aaaaaaaaaaaaa:25,about:[1,3,16,25,32,33,37],abov:[0,1,2,4,16,25,27,37],absolut:1,acceler:[1,35],access:[3,21,42],accessmod:25,accord:[21,32,33,35],accordingli:6,account:4,accuraci:6,achiev:37,ack:33,act:[16,42],action:25,activ:[6,7,16,33,42],adagrad:21,add:[0,4,6,35,37],add_input:6,add_test:6,add_unittest_without_exec:6,addbia:6,added:[4,6],adding:4,address:[
1,23,28,31,33,37],addrow:6,administr:0,advanc:[33,37,42],advantag:31,advic:37,after:[0,1,4,6,23,25,27,31,33,35,42],again:37,against:25,aggreg:25,agre:4,agreement:4,aid:37,alexnet_pass1:35,alexnet_pass2:35,algorithm:42,all:[0,1,6,21,22,23,25,27,32,33,35,37,42],alloc:[6,35],allow:[4,6,25,33,37],allow_only_one_model_on_one_gpu:[32,33,35],almost:[0,23],alreadi:[1,23,25,33,37],alreali:32,also:[0,1,3,6,7,22,27,31,37,42],alwai:[25,33],amazon:[25,27],amazonaw:25,amazonec2fullaccess:25,amazonelasticfilesystemfullaccess:25,amazonroute53domainsfullaccess:25,amazonroute53fullaccess:25,amazons3fullaccess:25,amazonvpcfullaccess:25,amd64:25,amend:4,among:25,amount:37,analysi:37,andd:25,ani:[0,25,28,37],annoi:23,anoth:[0,25,33],ans:25,answer:[4,25],anyth:25,api:[0,3,6,21,24,25,36,37,40],api_trainer_config_helpers_lay:42,apiserv:25,apivers:[25,27],append:[4,6,21,42],appleyard:37,appli:[6,42],applic:[4,24,25,27,37],appreci:4,apt:1,architectur:[0,22],arg:32,argument:[0,4,6,22,33,34,42],arn:25,around:25,articl:[0,27],artifact:[3,25],assign:[21,25,31,33],assum:[1,35,42],async:32,async_count:33,async_lagged_grad_discard_ratio:[21,33],async_lagged_ratio_default:[32,33],async_lagged_ratio_min:[32,33],asynchron:[22,33],attr:42,attribut:6,auc:32,aucvalidationlay:33,authent:25,author:25,auto:[0,4,6,37],autom:[24,25],automat:[0,4,6,23,25,32,33,42],avail:[3,25],averag:33,average_test_period:[32,33],avg:37,avoid:[0,37],avx2:0,avx:[0,1],await:27,awar:25,awk:28,aws_account_id:25,awsaccountid:25,awskeymanagementservicepowerus:25,b363:27,b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465:27,ba5f:25,back:1,background:21,backward:[6,33,35,42],backward_first:42,backwardactiv:6,baidu:27,balanc:[25,33],bare:27,barrier:[22,33],barrierstatset:37,base:[6,25,33,37,42],basematrix:6,bash:[0,1,25,27,31],basic:[6,15],batch:[4,6,25,27,33],batchsiz:6,beam:[33,42],beam_gen:42,beam_search:42,beam_siz:[32,33,35,42],becaus:[0,6,25,35,42],becom:37,been:4,befor:[0,1,4,25],begin:[6,21],beginn:42,behavior:37
,below:[0,3,6,21,22,25,31,37,42],besid:3,best:[0,1,33],besteffort:27,better:25,between:[1,25,40],bia:[6,42],bias:6,bias_attr:42,biases_:6,biasparameter_:6,biassiz:6,bidi:27,bidirect:42,big:37,bilinearfwdbwd:37,bin:[0,1,21,25,27,31],binari:[0,2,3,25,37],bla:1,blank:25,blob:0,block:[6,33,37],book:[7,42],bool:[6,33,35],boot:42,boot_lay:42,bos_id:42,both:[6,7,22,25,37,42],bottleneck:37,box:37,branch:[3,4,7],breadth:33,briefli:37,broken:4,browser:[1,7,25],bsd:0,bucket_nam:25,buddy_alloc:4,buffer:33,bug:[4,25],build:[1,3,4,10,16,18,25,29,30,31,33,43],builder:4,built:[0,3],bunch:[31,37],button:[4,7,25],c703c041:4,c99e:25,cach:0,cacul:21,calcul:[1,6,22,33,35,37,42],call:[0,6,16,25,31,33,37,42],callback:6,caller:25,can:[0,1,2,3,4,6,7,16,21,22,23,24,25,27,28,31,32,33,35,37,42],cannot:6,capac:25,capi:0,captur:23,card:31,care:[1,32,33],carefulli:33,cat:[1,28],categoryfil:27,caus:3,caution:[25,27],cento:[0,3,16],ceph:27,certain:32,certif:25,cfg:27,chain:6,chanc:6,chang:[0,4,6,21,25,33,37,42],channel:37,characterist:35,check:[0,3,4,7,25,31,33,35],check_eq:6,check_l:6,check_sparse_distribution_batch:[32,33],check_sparse_distribution_in_pserv:[32,33],check_sparse_distribution_ratio:[32,33],check_sparse_distribution_unbalance_degre:[32,33],check_styl:4,checkgrad:33,checkgrad_ep:33,checkout:4,china:1,chines:7,chmod:25,choic:1,choos:[0,1,2,33],claim:25,claimnam:25,clang:4,classif:35,claster:25,clean:[0,4],cli:25,click:[3,4,7,25,37],clip:33,clone:[0,4,7],close:4,cludform:25,cluster:[10,21,22,27,31,32,33],cluster_test_fil:21,cluster_train:23,cluster_train_fil:21,cluster_train_v2:[23,24,28],cm469:25,cmake:[0,4,6,7,37],cmakelist:6,cmd:27,cname:25,cnn:27,code:[0,1,3,5,6,7,16,21,25,27,37,42],codebas:4,collectbia:6,column:6,com:[0,1,4,7,25,27],comma:33,command:[0,1,3,4,6,7,16,22,23,25,27,28,29,30,31,36,37],commandlin:37,comment:4,commit:[4,21],common:[6,32,42],commonli:[35,37,42],commun:[4,6,21,25],compar:[0,6],comparis:40,compil:[6,7,31],complet:[2,6,25,27],complex:[37,42],compon:6,compromi
s:0,comput:[0,1,4,6,24,25,31,35,37,42],computation:42,concat:42,concept:[15,42],concurr:22,concurrentremoteparameterupdat:33,condit:[27,42],conduct:37,conf:23,conf_paddle_gradient_num:25,conf_paddle_n:25,conf_paddle_port:25,conf_paddle_ports_num:25,conf_paddle_ports_num_spars:25,config:[6,25,27,32,33],config_:33,config_arg:[32,33,35],config_lay:6,config_pars:6,configur:[0,4,6,10,16,21,33,37,40],conflict:4,congest:33,connect:[6,21,25,27,31],consid:[0,35,37],consol:[25,37],consolid:7,constant:6,construct:42,constructor:6,contain:[0,2,3,21,25,28,31,42],container:24,containerport:25,content:[7,27],content_dir:7,context:42,contin:25,continu:[22,33],contribut:5,contributor:4,control:[21,25,27,33],convent:4,converg:23,cool:4,copi:[0,4,21,25,28],core:[0,33],coreo:25,correct:[6,25],correctli:6,correspond:6,cost:33,could:[0,23,25,37],count:[21,27,33,35,37],cours:0,cp27:3,cp27m:3,cp27mu:3,cpp:[6,37],cpu:[0,1,27,33,37],cpu_avx_mkl:[1,3],cpu_avx_openbla:[3,16],cpu_noavx_openbla:3,cpuinfo:1,crash:[23,33,37],creat:[1,4,6,7,16,21,28,33],create_bias_paramet:6,create_input_paramet:6,createstack:25,creation:25,creationd:25,crlf:4,csc:6,csr:6,ctest:0,ctrl:[0,23],cuda7:[3,16],cuda8:[0,1,3],cuda:[1,3,31,33,37],cuda_dir:[32,33],cuda_so:1,cudaconfigurecal:37,cudadevicegetattribut:37,cudaeventcr:37,cudaeventcreatewithflag:37,cudafre:37,cudagetdevic:37,cudagetdevicecount:37,cudagetdeviceproperti:37,cudagetlasterror:37,cudahostalloc:37,cudalaunch:37,cudamalloc:37,cudamemcpi:37,cudaprofilerstart:37,cudaprofilerstop:37,cudaruntimegetvers:37,cudasetdevic:37,cudasetupargu:37,cudastreamcr:37,cudastreamcreatewithflag:37,cudastreamsynchron:37,cudeviceget:37,cudevicegetattribut:37,cudevicegetcount:37,cudevicegetnam:37,cudevicetotalmem:37,cudnn:33,cudnn_conv_workspace_limit_in_mb:[32,33],cudnn_dir:[32,33],cudnnv5:0,cudrivergetvers:37,cuinit:37,curl:25,current:[0,1,3,6,7,21,23,25,33,42],current_word:42,custom:[6,25],cxxabi_1:3,d3e0:25,daili:4,dangl:0,darwin:25,data:[6,16,18,21,22,28,29,32,33,35,37,42],
data_typ:[16,42],dataset:[16,22,33,42],date:21,dcuda_arch_nam:0,dcudnn_root:0,debug:[1,4],decod:42,decoder_boot:42,decoder_group_nam:42,decoder_input:42,decoder_mem:42,decoder_s:42,decoder_st:42,decor:6,decrypt:25,deep:[1,37],deeper:1,def:[6,42],defalut:[33,35],default_devic:35,default_valu:35,defin:[6,21,23,33,42],definit:1,delai:33,delet:4,deletestack:25,demo:[23,27,29],demolish:27,demonstr:42,denot:35,dens:[6,25],dense_vector:16,depend:[1,21,22,35],deploi:[23,35],deploy:[23,24,25],descent:22,describ:[6,25,27],describestack:25,describestackev:25,describestackresourc:25,descript:[0,3,4,21,25,34],desir:[25,27],destructor:6,detail:[0,6,7,16,23,25,27,34,35,37,42],detect:[0,4],determin:6,dev:[0,1],develop:[0,1,3,4,7,21,32,33,43],deverlop:33,devic:[1,21,33],deviceid:35,devid:33,devtools2:0,diagnos:23,diagram:22,dict:[21,28],dictionari:35,did:1,differ:[1,6,21,22,23,25,27,33,42],difficult:0,dig:[1,25,37],digit:21,dim:6,dimens:[6,35],dimension:[6,42],directli:[0,2,23,27],directori:[0,1,7,21,23,27,28,33,37],discard:[21,33],discoveri:25,discrep:37,disk:[0,27],dispatch:[23,24,33],displai:4,dist:0,distinguish:23,distribut:[3,21,23,27,29,30,36],distribute_test:[32,33],distributedli:6,divid:32,diy_beam_search_prob_so:[32,33],django:7,dnn:0,dns:25,doc:[0,7,21],doc_cn:7,docker:[0,2,4,7,25,29,30,31],docker_clust:[23,28],dockerhub:1,document:[0,4,5,31,35],doe:[0,3,6,37],doesn:[0,1,4,27,37],doing:37,domain:25,don:[0,1,4,7,25],done:[4,25,37],dot:33,dot_period:[33,35],doubl:[0,33],down:37,download:[0,1,3,21,22],driver:[1,31],dropout:6,drwxr:27,dtoh:37,durat:37,dure:[6,25,32,33],dwith_gpu:0,dwith_profil:37,dwith_test:0,dwith_tim:37,dynam:[0,33,37],dynamic_cast:6,each:[6,21,22,23,25,31,33,35,42],earli:4,easi:[4,6,23],easier:[0,4,6],echo:1,edit:[0,1,25],editor:0,edu:[25,27],eeoi3ezpr86c:25,effect:[0,25,33],effici:[0,6,42],efs:25,efs_dns_nam:25,efsvol:25,either:[2,37],elb:25,elbapis:25,electron:27,els:[0,1,6],emac:0,email:4,emb:27,embed:[21,42],embedding_nam:42,embedding_s:42,emphas:37,emp
lace_back:6,emploi:42,enabl:[4,21,25,33,37],enable_grad_shar:[32,33],enable_parallel_vector:33,enc_proj:42,enc_vec:42,encod:42,encoded_proj:42,encoded_sequ:42,encoded_vector:42,encoder_s:42,encrypt:25,encrypt_decrypt:25,end:[3,4,33,42],endpoint:25,engin:37,english:7,enjoi:1,enough:0,ensur:[0,1,3,6],entri:[0,4,6,25],env:[7,25],environ:[0,3,4,21,24,25,27,32,33,37],environmenterror:21,eos_id:42,equal:33,equip:42,error:[6,23,25,33],especi:0,eta:27,etc:[0,24,25,32,35],eth0:[21,25],evalu:37,even:[0,4,33,37],evenli:25,event:27,everi:[4,6,21,22,28,31,33,42],everyon:4,everywher:0,exactli:25,exampl:[0,4,6,21,24,25,27,32,33,35,37,42],except:35,exec:33,execut:[6,25,37],exist:[0,3,6,25,33],exit:[27,33],expand:[1,6],expect:37,experi:35,experienc:4,explain:[4,22],explicit:6,expos:25,express:25,extraattr:35,extract:25,extrem:37,f120da72:27,f7e3:25,fa0wx:27,fabric:24,fail:[27,33,35],fals:[6,16,21,27,33,35,42],faq:43,fast:37,faster:[1,37,42],fastli:4,fault:0,favorit:0,fbd1f2bb71f4:27,fc1:[6,35],fc2:35,fc3:35,fc4:35,fc8a365:25,fc8a:25,fc_layer:[6,35],fclayer:6,featur:[4,33],feel:4,fetch:[6,42],few:0,field:[25,37],figur:[6,37,42],file:[0,1,3,4,6,16,21,23,28,33,42],filesystem:25,fill:25,find:[0,3,28,37],fingerprint:25,finish:[0,21,22,23,25,27],finit:6,finnal:1,first:[0,4,6,25,33,35,37,42],first_seq:42,firstseen:27,five:37,fix:4,flag:[4,7,33],flexibl:42,flist:21,fnt03:25,focu:37,folder:25,follow:[0,1,3,4,6,7,16,25,27,28,29,30,35,37,42],fork:4,form:[3,37],format:[4,6,16,21,25,33],forward:[6,35,42],forwardactiv:6,found:[24,42],framework:[4,6,24],free:4,frequenc:37,frequent:23,from:[1,3,4,6,21,22,24,25,27,33,35,37,42],fulfil:37,full:[6,42],full_matrix_project:42,fulli:[6,37],fullyconnectedlay:6,gap:33,gather:6,gcc:0,gcc_3:3,gener:[0,4,25,28,33,35,37],generatedinput:42,get:[0,1,3,4,6,21,23,25,28,37,42,43],get_config_arg:35,get_data:27,get_input_lay:6,get_support:3,getbatchs:6,getenv:21,getinput:6,getinputgrad:6,getinputvalu:6,getoutputgrad:6,getoutputvalu:6,getparameterptr:6,getsiz:6,gettemp
l:25,gettranspos:6,getw:6,getweight:6,getwgrad:6,git:[0,4,7],github:[0,4,7],give:[0,4,6,25,37],given:[6,33],glibc:3,glibc_2:3,glibcxx_3:3,global:[0,25,33,37],globalstat:37,globalstatinfo:37,glog:4,glog_v:4,glog_vmodul:4,goal:37,going:31,good:37,googl:[4,24],googleapi:25,gpg2:25,gpg:25,gpu:[0,3,16,21,31,36],gpu_id:[33,35],gpugpu_id:32,grad:33,grad_share_block_num:[32,33],gradient:[21,22,33],gradual:37,grant:25,greater:21,grep:[1,28],groudtruth:42,group:40,group_input1:42,group_input2:42,group_input:42,grow:4,gru:42,gru_decod:42,gru_decoder_with_attent:42,gru_step:42,grumemori:42,gserver:6,gsizex:37,guarante:6,guest:[0,3],gui:37,guid:[4,6,25,27,37,42],gzip:27,half:25,hand:22,hard:[0,25],hardwar:[0,37],has:[0,4,6,25,27,37,42],have:[0,1,4,6,21,25,33,35,37,42],haven:0,head:[4,21,28],header:6,headip:28,heard:0,heavi:23,height:6,help:[0,4,23],helper:6,here:[0,1,3,4,7,21,23,25,27,32,35,42],heurist:33,hidden:[25,42],hierarch:[40,42],high:[6,24],higher:4,highli:[35,42],hl_get_sync_flag:6,hold:25,home:[1,25,27,28],host:[25,27],hostfil:28,hostnam:25,hostpath:27,hostport:25,hour:0,hourli:4,hous:16,how:[22,25,27,33,42,43],howev:[25,32,33,42],howto:21,htod:37,http:[0,1,4,7,25,27],hyper:6,i1117:37,iamfullaccess:25,iamusersshkei:25,id_rsa:28,ident:25,identifi:6,ids:6,idx:6,ignor:33,illustr:[6,37,42],imag:[0,4,25,29,30,35],imagepullpolici:25,imgsiz:37,imgsizei:37,imgsizex:37,imikolov:21,immedi:[0,25],implement:42,improv:[25,37],inbound:25,includ:[0,3,6,24,25,27,33,37,42],increas:[22,33],increment:33,incupd:6,index:25,indic:[23,25],individu:25,infer:16,info:[6,23],inform:[4,6,25,33,37],infrastructur:25,ingor:33,init:[6,16,21,25,35],init_model_path:[32,33,35],initi:[6,16,33,42],inlin:25,inner:6,input:[4,6,16,18,28,35,40,42],input_data:6,input_data_target:6,input_hassub_sequence_data:6,input_index:6,input_label:6,input_lay:6,input_sequence_data:6,input_sequence_label:6,input_sparse_float_value_data:6,input_sparse_non_value_data:6,input_t:6,inputdef:6,inputlayers_:6,insert:4,insid:[0,1,2
5],instal:[0,1,4,7,10,18,27,31,43],instanc:[6,33,37,42],instance_ip:25,instead:[0,4],instruct:[1,37],int32:33,integ:6,integer_value_sequ:42,integr:0,intend:0,interact:[1,25],interest:37,interfac:[0,25],intermedi:0,intern:[22,23,25],interpret:[0,37],introduc:[21,22,24,27],introductori:0,invok:[4,25,37],ips:25,ipt:42,is_async:21,is_seq:42,isn:37,isspars:6,issu:[0,1,3,4,37],item:16,its:[3,6,22,25,33,37],jeremi:37,job:[1,21,22,24,32,33,35],job_dispatch_packag:23,job_nam:25,job_namespac:25,job_path:25,job_workspac:23,jobpath:25,jobport0:25,jobport1:25,jobport2:25,jobport3:25,journei:1,json:[25,27],jupyt:1,just:[3,4,23,25,35],jx4xr:25,k8s_data:25,k8s_train:25,kebilinearinterpbw:37,kebilinearinterpfw:37,keep:[0,4],kei:[0,4,37],kernel:37,key1:33,key2:33,key_pair_nam:25,keyid:25,keymetadata:25,keypair:25,keyserv:25,keystat:25,keyusag:25,kill:25,kind:[1,25,27],kms:25,know:[4,6,21,25,37],kubeconfig:25,kubectl:[23,27,28],kuberent:25,kubernet:[24,29,30],l2regular:21,label:27,lag:33,lan:31,languag:35,larg:4,last:[0,33,42],lastseen:27,latenc:[23,25],later:[0,3,25],latest:[0,1,3,7,27],launch:[25,33],layer:[5,16,32,33,40,42],layer_0:6,layer_attr:[35,42],layer_num:35,layerbas:6,layerconfig:6,layergradutil:6,layermap:6,lead:37,learn:[0,1,6,7,37,42],learning_r:21,least:3,leav:25,legaci:1,len:[6,16],length:[27,33,42],let02:27,let:25,level:[4,23,33],lib64:[1,33],lib:[0,1,21],libc:3,libcuda:1,libgcc_:3,libnvidia:1,libpython2:0,librari:[0,3,21,31,33,36],libstdc:3,licens:4,lifetim:3,like:[0,3,4,21,24,25,32,35,37,42],limit:[33,37],line:[0,4,22,23,25,35,36,37],linear:16,link:[3,25],linux:[0,1,3,4,25,31],linux_x86_64:3,list:[0,6,16,21,25,33,35,42],listdir:21,listen:[21,33],littl:33,load:[0,25,33],load_missing_parameter_strategi:[32,33,35],loadsave_parameters_in_pserv:[32,33],local:[0,1,4,10,23,27,32,33],localhost:[1,7],locat:[6,21,42],log:[3,6,21,23,25,27,28,33],log_barrier_abstract:33,log_barrier_lowest_nod:[32,33],log_barrier_show_log:[32,33],log_clip:[32,33],log_error_clip:[32,33],log_perio
d:[27,33,35],log_period_serv:[32,33],login:[3,28],look:[21,25,27,32],loss:6,lot:[21,32],lower:[4,23],lowest:33,lstm:[27,42],lstmemori:42,mac:[0,4],machin:[0,3,6,25,27,28,32,33,35],machine_transl:42,maco:[0,3,16],made:42,mai:[0,1,7,21,25,37],main:[3,22,25],mainli:33,mainlin:3,maintain:25,make:[0,4,6,21,22,25,37],malloc:6,manag:[3,7,24],mani:[0,33],manipul:23,manual:23,manylinux1:3,manylinux1_x86_64:3,map:33,mapreduc:21,mark:42,master:33,match:[3,37],math:[4,6,37],matric:[6,42],matrix:[6,32,35],matrixptr:6,matur:24,max:[33,35,37],max_length:42,maximum:[33,37,42],mean:[0,1,25,33,35,37,42],measur:37,mechan:[25,42],memcpi:37,memori:[0,4,6,27,33,35,37,42],memory_threshold_on_load_data:33,mention:0,merg:[4,7,33],messag:[4,27,33],metadata:[25,27],method:[0,1,3,6,7,33,35,37],might:[0,4,6,25],mileag:37,million:35,min:[25,35,37],minim:33,minimun:33,minut:[0,1,25],mirror:1,miss:33,mit:25,mix:42,mkdir:[7,25,28],mkl:[0,1],mobil:7,mod:21,mode:[4,33],model:[4,6,7,10,16,22,25,33,36],model_list:[33,35],model_path:35,modif:2,modifi:[6,21,23,25,42],mon:27,more:[0,1,6,7,16,21,22,27,35,37,42],most:[3,6,32,37,42],mount:[0,1,21,25,27],mountpath:[25,27],move:[1,25,37],movement:37,mpi:28,mpirun:28,much:37,mul:6,multi:[6,23,32,33],multipl:[6,21,22,25,33,35,42],multipli:6,must:[0,6,21,25,33,35,42],my_cluster_nam:25,my_external_dns_nam:25,my_lib:21,mypaddl:27,name:[1,3,6,16,21,27,29,30,33,35,37,42],namespac:[6,27],nativ:4,natur:35,navig:7,necessari:[6,28],necessarili:6,need:[0,1,2,3,6,7,16,21,25,27,31,32,33,35,42],net:0,network:[6,16,21,22,33,37],network_config:35,networkadministr:25,neural:[16,22,33,37],neuron:6,never:[25,27],next:[6,25,27,33,37,42],nfs4:25,nfs:25,nfsver:25,nic:[21,32,33],nnz:6,node1ip:28,node2ip:28,node3ip:28,node:[6,21,23,24,25,27,28,31,33],node_0:25,node_1:25,node_2:25,node_id:21,nodefil:23,nohup:21,nois:22,non:[6,25,33],none:42,nonlinear:6,nor:0,normal:[6,27,31,33,42],note:[0,1,7,21,25,33,35,37],notebook:1,noth:[0,33],notic:[4,6,42],notif:4,now:[25,33],nproc:0,nullptr:6,nu
m:[21,33],num_gradient_serv:[21,32,33],num_input:4,num_pass:[27,32,33,35],number:[0,6,21,24,25,33],numdevices_:35,numlogicaldevices_:35,numpi:0,numsampl:37,nvidia:[1,33,37],object:37,observ:[6,37],oct:27,off:[0,1,31],offici:[4,7,25],often:[4,21],ograd:6,old:33,omit:0,ompi_comm_world_rank:21,onc:[4,6,7,25],one:[0,1,4,6,23,25,27,28,33,35],ones:4,onli:[0,2,3,6,7,25,27,32,33,35,37,42],onto:[25,28],open:[4,24,25],openbla:[0,1],openmpi:24,oper:[4,6,25,33,37,42],opt:0,optim:[6,21,22,37],option:[6,21,23,35],order:[0,6,22,25,27,33],oregon:25,organ:18,origin:4,other:[0,4,22,25,27,35,42],otherwis:[4,23,35,42],our:[0,3,4,6,25,27,42],out:[16,25,27,28,33,37,42],out_dir:25,out_mem:42,outlin:34,output:[0,4,6,18,21,27,33,35,37,42],output_:6,output_lay:16,output_mem:42,outv:6,over:[4,6,37],overal:4,overhead:37,overlap:6,overrid:6,overwhelm:4,overwrit:21,own:[0,21,23,25],owner:[0,4],packag:[0,1,4,25],paddl:[0,1,3,4,6,7,16,21,23,25,27,28,31,33,35,37,42],paddle_init_num_gradient_serv:21,paddle_init_port:21,paddle_init_ports_num:21,paddle_init_ports_num_for_spars:21,paddle_init_pserv:21,paddle_init_trainer_count:21,paddle_init_trainer_id:21,paddle_init_use_gpu:21,paddle_manylinux_devel:0,paddle_output:27,paddle_pserver2:23,paddle_train:23,paddlepaddl:[2,4,6,16,21,22,23,24,28,29,30,31,37,42],paddlepaddle_gpu:3,paddlepaddlebook:1,paddlepaddlehub:1,page:[4,25],parallel:[0,21,22,24,25,27,33,35,37],parallel_nn:[32,33],param_attr:42,paramattr:42,paramet:[0,4,6,10,16,22,23,35,36],parameter_block_s:[32,33],parameter_block_size_for_spars:[32,33],parametermap:6,parameters_:6,paramt:[21,25],paraspars:6,parent:6,pars:[0,25,35],part:[6,21,22,37,42],parti:37,particular:37,partit:[22,25],pass:[4,6,21,23,25,27,33,37],passtyp:6,password:28,past:[1,16,25],path:[0,1,21,25,27,33,35],path_to_paddlepaddle_working_directori:7,pattern:25,pem:25,peopl:0,pep425tag:3,pep8:4,per:33,perfom:[33,35],perform:[0,6,24,32,36,42],period:33,permiss:25,persist:25,persistentvolum:25,persistentvolumeclaim:25,perspect:37,pertur
b:6,pgp:25,pick:25,pickl:[21,28],pil:21,ping:4,pip:[0,2,4,7,16],place:[6,37],plan:6,platform:[3,4,21,24,25],pleas:[0,1,3,4,6,7,25,31,42],pnpairvalidationlay:33,pnpairvalidationpredict_fil:32,pod:[25,27],pod_nam:25,point:[0,4,37],polici:25,pool3:6,port:[21,25,27,32,33],port_num:32,portal:7,ports_num:[21,33],ports_num_for_spars:[21,32,33,35],possibl:37,post:0,potenti:37,ppo_workspac:7,practic:[6,42],pre:[0,4,25,27],precis:0,predetermin:33,predict:[10,16,33,36,42],predict_fil:33,predict_output_dir:[32,33],prefetch:6,prefix:25,pregrad:6,prepar:[22,29,42],preprocess:27,prev_batch_st:[32,33],preview:7,previou:[6,25,33],previous:27,price:16,primer:4,print:[3,16,28,33],printallstatu:37,printstatu:37,privat:4,privileg:[0,25],prob:16,probabl:[1,4,42],problem:[0,3],proc:1,proce:[1,25],process:[0,4,23,24,25,27,33,35,42],processor:37,product:[6,25,27],productgraph:27,proflier:37,program:[23,33,37],progress:33,project:[4,6,42],propag:[33,35],proper:31,properli:0,properti:33,protect:6,protocol:33,provi:21,provid:[1,16,21,23,24,25,37],providermemory_threshold_on_load_data:32,provis:25,pserver:[21,25,32,33],pserver_num_thread:[32,33],pserverstart_pserv:32,psize:6,pub:28,pull:4,purpos:37,push:4,push_back:6,put:[6,27],pvc:25,pwd:[0,1,7],pypi:3,python3:3,python:[0,1,3,4,7,16,21,28,42],queri:25,question:25,quick:[15,33],quick_start:[25,27,29],quick_start_data:27,quickstart:27,quit:37,rais:21,ran:37,rand:[33,35,37],random:[22,33],randomnumberse:32,rang:[4,33,35],rank:25,rate:6,rather:25,ratio:33,rdma:33,rdma_tcp:[32,33],reach:37,read:[0,1,7,22,25,31,42],reader:21,readi:[25,27],readwritemani:25,real:21,reason:[4,27],reciev:33,recommend:[0,1,2,4,6,7,23,31,33,42],recompil:37,record:25,recurr:40,recurrent_group:42,recurrentlay:33,recv:25,reduc:[1,4,23,33,35],refer:[0,1,6,27,42],reformat:4,region:37,regist:[6,37],register_gpu_profil:37,register_lay:6,register_timer_info:37,registri:27,regular:[6,21,25],reinstal:0,relat:[4,27],releas:25,relu:6,rememb:4,remot:[4,6,25,33,35],remoteparameterupdat
:33,remov:[4,33],removing_docker_contain:0,renam:3,repli:4,repo:[4,7],report:37,repositori:7,repres:[6,25,42],reproduc:0,request:[4,25,27],requir:[3,4,6,7,21,25,27],reserveoutput:6,resid:0,resolv:[4,27],resourc:25,respect:[6,33,42],respons:[25,27],restart:[25,27],restartpolici:[25,27],restrict:33,result:[25,33,37],retran:25,retriev:[6,27],retriv:21,reus:6,rev:0,revers:42,review:27,reviews_electronics_5:27,right:4,rkt:0,rnn:[32,36],rnn_bias_attr:42,rnn_layer_attr:42,rnn_out:42,rnn_use_batch:[32,33],role:25,root:[0,25,27],row:6,rsize:25,rule:[6,25],run:[2,3,4,6,7,16,21,22,23,24,25,28,29,30,31,33,37],run_test:0,runinitfunct:37,runserv:7,runtim:[1,23],s_recurrent_group:42,sai:[33,35],sake:6,same:[0,23,25,35,42],sampl:[1,21,23,33,35],satisfi:[3,25],save:[0,21,25,27,33,35],save_dir:[27,33,35],save_only_on:[32,33],saving_period:[32,33],saving_period_by_batch:[32,33,35],scale:24,scenario:32,scene:32,schdule:25,schedul:25,scp:28,script:[0,21,23,24,25,28],search:[0,33,42],second:23,secret:25,section:[4,6,22,25,42],see:[4,25,37],seed:[33,37],seem:3,select:25,selector:27,self:6,send:[4,22,25,33],sens:4,sent:27,sentenc:42,separ:[21,33],sequenc:[4,6,40],sequenti:42,seri:3,serv:[1,21,25,37],server:[0,6,22,23,31,32],servic:21,session:37,set:[0,4,6,7,10,21,23,25,27,32,33,35,36,37,42],set_active_typ:6,set_drop_r:6,set_siz:6,set_typ:6,setp:25,setq:0,settup:6,setup:6,sever:[0,23,24,25,35],sgd:22,sgdasync_count:32,shard:[22,25],share:[0,33,37],shell:[1,25],should:[7,16,23,25,42],show:[0,3,22,25,27,33],show_check_sparse_distribution_log:[32,33],show_layer_stat:[32,33],show_parameter_stats_period:[27,32,33,35],shown:[6,25,37,42],shrink:6,sid:25,side:22,sig:25,sigint:23,sigmoid:6,sign:25,signal:23,signatur:25,signific:37,silent:21,similar:25,simpl:[33,37],simple_attent:42,simple_gru:42,simple_rnn:42,simplest:25,simpli:[1,16,37,42],simplifi:[6,27],simultan:25,sinc:[25,37],sincer:4,singl:[6,16,22,27],site:25,size:[1,6,16,33,42],size_t:6,skip:[4,23,25],slow:37,slowli:0,small:[4,6,33],small_me
ssag:[32,33],smaller:4,snap:27,snapshot:25,snippet:[6,25,37,42],sock_recv_buf_s:[32,33],sock_send_buf_s:[32,33],socket:33,softmax:[6,42],softwar:37,some:[0,4,6,21,25,32,33,35,37],someth:[0,4],sometim:[0,37],sophist:6,sort:[25,33],sourc:[24,25,27,42],source_dict_dim:42,source_language_word:42,space:[0,37,42],spars:[6,21,25,33],sparseparam:6,sparseprefetchrowcpumatrix:6,speak:42,spec:[25,27],specfii:33,specif:35,specifi:[1,4,6,7,25,33,42],speed:0,sphinx:7,split:[21,22,25,35],split_count:[21,25],srand:33,src:21,src_backward:42,src_embed:42,src_forward:42,src_word_id:42,ssh:[25,27,28],ssh_server:23,stabl:25,stack:25,stage:23,stamp:37,standard:[0,3],stanford:27,start:[0,1,3,23,28,33,37,42,43],start_mpi_train:28,start_pass:[32,33],start_pserv:33,startup:25,stat:[33,37],state:[27,33,42],statement:[6,25],staticinput:42,statist:33,statset:37,statu:[25,27,37],status:27,std:[6,33],stdbuf:21,stderr:23,stdout:23,step:[0,1,4,6,16,22,25,27,28,37,42],still:3,stmt1482205552000:25,stmt1482205746000:25,stochast:22,stop:[0,23,27,33],storag:[21,25,27],store:[6,7,21,22,25,27,28,33],str:35,strategi:33,strict:22,string:[6,25,33],strongli:31,structur:[23,25],sts:25,stuff:4,style:0,sub:[6,42],submit:[7,24,25,32,33],subnet0:25,subnet:25,subobjectpath:27,subset:6,succeed:27,success:[25,27],successfulcr:27,successor:33,sudo:[0,25],suffici:33,suffix:[3,21],suggest:[4,37],suitabl:33,sum:[6,42],support:[0,1,3,6,7,16,21,22,24,25,33,37,40,42],suppos:6,sure:[0,6,25],swig:0,symbol:3,symlink:4,sync:33,syncflag:6,synchron:[22,25,33],system:[0,1,3,7,21,24,27],tab:3,tag:[1,31,42],take:[0,4,6,21,25,27,31,37,42],tanh:[6,42],tar:25,tarbal:25,target:[0,42],target_dict_dim:42,target_language_word:42,task:[35,42],tbd:[8,9,11,12,13,14,17,19,20,26,38,39,41],tcp:[25,33],tear:37,techniqu:[6,42],technolog:0,tee:27,tell:[1,37],templat:27,ten:0,termin:27,test:[10,16,21,28,32,37],test_all_data_in_one_period:27,test_data_dir:21,test_fcgrad:6,test_gpuprofil:37,test_layergrad:6,test_pass:[32,33,35],test_period:[32,33,35],
test_recurrent_op:4,test_sum_op:0,test_wait:[32,33],testbilinearfwdbwd:37,testconfig:6,testfcgrad:6,testfclay:6,testlayergrad:6,testmodel_list:32,testsave_dir:32,testutil:6,text:25,tflop:37,tgz:3,than:[0,6,21,25,42],thei:[0,4,6,22,25,32,37,42],them:[0,4,7,25,32,33,37],theori:37,thi:[0,1,3,4,6,7,16,21,22,25,27,31,33,35,37,42],thing:37,third:37,thought:37,thread:[6,21,33,35,37],thread_local_rand_use_global_se:[32,33],threadid:35,threadloc:37,three:33,threshold:[4,33],through:[6,7,21,37,42],throughput:[21,37],thu:[6,25],tier:27,time:[0,27,33,35,37,42],timelin:37,timeo:25,togeth:42,token:42,toler:0,tool:[0,3,24,25,42],toplevel:0,total:[21,24,27,37],train:[6,10,24,28,29,30,32,36,37,42],train_config_dir:25,train_data:21,train_data_dir:21,train_id:25,train_list:21,traindot_period:32,trainer:[6,22,23,33,35],trainer_config:[25,27],trainer_config_help:6,trainer_count:[16,21,25,27,32,33,35],trainer_id:[21,25,33],trainingtest_period:32,tran:[6,33],transform:[6,42],transpar:23,transport:33,transpos:6,travi:4,tree:33,trg_embed:42,tune:36,tuninglog_barrier_abstract:32,turn:1,tutori:[0,1,6,25,28,29,30,37,40,42],two:[0,23,25,35,37,42],txt:[6,7,21,25,28],type:[6,16,21,22,25,27,31,33,35,42],typic:37,ubuntu:[3,16],uci_h:16,uid:27,unbalanc:33,unbound:42,under:[0,1,7,21,23,24,25],understand:37,understand_senti:42,undeterminist:37,unifi:4,uniform:33,uninstal:0,uniqu:[25,31,33],unique_ptr:6,unit:[0,10,42],unittest:4,unittestcheckgrad_ep:32,unnecessari:4,until:25,updat:[3,6,22,33,35],updatecallback:6,updatestack:25,upgrad:[0,3],upload:22,upstream:4,uri:25,url:4,usag:37,use:[0,1,3,4,6,7,16,21,22,25,27,33,35,37,42],use_gpu:[16,21,27,32,33,35],use_old_updat:[32,33],used:[0,3,6,25,32,33,35,37,42],useful:[6,35,42],usegpu:6,user:[1,4,7,21,25,32,33],uses:[0,3,4,6,7,22,25,33,42],using:[0,1,2,4,6,7,16,21,25,27,28,31,33,35,37,42],usr:[0,1,21,25,33],usual:[4,25,33,35,37],util:[6,37,42],valid:25,valu:[6,25,33,35,42],value1:33,value2:33,vanilla:42,vari:[25,37],variabl:[6,21,24,25,27],vector:[6,42],vecto
renable_parallel_vector:32,verbos:4,veri:[23,37,42],verifi:6,version:[0,1,6,16,25,27,31,32,33,37],via:[2,4,25,37],vim:1,viriabl:21,virtual:0,virtualenv:0,visual:37,vlog:4,voila:16,volum:[7,27],volumemount:[25,27],volumn:25,wai:[0,4,6,35,42],wait:[21,22,33],want:[0,1,4,6,7,21,33,35],warp:37,wbia:25,web:7,websit:7,weight:[6,33,35,42],weightlist:6,weights_:6,weights_t:6,welcom:4,well:[6,25,33],west:25,what:0,whatev:[0,21],wheel:3,when:[0,2,4,6,7,21,22,24,25,27,33,35,37,42],whenev:4,where:[6,33,35,37,42],whether:[0,1,6,33],which:[0,3,4,6,21,22,23,25,33,35,37,42],whl:0,who:4,whole:[4,22,25,27],whose:42,why:0,wide:[3,23,28],width:6,window:[0,1],wish:[0,3,7,21],with_avx:[0,1,31],with_c_api:0,with_doc:0,with_doubl:[0,6,31],with_dso:0,with_golang:0,with_gpu:[0,31],with_mkl:0,with_profil:37,with_python:[0,31],with_rdma:31,with_style_check:[0,4],with_swig_pi:0,with_test:0,with_tim:[31,37],without:[21,28],wmt14:42,won:[22,37],word2vec:21,word:[21,35,42],word_dict:[21,28],word_vector_dim:42,work:[0,1,4,6,7,22,25,27,33,37,42],worker:25,workercount:25,workflow:[18,25],workspac:[4,21,23,33],world:21,would:[0,1,4,25],wrapper:37,write:[5,21,24,25],written:[0,1,23],wsize:25,xarg:[1,6,28],xgbe0:[21,33],xgbe1:[21,33],xrang:[6,16],xxxxxxxxx:25,xxxxxxxxxx:25,xxxxxxxxxxxxx:25,xxxxxxxxxxxxxxxxxxx:25,y_predict:16,yaml:[23,25,28],yapf:4,you:[0,1,2,3,4,6,7,16,21,23,25,28,31,33,35,37,42],your:[0,2,3,4,6,7,21,23,25,31,35,37],your_access_key_id:25,your_secrete_access_kei:25,yourself:0,zaist:0,zero:[6,25,33],zone:25,zxvf:25},titles:["Build from Sources","Run in Docker Containers","Install and Build","Install using pip","Contribute Code","Development","Write New Layers","Contribute Documentation","Install, Build and Unit test","Cluster Training and Prediction","FAQ","Local Training and Prediction","Model Configuration","Parameter Setting","Basic Concept","GET STARTED","Quick Start","Install and Build","C-API Prediction Library","Input/Output Data Organization","C-API Workflow","Command-line 
arguments","Distributed Training","Fabric","Use different clusters","Kubernetes on AWS","Kubernetes Distributed","Kubernetes","OpenMPI","<no title>","<no title>","Preparations","Argument Outline","Detail Description","Set Command-line Parameters","Use Case","HOW TO","Tune GPU Performance","Layers supporting hierarchical sequence as input","API comparision between RNN and hierarchical RNN","RNN Models","Recurrent Group Tutorial","RNN Configuration","PaddlePaddle Documentation"],titleterms:{"case":35,"class":6,"new":6,AWS:25,DNS:25,EFS:25,For:27,KMS:25,Use:[7,16,24,27,35],access:25,account:25,add:25,address:25,api:[18,20,39],appendix:0,approach:37,architectur:42,argument:[21,32,35],ask:0,asset:25,associ:25,async:[21,33],attent:42,aws:25,basic:14,between:39,bla:0,book:1,bool:0,bucket:25,build:[0,2,7,8,17,27],built:37,check:[6,23],choos:25,cloudform:25,cluster:[9,23,24,25,28,35],code:4,command:[21,34,35],commit:27,common:33,commun:33,comparis:39,compil:0,concept:[14,25],config:35,configur:[12,25,42],contain:[1,27],content:[25,37],contribut:[4,7],core:25,cpu:35,creat:[25,27],credenti:25,cuda:0,cudnn:0,data:[19,25,27],dataprovid:33,dataset:21,defin:25,delet:25,demo:25,dep:3,depend:[0,3],deriv:6,descript:33,destroi:25,detail:33,develop:5,devic:35,differ:[24,35],directori:25,distribut:[22,25,26,33],docker:[1,27],document:[7,43],down:25,download:[25,27],each:3,ec2:25,elast:25,equat:6,extern:25,fabric:23,faq:[2,3,10],file:[25,27],find:25,frequent:0,from:[0,2],gate:42,gener:42,get:[15,27],gpu:[1,33,35,37],gradient:6,group:[25,41],gru:33,hand:37,hierarch:[38,39],how:[0,7,36,37],iam:25,imag:[1,27],implement:6,initi:[25,35],input:[19,38],inspect:25,instal:[2,3,8,16,17,25],instanc:25,integr:25,job:[23,25,27,28],kei:25,kill:23,kube:25,kubectl:25,kubernet:[25,26,27],launch:[1,23,28],layer:[6,35,38],librari:18,line:[21,34],linux:23,local:[11,25,35],log:4,lstm:33,matrix:33,mix:35,model:[12,23,35,40,42],modifi:27,name:25,need:37,network:[35,42],neural:42,nlp:33,nvprof:37,nvvp:37,openmp
i:28,option:0,org:7,organ:19,outlin:32,output:[19,23,25],packag:3,paddlepaddl:[0,1,3,7,25,27,43],pair:25,parallel_nn:35,paramet:[13,21,25,33,34],pass:[0,35],perform:[33,37],pip:3,point:25,predict:[9,11,18],prepar:[21,23,25,28,31],privat:25,profil:37,program:[1,21],pull:1,python:6,question:0,quick:16,randomnumb:33,recurr:[41,42],refer:37,region:25,render:25,requir:0,result:[23,27],review:4,rnn:[33,39,40,42],route53:25,run:[0,1,27],runtim:3,script:27,secur:25,sequenc:[38,42],server:[21,25,33],servic:25,set:[13,34],setup:25,sgd:[21,33],simpl:42,sourc:[0,2],spars:35,specifi:35,standard:4,start:[15,16,21,25,27],startup:27,step:2,style:4,support:38,system:25,tear:25,templat:25,test:[0,4,6,8,33,35],timer:37,tip:37,tool:[7,37],train:[1,9,11,21,22,23,25,27,33,35],trainer:[21,25],tune:[33,37],tutori:41,type:0,unit:[4,6,8,33],updat:[7,21,25],using:3,vector:33,verifi:25,version:3,volum:25,vpc:25,what:37,whl:3,why:37,workflow:[4,20],wrapper:6,write:[4,6,7],www:7,yaml:27,your:1}}) \ No newline at end of file diff --git a/develop/doc/survey/cluster_bootstrapping_tools.html b/develop/doc/survey/cluster_bootstrapping_tools.html deleted file mode 100644 index eb4667e4a0e6e7f3d0b2f13d0fa770c3feaec382..0000000000000000000000000000000000000000 --- a/develop/doc/survey/cluster_bootstrapping_tools.html +++ /dev/null @@ -1,357 +0,0 @@ - - - - - - - - - - - - - Cluster bootstrapping tool survey — PaddlePaddle documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                - - - - -
                                - - - - - - -
                                -
                                - - - - - - -
                                - -
                                -
                                -
                                -
                                - -
                                -

                                Cluster bootstrapping tool survey

                                -
                                -

                                Abstract

                                -

                                To bring a cluster of bare metal machines up to a fully functional Kubernetes cluster that PaddlePaddle can run on, we need some tooling. Here we compare Sextant and the Tectonic installer.

                                -
                                -
                                -

                                Basic assumptions

                                -

                                Here are some basic assumptions before we move on to the details:

                                -
                                  -
                                1. You are an administrator of a bare metal machine cluster, which means:
                                   • you have full control over each of the machines.
                                   • you have full control over the network that the machines are connected to.
                                2. Machines can be booted from the network with PXE or iPXE.
                                3. You understand the general procedure to bring up a cluster.
                                -

                                If your cluster meets all of the assumptions above, keep reading.

                                -
                                -
                                -

                                Comparing Sextant and Tectonic installer

                                -
                                -

                                Sextant

                                -

                                Sextant is an end-to-end solution for turning a bare metal cluster into a fully functional k8s cluster. It integrates DHCP, name service, PXE, cloud-config-service, and docker registry services.

                                -
                                -

                                Pros

                                -
                                  -
                                1. End-to-end: basically all the admin needs to do is configure cluster.yaml and power on the cluster.
                                2. Offline cluster configuration: working with Sextant has two phases, config time and deploy time. At config time, the admin’s machine needs internet connectivity to download images and other dependencies. At deploy time, it is completely OK to go offline, since all dependencies were prepared at config time.
                                3. Docker registry integrated.
                                4. GPU machines are taken care of.
                                -
                                -
                                -
                                -

                                Cons

                                -
                                  -
                                1. The k8s API server is not deployed with high availability by default.
                                2. No grouping support.
                                3. No API interface; it is a one-off service.
                                -
                                -
                                -

                                Tectonic installer

                                -

                                First of all, Tectonic is not free: installation requires a coreos.com account, and free users can only create fewer than 10 nodes.

                                -

                                Tectonic is a suite of software that wraps around k8s and provides additional DevOps utilities. The Tectonic installer, as its name suggests, installs Tectonic onto a bare metal cluster, which means it is not an exact equivalent of Sextant. For the “booting a cluster” part, it mostly relies on Matchbox, which is a general cluster bootstrapper.

                                -

                                Matchbox’s approach is similar to Sextant’s.

                                -
                                -
                                -

                                Pros

                                -
                                  -
                                1. Supports grouping machines.
                                2. Supports running the provisioning service in rkt (not a big deal, though).
                                3. Supports an HTTP/gRPC API interface.
                                4. Supports multiple templates.
                                -
                                -
                                -

                                Cons

                                -
                                  -
                                1. Not an end-to-end solution for bringing up a cluster; it needs a lot of extra work and other software.
                                2. Does not fully support CentOS deployment yet.
                                -
                                -
                                -
                                -

                                Conclusion

                                -

                                Sextant is the better overall solution for deploying paddle cloud to a bare metal cluster. It would be great if Sextant could also 1) deploy the k8s API server with high availability by default; 2) not be designed as a one-off service.

                                -
                                -
                                -

                                Appendix: General procedure to bring up a cluster

                                -

                                It is practically impossible for a cluster admin to manually install the OS and applications on cluster nodes one by one. Here is what an admin in the cloud industry would do instead:

                                -
                                  -
                                1. Set up a bootstrap machine with a static IP in the cluster, running the following services:
                                   • DHCP: assigns IP addresses to the rest of the nodes.
                                   • name service: maps node names to IP addresses.
                                   • PXE-related services: boot information is delivered to a newly booted machine when its IP is assigned via DHCP; the PXE service then provides further boot and installation info and images over TFTP and HTTP.
                                   • cluster config service: provides cluster nodes with their OS config via HTTP.
                                   • optional docker registry: a built-in docker registry makes the whole cluster independent of an internet connection and speeds up software distribution.
                                2. When a new node powers on:
                                   • it broadcasts a request for an IP address.
                                   • the DHCP server assigns the IP address and delivers the PXE boot information to the node.
                                   • the node requests config files via the TFTP service, using the boot information delivered with DHCP; in most cases, the config file points to an HTTP service for the boot image.
                                   • since PXE is configured with an initrd, the node uses the cloud config service to perform further installations, such as CoreOS or K8s.
                                   • the node then restarts.
                                -
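The boot sequence in the appendix can be sketched as a toy, in-memory simulation. This is only an illustrative sketch and not how Sextant or Matchbox is implemented: the `BootstrapMachine` class, the `boot_node` function, the IP addresses, and the file names are all hypothetical, and no real DHCP, TFTP, or HTTP traffic is involved.

```python
"""Toy, in-memory walk-through of the bootstrap flow.

Nothing here talks to a real network; every class, address and file name
is a hypothetical stand-in chosen for illustration.
"""
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class BootstrapMachine:
    """Stands in for the DHCP, TFTP, HTTP and cloud-config services."""
    next_ip: IPv4Address = IPv4Address("10.0.0.10")
    leases: dict = field(default_factory=dict)          # MAC -> IP
    tftp_files: dict = field(default_factory=lambda: {
        # The boot config points at an HTTP URL for the boot image.
        "pxelinux.cfg/default": "http://10.0.0.1/images/os.img",
    })
    cloud_configs: dict = field(default_factory=dict)   # MAC -> OS config

    def dhcp_offer(self, mac: str) -> dict:
        """Assign an IP and hand out the PXE boot info with the lease."""
        if mac not in self.leases:
            self.leases[mac] = self.next_ip
            self.next_ip += 1
        return {"ip": self.leases[mac], "tftp_config": "pxelinux.cfg/default"}

    def tftp_get(self, path: str) -> str:
        """Return the boot config contents (here: just the image URL)."""
        return self.tftp_files[path]

    def cloud_config(self, mac: str) -> dict:
        """Return the per-node OS config, falling back to a default role."""
        return self.cloud_configs.get(mac, {"role": "worker"})


def boot_node(mac: str, bootstrap: BootstrapMachine) -> list:
    """Run the steps a newly powered-on node goes through, in order."""
    steps = []
    offer = bootstrap.dhcp_offer(mac)                     # broadcast + lease
    steps.append(f"dhcp: got {offer['ip']}")
    image_url = bootstrap.tftp_get(offer["tftp_config"])  # fetch boot config
    steps.append(f"tftp: boot image at {image_url}")
    config = bootstrap.cloud_config(mac)                  # further install
    steps.append(f"cloud-config: installing as {config['role']}")
    steps.append("reboot")
    return steps


if __name__ == "__main__":
    for step in boot_node("aa:bb:cc:00:00:01", BootstrapMachine()):
        print(step)
```

Keeping all services on one object mirrors the single bootstrap machine described in step 1; a real deployment would split them into separate daemons.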

                                For further understanding, the following two links from Matchbox are good reading:

                                - -
                                -
                                - - -
                                -
                                -
                                - - -
                                - -
                                -

                                - © Copyright 2016, PaddlePaddle developers. - -

                                -
                                - Built with Sphinx using a theme provided by Read the Docs. - -
                                - -
                                -
                                - -
                                - -
                                - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/.buildinfo b/develop/doc_cn/.buildinfo index eb2dc0daa1643ab05ebbc79962f778277cf06665..3fa5d35c2285c622247d2cb2d534a9121ecb7d5b 100644 --- a/develop/doc_cn/.buildinfo +++ b/develop/doc_cn/.buildinfo @@ -1,4 +1,4 @@ # Sphinx build info version 1 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. -config: 85c8b7c86b39554b9baa0019a9b6f16a +config: 83a63eb718f9df9e43a17cb922d8d728 tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/develop/doc_cn/_images/control_flow_graph.png b/develop/doc_cn/_images/control_flow_graph.png deleted file mode 100644 index 3579998e58d07abc50bd3332128d4733a391cb3b..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/control_flow_graph.png and /dev/null differ diff --git a/develop/doc_cn/_images/dataflow_equations.png b/develop/doc_cn/_images/dataflow_equations.png deleted file mode 100644 index c10f7f69f4007952e5b0394edaa04efa1cfbb658..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/dataflow_equations.png and /dev/null differ diff --git a/develop/doc_cn/_images/deep_learning.png b/develop/doc_cn/_images/deep_learning.png deleted file mode 100644 index 026becc4d94e01e407dacb2a5314a0e5723334ff..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/deep_learning.png and /dev/null differ diff --git a/develop/doc_cn/_images/fluid-compiler.png b/develop/doc_cn/_images/fluid-compiler.png deleted file mode 100644 index 1b0ffed2039c91a3a00bbb719da08c91c3acf7bb..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/fluid-compiler.png and /dev/null differ diff --git a/develop/doc_cn/_images/graph_construction_example_all.png b/develop/doc_cn/_images/graph_construction_example_all.png deleted file mode 100644 index 
261611a5721f9aa97874f7e6d897fe48cf667db2..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/graph_construction_example_all.png and /dev/null differ diff --git a/develop/doc_cn/_images/graph_construction_example_forward_backward.png b/develop/doc_cn/_images/graph_construction_example_forward_backward.png deleted file mode 100644 index 4c69687f4a6a181138f3df72ce5e8aa48487b5be..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/graph_construction_example_forward_backward.png and /dev/null differ diff --git a/develop/doc_cn/_images/graph_construction_example_forward_only.png b/develop/doc_cn/_images/graph_construction_example_forward_only.png deleted file mode 100644 index e668c16e0cac73acb4e5dc2b1827557ae77126b4..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/graph_construction_example_forward_only.png and /dev/null differ diff --git a/develop/doc_cn/_images/pprof_1.png b/develop/doc_cn/_images/pprof_1.png deleted file mode 100644 index 8e9edbf377672d0ef40f2fc7bd39e746923550cb..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/pprof_1.png and /dev/null differ diff --git a/develop/doc_cn/_images/pprof_2.png b/develop/doc_cn/_images/pprof_2.png deleted file mode 100644 index 172ba20399ba974d27f4c072425277b69b02520b..0000000000000000000000000000000000000000 Binary files a/develop/doc_cn/_images/pprof_2.png and /dev/null differ diff --git a/develop/doc_cn/_sources/design/api.md.txt b/develop/doc_cn/_sources/design/api.md.txt deleted file mode 100644 index e6a4638d9100d9b07c3ee6b92b530a17eae1c162..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/api.md.txt +++ /dev/null @@ -1,262 +0,0 @@ -# PaddlePaddle Design Doc - -## Ingredients - -As our design principle is starting from the essence: how could we -allow users to express and solve their problems as neural networks. 
-Some essential concepts that our API have to provide include: - -1. A *topology* is an expression of *layers*. - -1. A layer could be any kind of computation, including *cost*. - -1. Some layers have parameters, some don't. Most costs don't have - parameters. - -1. In some topologies, layers share parameters. For - example, - [the network for training a ranking model](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850). - -1. At programming time, users specify topologies and possible sharing - of parameters. PaddlePaddle can figure out and create parameters - required (and possibly shared) by one or more topologies. - - -## Starting from Examples - -As a summarization -of -[our disucssion](https://github.com/PaddlePaddle/Paddle/issues/1315), -let us present two examples here: - - -### Example 1. Sharing Parameters between Layers - -We use -the -[3-branch ranking](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850) model -in this example. For your convenience, I copy-a-paste the model's -topology as follows: - -``` -A -> f -\ -Q -> f --> cost -B -> f -/ -``` - -The following program trains the topology including the cost, and then -use the sub-network in the trained topology in inference: - -```python -def f(in): - e = paddle.layer.embedding(in, parameter_name="embedding") - o = paddle.layer.softmax(e, parameter_name="semantic") - return o - -# Create 3 topologies (subnets), they share parameters because all -# correspoinding layers have the same parameter names. -fA = f(paddle.layer.data(input_name="A")) -fB = f(paddle.layer.data(input_name="B")) -fQ = f(paddle.layer.data(input_name="Q")) - -topology = paddle.layer.less_than( - paddle.layer.cross_entropy(fA, fQ), - paddle.layer.corss_entropy(fB, fQ)) - -# Derive parameters required in topology and create them in model. -parameters = paddle.parameters.create(topology) - -# Estimate parameters used in topology from data. 
-paddle.train(topology, parameters, reader=read_ranking_model_data) - -# Inference using fA (or fB or fC, as they share their parameters). -[testA, testB, testQ] = read_ranking_model_data() -print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA) -``` - - -### Example 2. Sharing Parameters between "Models" - -We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in -this example. In the following example program, `d0` and `d1` -correspond to the two networks in the following figure: - - - -```python -def G(in): - # over-simplified example as G has only one layers: - return paddle.layer.fc(in, parameter_name="G") - -def D(in); - # again, over-simplified: - return paddle.layer.fc(in, parameter_name="D") - -# Construct the first topology, which contains both D and G. -# By learning this topology, we update parameters of G. -d0 = paddle.layer.should_be_false(D(G(paddle.layer.data()))) - -# Construct a second topology d1, which contains only D. By -# training this topology, we update parameters of D. Note -# that d1 share parameters with d0. -d1 = paddle.layer.should_be_true(D(paddle.layer.data())) - -# Create parameters from a list of multiple topologies (models) for -# the chance to share parameters between these topologies. -parameters = paddle.parameters.create([d0, d1]) - -# Iterative training of GAN. -for ...: - train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"}) - train(d1, parameters, reader=read_from_realistic_images) - -# Use d1 for inference: -print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images) -``` - - -### Summarization - - -Above two programs reveal some important design concerns: - -1. Users describe a topology as an expression of layers. Every layer - has a *parameter name*. If the users don't specify it explicitly, it's automatically generated as a unique name. 
By - specifying the parameter name, users can specify the sharing of - parameters between layers and even between topologies. - -1. `paddle.parameters.create` figures out parameters required by one - or more topologies from parameter names of layers. It creates these - parameters and returns a `ParameterSet` object, which is in essence - a map from *parameter names* to *parameters*. - -1. At training and inference time, `paddle.train` and `paddle.infer` - requires both a topology and the parameter set that holds the parameters of that topology. There are some reasons: - - 1. This prevents users from forgetting to call - `paddle.parameters.create`. - 1. `paddle.train` needs to know which parameter set to update. - 1. Users could load another (pre-trained) parameter set and use it - with a topology in `train.infer`. - -1. By specifying the `immutable_parameters` parameter of - `paddle.train`, we can forbid the update of these parameters. - - -## Reader - -Not all programming frameworks allow users to define I/O functions. -An example is Google MapReduce, which can only read from text, -SSTable, and RecordIO files. Hadoop MapReduce allows users to define -readers and writers by deriving from base classes `Reader` and -`Writer`. The former is less flexible but also less error-prone. We -decide to provide the flexibility to users to define their readers. - - -There are some open questions here: - -1. **Should a reader return a Python dictionary?** - -1. **How to map multiple outputs from a reader to multiple data layers?** - -1. **How to easily compose some existing readers to read more data and - feed a topology with more data layers?** - - -## Training - -The recommended way to training a model is to call `paddle.train`, -which simply calls `paddle.trainer.Default`, a global variable of -type `paddle.trainer.SGD`. Equivalently, we can do - -```python -opt = paddle.trainer.SGD(..., paddle.updater.Adam(...)) -opt.train(topology, parameters, reader=read, ...) 
-``` - -### Updater - -Please be aware that a trainer can accept an updater as its data -member, where an updater is a class derived from -`paddle.trainer.Updater`. This is to make it easier to customize -trainers, as discussed -[here](https://github.com/PaddlePaddle/Paddle/issues/1319). - -### Event Handler - -`paddle.train` and `paddle.trainer.XXX.train` take an optional -parameter `event_handler`, which should be either `None` or a function -that handle some events: - -1. BeginTraining -1. EndTraining -1. BeginIteration -1. EndIteration -1. BeginPass -1. EndPass - -where EndPass is sent if and only if the reader yields -`end_pass=True`. - -An example as follows: - -```python -def event_handler(event): - if ininstance(event, paddle.event.EndIteration): - print paddle.test(...) - -paddle.train(topology, parameters, reader, event_handler) -``` - -If we are writing a PaddlePaddle program in and for iPython/Jypyter, -we can use metaplotlib in the event handler to plot a curve of -cost/error versus iterations, as shown -[here](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/). - -### Distributed Training - -If users want to do distributed training on a cluster, s/he should -call `paddle.dist_train` and provides access tokens to the cluster as -a parameter. 
- -For example, if the user has a TLS certificate that allows him to -access a Kubernetes cluster, s/he should be able to call - -```python -paddle.dist_train(model, - trainer=paddle.trainer.SGD(..., - paddle.updater.Adam(...)), - reader=read, - k8s_user="yi", - k8s_token="kube_cluster_tls.pem", - k8s_job="hello", - num_parameter_servers=15) -``` - -The pseudo code of `paddle.dist_train` is as follows: - -```python -def dist_train(topology, parameters, trainer, reader, ...): - if os.getenv("KUBERNETES_SERVICE_HOST") == None: - image_name = k8s_user + '/' + k8s_job - docker_build(image_name) - docker_push() - kube_ctrl_start_job(image_name, k8s_user, k8s_token) - else: - rank = kube_list_containers_in_job_and_return_current_containers_rank() - if rank == 0: - master() - elif rank < 15: - parameter_server() - else: - trainer.train(model, reader=read) -``` - -Please be aware that if a process is running on the Kubernetes -cluster, it will have some environment variables pre-defined. - -If `dist_train` doesn't see these environment variables, it knows -that it's running on users' personal computer, and it should work as a -*launcher*. Otherwise, it knows that it's running on the cluster and -need to figure out its role as either the master, or a trainer, or a -parameter server. diff --git a/develop/doc_cn/_sources/design/auto_gradient_check.md.txt b/develop/doc_cn/_sources/design/auto_gradient_check.md.txt deleted file mode 100644 index 773b7b6a767541f28c27f247c1ad8c9a8a2d0ccf..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/auto_gradient_check.md.txt +++ /dev/null @@ -1,150 +0,0 @@ -## Auto Gradient Check Design - -## Background: -- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges: - 1. 
The formula for backpropagation formula should be correct according to the forward computation. - 2. The Implementation of the above shoule be correct in CPP. - 3. It is difficult to prepare an unbiased test data. - -- Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator's result. It has several advantages: - 1. Numerical gradient checker only needs the forward operator. - 2. The user only needs to prepare the input data for forward Operator and not worry about the backward Operator. - -## Mathematical Theory -The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful. - -- [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization) -- [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96) - - -## Numerical Gradient Implementation -### Python Interface -```python -def get_numerical_gradient(op, - input_values, - output_name, - input_to_check, - delta=0.005, - local_scope=None): - """ - Get Numerical Gradient for the input of an operator. - - :param op: C++ operator instance, could be an network. - :param input_values: The input variables. Should be an dictionary, whose key is - variable name, and value is a numpy array. - :param output_name: The final output variable name. - :param input_to_check: The input variable with respect to which the gradient has to be computed. - :param delta: The perturbation value for numerical gradient method. The - smaller the delta, the more accurate the result. But if the delta is too - small, it will suffer from the numerical stability problem. - :param local_scope: The local scope used for get_numeric_gradient. - :return: The gradient array in numpy format. 
- """ -``` - -### Explanation: - -- Why do we need an `output_name` - - An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable. - -- Why do we need `input_to_check` - - One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times each with a different input. - - -### Core Algorithm Implementation - - -```python - # we only compute the gradient of one element a time. - # we use a for loop to compute the gradient of each element. - for i in xrange(tensor_size): - # get one input element using the index i. - original = tensor_to_check.get_float_element(i) - - # add delta to it, run the forward op and then - # get the new value of the result tensor. - x_pos = original + delta - tensor_to_check.set_float_element(i, x_pos) - y_pos = get_output() - - # Subtract delta from this element, run the op again - # and get the new value of the result tensor. - x_neg = original - delta - tensor_to_check.set_float_element(i, x_neg) - y_neg = get_output() - - # restore old value - tensor_to_check.set_float_element(i, original) - - # compute the gradient of this element and store - # it into a numpy array. - gradient_flat[i] = (y_pos - y_neg) / delta / 2 - - # reshape the gradient result to the shape of the source tensor. - return gradient_flat.reshape(tensor_to_check.get_dims()) -``` - -## Auto Gradient Check Framework - -Each Operator Kernel has three kinds of Gradient: - -1. Numerical gradient -2. CPU kernel gradient -3. GPU kernel gradient (if supported by the device) - -The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. 
The gradient checking is performed in the following three steps: - -1. Calculate the numerical gradient -2. Calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient. -3. Calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient. (if supported) - -#### Python Interface - -```python - def check_grad(self, - forward_op, - input_vars, - inputs_to_check, - output_name, - no_grad_set=None, - only_cpu=False, - max_relative_error=0.005): - """ - :param forward_op: used to create backward_op - :param input_vars: numpy value of input variable. The following - computation will use these variables. - :param inputs_to_check: the input variable with respect to which the - gradient will be computed. - :param output_name: The final output variable name. - :param max_relative_error: The relative tolerance parameter. - :param no_grad_set: used to create backward ops - :param only_cpu: only compute and check gradient on cpu kernel. - :return: - """ -``` - -### How to check if two numpy arrays are close enough? -if `abs_numerical_grad` is nearly zero, then use absolute error for numerical_grad. - -```python -numerical_grad = ... -operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor()) - -abs_numerical_grad = numpy.abs(numerical_grad) -# if abs_numerical_grad is nearly zero, then use abs error for -# numeric_grad, instead of relative error. -abs_numerical_grad[abs_numerical_grad < 1e-3] = 1 - -diff_mat = numpy.abs(abs_numerical_grad - operator_grad) / abs_numerical_grad -max_diff = numpy.max(diff_mat) -``` - - -#### Notes: -The Input data for auto gradient checker should be reasonable to avoid numerical stability problem. 
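As a rough illustration of the checking procedure described above, the following NumPy-only sketch computes a central-difference gradient and compares it against an analytic one. It is not the PaddlePaddle implementation: `numerical_gradient`, `max_relative_error`, `zero_threshold`, and the toy loss are hypothetical names chosen for this example, and the comparison uses the signed numerical gradient in the numerator, which is an assumption of this sketch.

```python
import numpy as np


def numerical_gradient(f, x, delta=0.005):
    """Central-difference gradient of a scalar-valued f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64).copy()
    grad = np.zeros_like(x)
    flat_x = x.reshape(-1)   # view on x, so writes are visible to f(x)
    flat_g = grad.reshape(-1)
    for i in range(flat_x.size):
        original = flat_x[i]
        flat_x[i] = original + delta
        y_pos = f(x)
        flat_x[i] = original - delta
        y_neg = f(x)
        flat_x[i] = original  # restore the perturbed element
        flat_g[i] = (y_pos - y_neg) / (2.0 * delta)
    return grad


def max_relative_error(numerical_grad, operator_grad, zero_threshold=1e-3):
    """Relative error, falling back to absolute error where the reference is near zero."""
    denom = np.abs(numerical_grad)
    denom[denom < zero_threshold] = 1.0
    return np.max(np.abs(numerical_grad - operator_grad) / denom)


# Toy "operator": loss = sum(x ** 2), whose analytic gradient is 2 * x.
x = np.array([[1.0, -2.0], [0.5, 3.0]])
loss = lambda t: float(np.sum(t ** 2))
analytic = 2.0 * x
numeric = numerical_gradient(loss, x)
error = max_relative_error(numeric, analytic)
```

In a real check, `error` would be compared against a tolerance such as the `max_relative_error=0.005` default shown in `check_grad`.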
- - -#### References: - -- [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization) -- [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96) diff --git a/develop/doc_cn/_sources/design/backward.md.txt b/develop/doc_cn/_sources/design/backward.md.txt deleted file mode 100644 index 20fda7a98f514a3f1c1c2d0ba7447ec954b21d5a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/backward.md.txt +++ /dev/null @@ -1,158 +0,0 @@ -# Backward Building - -## Motivation - -In Neural Network, most models are solved by the backpropagation algorithm(known as **BP**) at present. Technically, BP calculates the gradient of the loss function, then propagates it back through the networks following the chain rule. However, when configuring the model structure, users do not need to define the backward part. So a mechanism is required by the framework which can complete the model's backward part automatically according to the given forward part. - -When implementing a specific `op`, the developer is also asked to implement its backward version, called `grad_op`. A `grad_op` takes gradients of its corresponding `op`'s outputs, and calculate gradients of the `op`'s inputs. During the building of a model's backward part, the framework creates each forward `op`'s `grad_op`, and then string them together in reverse order of forwarding part. In this way, gradients spread from the end to the beginning of the model, in another word, from the loss to parameters. - -## Challenges - -The motivation of backward building is apparent. However, implementation it correctly is not so easy. In the **Fluid** design, a deep learning model is described by `Program`, `Block`, `Op` and `Variable`. The `Block` itself can be nested. 
It means that the `op`s and `variable`s are scattered across different blocks rather than all be gathered in a single graph. Our backward building algorithm shall visit blocks in recursive order and be able to insert `grad_op`s and new created `variable`s into the right place. - -## Usage - -Although the whole algorithm is comprised of many functions, only one is exposed as API: - -```python -def append_backward(loss, parameter_list=None, no_grad_set=None): - """ - Append backward part to main_program - - Args: - loss(Variable): The variable generated by the cost function. - parameter_list(list): Parameters that need to be updated by optimizers. - If None, it means all parameters need to be updated. - - no_grad_set(set): Variables that have no gradients in Block 0. - If None, the set will be generated inside the function and - contains all variables with `step_gradient=True` from all blocks. - - Return: - (list[Variable]): list of (parameters, gradients) pair. - """ -``` - -By invoking this API, the framework appends backward part of the program where the `loss` is. It takes three arguments. `loss` means the final loss value. It must be a scalar and is usually the output of the loss layer. It is also where the gradient generated and backpropagation starts. `parameter_list` marks all parameters needs updating. If it's `None`, all parameter will be updated by optimizers. `no_grad_set` marks variables without gradient. if all outputs of some `grad_op` are in `no_grad_set`, the `grad_op` will not be run. - -This API will be invoked automatically before optimizer building. -As a result, in most cases, users do not need to invoke the API by themselves to append backward part. - -## Implementation - -The implementation of backward building algorithm is in `backward.py` file. The whole algorithm can be divided into two independent parts: creating `grad_op`s and creating new variables. 
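To make the idea of stringing gradient ops together in reverse order more concrete, here is a toy sketch in plain Python. It is not the algorithm in `backward.py`: the op tuples, `run_forward`, and `run_backward` are hypothetical, each op carries a hand-written gradient function, and the handling of `no_grad_set` is simplified to skipping the listed variables.

```python
# Each forward op: (type, input names, output name, forward fn, grad fn).
# A grad fn takes (input values, output gradient) and returns the input gradients.
OPS = [
    ("mul", ("x", "w"), "h", lambda x, w: x * w,
     lambda ins, g: (g * ins[1], g * ins[0])),
    ("add", ("h", "w"), "loss", lambda h, w: h + w,
     lambda ins, g: (g, g)),
]


def run_forward(ops, values):
    for _, ins, out, fwd, _ in ops:
        values[out] = fwd(*(values[n] for n in ins))
    return values


def run_backward(ops, values, loss_name, no_grad_set=frozenset()):
    grads = {loss_name: 1.0}  # gradient of the loss w.r.t. itself
    for _, ins, out, _, grad_fn in reversed(ops):
        if out not in grads:
            continue  # no gradient flows into this op
        in_grads = grad_fn([values[n] for n in ins], grads[out])
        for name, g in zip(ins, in_grads):
            if name in no_grad_set:
                continue
            # A variable read by several ops accumulates all contributions.
            grads[name] = grads.get(name, 0.0) + g
    return grads


values = run_forward(OPS, {"x": 3.0, "w": 2.0})
grads = run_backward(OPS, values, "loss", no_grad_set={"x"})
```

Here `w` is read by both ops, so its gradient is the sum of two contributions, while `x` receives no gradient because it is listed in `no_grad_set`.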
- -### Creating `grad_op`s - -The creating of `grad_op`s is implemented by: - -```python -def _append_backward_ops_(target, - block, - target_block, - no_grad_dict, - grad_to_var): - """ - Create all grad ops, and insert them into given block - - Args: - target(Variable): the target variable of forward pass - block(Block): the block where forward ops are - target_block(Block): the block which is going to hold new generated grad ops - no_grad_dict(dict): - key(int) block index - val(set) a set of varibale names. These varibales have no gradient - grad_to_var(dict)(output argument): - key(str): grad variable name - val(str): corresponding forward variable name - """ -``` - -Given a `block`, the function will traverses all `op`s in this block in reverse order, gets corresponding `grad_op` from the C++ core via `core.get_grad_op_desc()`, then append it to `target_block`. - -However, some specific `op`(e.g. `while_op`, `if_else_op`) can hold its own sub-block. For these sub-blocks contains `op`s as well, the `grad_op` creating should be recursive. - -During the reverse traversal, we check each `op` whether it has an attribute named `sub_block`. If so, it means there is a sub-block and we need to deal with it first. After creating a new block whose father is the one in `op`'s attribute, we invoke `_append_backward_ops_()` recursively, assigning the new block to parameter `target_block` and the one in `op`'s attribute to `block`. The *pseudo-code* shows this process: - -``` -******* pseudo-code ******** -for op in reversed(block.ops): - if op has an attribute named 'sub_block': - Get the sub-block(`s_block`) from op's attribute. - Create a new block(`grad_s_block`), whose father is `s_block`. - Invoke _append_backward_ops_(), with `block=s_block` and `target_block=grad_s_block` - - Invoke `core.get_grad_op_desc()` to get op's grad_op. 
- Insert name correspondings between variables and their gradients of the grad_op to grad_to_var - Assign grad_s_block to grad_op as it's 'sub_block' attribute. - Append grad_op to current target_block. -``` - -The first invoking of `_append_backward_ops_()` is initiated by `append_backward()`, in which parameters `block` and `target_block` are all assigned with root block(the block with index 0). - -### Corner Cases of `grad_op` Creating - -In the previous section, we show the regular process of `grad_op` creating. However, in some corner cases, the conventional algorithm is not enough to get the correct result and appending handling is required. These additional processes run after the algorithm mentioned above and do some special adjusts on its output `grad_op`s. - -#### Shared Variables - -If a variable is read by more than one `op` in the forward pass, its gradient is likely to be written by more than one `grad_op`s in the next backward pass. To make the gradient result being the sum of all `grad_op`s' outputs instead of the last running one, we assign each output with a temporary variable and then add a `sum_op` to add them up. - -For the debug convenience, if the final gradient name is `w@GRAD`, it's corresponding temporary variables will be named as `w@GRAD@RENAME@0`, `w@GRAD@RENAME@1`... - -See function `_addup_repetitive_outputs_` in `backward.py` for implementation details. - -#### No Gradient Variables - -In our framework, variables can be marked as *no_gradient*, it means that the gradient of this variable is unnecessary and can be considered as zero in model training. Apparently, when all the outputs of some `grad_op` are marked as *no_gradient*, the `grad_op` itself can be skipped in backward pass. - -Another situation is all the gradient inputs of some `grad_op` are marked as *no_gradient*, which means all of them can be considered as zeros. 
For `grad_op`s are in essence the propagation of gradients, all the outputs are definitely zeros when all gradient inputs are zeros. Therefore the `grad_op` can also be skipped. - -It should be noted that all these zero gradients still need to be creating and initialized by something, otherwise following `grad_op`s who take these gradients as inputs take the risk of using uninitialized memory. In our code, we employ `fill_zeros_like_op` to initialize them as all zeros. - -This features are implemented in function `_remove_no_grad_branch_`. It checks new created `grad_op`s one-by-one, removes who can be skipped and inserts `fill_zeros_like_op` when its necessary. We can get the `no_grad_set` from the `_append_backward_ops_` argument `no_grad_dict` or generate it on the fly by scanning all variables' `no_gradient` attribute(True or False). - -### Creating Backward Variables - -Up to now, we have completed all creating and adjusting jobs of `grad_op`s. However, backward variables have not been created. Now they are only represented by `grad_op`'s input and output arguments. The backward variable creating job will be done by: - -```python -def _append_backward_vars_(block, - start_op_idx, - grad_to_var, - grad_info_map): - """ - Create new variables required by backward pass. - - Args: - block(Block): the block where new variables will be created - start_op_idx(int): Only variables required by ops in block.ops[start_op_idx : ] will be created - grad_to_var(dict): - key(str): grad variable name - val(str): corresponding forward variable name - In most cases, this dict is generated by _append_backward_ops_() - grad_info_map(dict)(output argument): - key(str): forward variable name - val(tuple): a tuple of (str, int), str is the corresponding grad name, int is the block index - """ -``` - -Given a `block`, this function traverses all the `grad_op`s in it(The argument `start_op_idx` indicates where the grad_op sequence starts.) and creates all the uncreated outputs. 
The *pseudo-code* shows this process: - -``` -for op in block.ops[start_op_idx : ]: - - if op has an attribute named 'sub_block': - Get the sub-block(`s_block`) from op's attribute. - Invoke _append_backward_vars_(), with `block=s_block` - - for var_name in op.all_output_names(): - if block.has_var_recursive(var_name) or var_name is the name of empty variable: - continue - create a new variable named 'var_name' in block - if grad_to_var.has_key(var_name): - set grad_info_map[grad_to_var[var_name]] as a tuple of (var_name. block) - - do op's var type inference - do op's shape inference -``` diff --git a/develop/doc_cn/_sources/design/block.md.txt b/develop/doc_cn/_sources/design/block.md.txt deleted file mode 100644 index 907a2def557fd472ac4d679c73447bd9107d1190..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/block.md.txt +++ /dev/null @@ -1,336 +0,0 @@ -# Design Doc: Block and Scope - -## The Representation of Computation - -Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation: - -- Caffe, Torch, and Paddle: sequences of layers. -- TensorFlow, Caffe2, Mxnet: graph of operators. -- PaddlePaddle: nested blocks, like C++ and Java programs. - -## Block in Programming Languages and Deep Learning - -In programming languages, a block is a pair of curly braces that includes local variables definitions and a sequence of instructions or operators. - -Blocks work with control flow structures like `if`, `else`, and `for`, which have equivalents in deep learning: - -| programming languages | PaddlePaddle | -|-----------------------|-----------------------| -| for, while loop | RNN, WhileOp | -| if, if-else, switch | IfElseOp, SwitchOp | -| sequential execution | a sequence of layers | - -A key difference is that a C++ program describes a one pass computation, whereas a deep learning program describes both the forward and backward passes. 
- -## Stack Frames and the Scope Hierarchy - -The existence of the backward pass makes the execution of a block of PaddlePaddle different from traditional programs: - -| programming languages | PaddlePaddle | -|-----------------------|---------------------------------| -| stack | scope hierarchy | -| stack frame | scope | -| push at entering block| push at entering block | -| pop at leaving block | destroy when minibatch completes| - -1. In traditional programs: - - - When the execution enters the left curly brace of a block, the runtime pushes a frame into the stack, where it realizes local variables. - - After the execution leaves the right curly brace, the runtime pops the frame. - - The maximum number of frames in the stack is the maximum depth of nested blocks. - -1. In PaddlePaddle - - - When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables. - - PaddlePaddle doesn't pop a scope after the execution of the block because variables therein are used by the backward pass. So it has a stack forest known as a *scope hierarchy*. - - The height of the highest tree is the maximum depth of nested blocks. - - After the processing of a minibatch, PaddlePaddle destroys the scope hierarchy. - -## Use Blocks in C++ and PaddlePaddle Programs - -Let us consolidate the discussion by presenting some examples. 
-
-### Blocks with `if-else` and `IfElseOp`
-
-The following C++ program shows how blocks are used with the `if-else` structure:
-
-```c++
-namespace pd = paddle;
-
-int x = 10;
-int y = 1;
-int z = 10;
-bool cond = false;
-int o1, o2;
-if (cond) {
-  int z = x + y;
-  o1 = z;
-  o2 = pd::layer::softmax(z);
-} else {
-  int d = pd::layer::fc(z);
-  o1 = d;
-  o2 = d+1;
-}
-
-```
-
-An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](./if_else_op.md) is as follows:
-
-```python
-import paddle as pd
-
-x = minibatch([10, 20, 30]) # shape=[None, 1]
-y = var(1) # shape=[1], value=1
-z = minibatch([10, 20, 30]) # shape=[None, 1]
-cond = larger_than(x, 15) # [false, true, true]
-
-ie = pd.ifelse()
-with ie.true_block():
-    d = pd.layer.add_scalar(x, y)
-    ie.output(d, pd.layer.softmax(d))
-with ie.false_block():
-    d = pd.layer.fc(z)
-    ie.output(d, d+1)
-o1, o2 = ie(cond)
-```
-
-In both examples, the true branch computes `x+y` and `softmax(x+y)`, and the false branch computes `fc(z)` and `fc(z)+1`.
-
-The difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle programs are mini-batches of instances.
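One way to picture how a branch over a mini-batch differs from the scalar case is the NumPy sketch below. It rests on assumptions: `if_else` is a hypothetical helper, `fc` is a made-up stand-in that only scales its input, and the split-then-merge strategy is an illustration rather than a statement of how `IfElseOp` is implemented.

```python
import numpy as np


def if_else(cond, true_fn, false_fn, x_true, x_false):
    """Run true_fn on rows where cond holds, false_fn on the rest, then merge."""
    out = np.empty(cond.shape[0], dtype=np.float64)
    out[cond] = true_fn(x_true[cond])
    out[~cond] = false_fn(x_false[~cond])
    return out


x = np.array([10.0, 20.0, 30.0])
y = 1.0
z = np.array([10.0, 20.0, 30.0])
cond = x > 15                # one decision per instance in the mini-batch

fc = lambda v: 0.5 * v       # hypothetical stand-in for a fully connected layer
o1 = if_else(cond, lambda v: v + y, fc, x, z)
```

Unlike the scalar C++ program, both branches may run for the same mini-batch, each on its own subset of instances.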
-
-
-### Blocks with `for` and `RNNOp`
-
-The following RNN model in PaddlePaddle from the [RNN design doc](./rnn.md):
-
-```python
-x = sequence([10, 20, 30]) # shape=[None, 1]
-m = var(0) # shape=[1]
-W = var(0.314, param=true) # shape=[1]
-U = var(0.375, param=true) # shape=[1]
-
-rnn = pd.rnn()
-with rnn.step():
-  h = rnn.memory(init = m)
-  h_prev = rnn.previous_memory(h)
-  a = layer.fc(W, x)
-  b = layer.fc(U, h_prev)
-  s = pd.add(a, b)
-  act = pd.sigmoid(s)
-  rnn.update_memory(h, act)
-  rnn.output(a, b)
-o1, o2 = rnn()
-```
-has its equivalent C++ program as follows:
-
-```c++
-// Scalar stand-ins for the tensors in the Python program above.
-double x[] = {10, 20, 30};
-double m = 0;
-double W = 0.314;
-double U = 0.375;
-
-const int n = sizeof(x) / sizeof(x[0]);
-double mem[n + 1];
-double o1[n + 1];
-double o2[n + 1];
-for (int i = 1; i <= n; ++i) {
-  double xi = x[i-1];
-  if (i == 1) mem[0] = m;  // initialize the memory before the first step
-  double a = W * xi;       // fc_out
-  double b = U * mem[i-1]; // hidden_out
-  double s = a + b;        // sum
-  double act = sigmoid(s);
-  mem[i] = act;
-  o1[i] = act;
-  o2[i] = b;
-}
-```
-
-## Compilation and Execution
-
-Like TensorFlow, a PaddlePaddle program is written in Python. The first part describes a neural network as a protobuf message, and the rest executes the message for training or inference.
-
-The generation of this protobuf message is similar to how a compiler generates a binary executable file. The execution of the message is similar to how the OS executes the binary file.
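The split between describing a computation and executing it can be sketched as follows. This is a minimal illustration under assumptions: plain dictionaries and JSON stand in for the protobuf message, and `describe_step`, `KERNELS`, and `execute` are hypothetical names, not PaddlePaddle APIs.

```python
import json
import math


def describe_step():
    """'Compile' phase: build a description only; nothing is computed here."""
    return {
        "vars": ["x", "h_prev", "a", "b", "s", "act"],
        "ops": [
            {"type": "scale", "inputs": ["x"], "outputs": ["a"],
             "attrs": {"factor": 0.314}},
            {"type": "scale", "inputs": ["h_prev"], "outputs": ["b"],
             "attrs": {"factor": 0.375}},
            {"type": "add", "inputs": ["a", "b"], "outputs": ["s"], "attrs": {}},
            {"type": "sigmoid", "inputs": ["s"], "outputs": ["act"], "attrs": {}},
        ],
    }


# 'Execution' phase: one kernel per op type, looked up by name.
KERNELS = {
    "scale": lambda attrs, v: attrs["factor"] * v,
    "add": lambda attrs, u, v: u + v,
    "sigmoid": lambda attrs, v: 1.0 / (1.0 + math.exp(-v)),
}


def execute(desc, feed):
    scope = dict(feed)  # variable name -> value
    for op in desc["ops"]:
        args = [scope[name] for name in op["inputs"]]
        scope[op["outputs"][0]] = KERNELS[op["type"]](op["attrs"], *args)
    return scope


serialized = json.dumps(describe_step())  # the description can be stored or shipped
result = execute(json.loads(serialized), {"x": 10.0, "h_prev": 0.0})
```

The point of the design is that `serialized` carries no Python code, so a separate runtime can execute it.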
- -## The "Binary Executable File Format" - -The definition of the protobuf message is as follows: - -```protobuf -message BlockDesc { - repeated VarDesc vars = 1; - repeated OpDesc ops = 2; -} -``` - -The step net in the above RNN example would look like: - -``` -BlockDesc { - vars = { - VarDesc {...} // x - VarDesc {...} // h - VarDesc {...} // fc_out - VarDesc {...} // hidden_out - VarDesc {...} // sum - VarDesc {...} // act - } - ops = { - OpDesc {...} // matmul - OpDesc {...} // add_two - OpDesc {...} // sigmoid - } -}; -``` - -Also, the RNN operator in the above example is serialized into a protobuf message of type `OpDesc` and would look like: - -``` -OpDesc { - inputs = {0} // the index of x in vars of BlockDesc above - outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above - attrs { - "states" : {1} // the index of h - "step_net" : - } -}; -``` - -This `OpDesc` value is in the `ops` field of the `BlockDesc` value representing the global block. - - -## The Compilation of Blocks - -During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message which describes a Variable) and OpDesc (the Protobuf message which describes an Operator). - -A VarDesc in a block should have its own name scope so that local variables do not affect the parent block's name scope. -A child block's name scope should inherit the parent's so that an OpDesc in the child block can reference a VarDesc that is stored in the parent block. For example: - -```python -a = pd.Variable(shape=[20, 20]) -b = pd.fc(a, params=["fc.w", "fc.b"]) - -rnn = pd.create_rnn() -with rnn.stepnet(): - x = a.as_step_input() - # reuse fc's parameter - fc_without_b = pd.get_variable("fc.w") - rnn.output(fc_without_b) - -out = rnn() -``` -The method `pd.get_variable` retrieves a Variable by name. The Variable may be stored in a parent block but retrieved in a child block, so a block should have a variable scope that supports inheritance.
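A variable scope with this kind of inheritance can be sketched in a few lines of Python. The class `BlockScope` and its methods are hypothetical names chosen for this illustration; the sketch only shows the lookup rule (search the current block, then its ancestors) and is not the framework's implementation.

```python
class BlockScope:
    """Name scope of one block; lookups may fall back to the parent scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def new_var(self, name, desc):
        # A name may be declared only once per block, but a child block
        # may reuse a name that also exists in a parent block.
        if name in self.vars:
            raise KeyError(f"variable {name!r} already declared in this block")
        self.vars[name] = desc
        return desc

    def find_var(self, name, recursive=True):
        # Walk from this block towards the root until the name is found.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent if recursive else None
        return None


global_block = BlockScope()
global_block.new_var("fc.w", {"shape": [20, 20]})

step_block = BlockScope(parent=global_block)
step_block.new_var("x", {"shape": [20]})

print(step_block.find_var("fc.w"))   # found through the parent block
print(global_block.find_var("x"))    # None: child variables stay local
```

Lookups only travel from child to parent, so a variable declared inside the step block never leaks into the global block.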
- -In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc. - -To store the definition of variables and operators, we define a C++ class `SymbolTable`, like the one used in compilers. - -`SymbolTable` can do the following: - -- store the definitions (some names and attributes) of variables and operators, -- verify if a variable was declared, -- make it possible to implement type checking (offer Protobuf message pointers to `InferShape` handlers). - - -```c++ -// Information in SymbolTable is enough to trace the dependency graph. So maybe -// it is enough for the Eval() interface to take a SymbolTable. -class SymbolTable { - public: - SymbolTable(SymbolTable* parent) : parent_(parent) {} - - OpDesc* NewOp(const string& name=""); - - // TODO determine whether name is generated by python or C++. - // Currently assume that a unique name will be generated by C++ if the - // argument name is left default. - VarDesc* Var(const string& name=""); - - // find a VarDesc by name, if recursive is true, find parent's SymbolTable - // recursively. - // this interface is introduced to support InferShape, find protobuf messages - // of variables and operators, pass pointers into InferShape. - // - // NOTE maybe some C++ classes such as VarDescBuilder and OpDescBuilder should - // be proposed and embedded into pybind to enable python operation on C++ pointers. - VarDesc* FindVar(const string& name, bool recursive=true); - - OpDesc* FindOp(const string& name); - - BlockDesc Compile() const; - - private: - SymbolTable* parent_; - - map<string, OpDesc> ops_; - map<string, VarDesc> vars_; -}; -``` - -After all the descriptions of variables and operators are added into SymbolTable, -the block has enough information to run. - -The `Block` class takes a `BlockDesc` as input, and provides `Run` and `InferShape` functions.
- - -```c++ -namespace { - -class Block : public OperatorBase { -public: - Block(const BlockDesc& desc) : desc_(desc) {} - - void InferShape(const framework::Scope& scope) const override { - if (!symbols_ready_) { - CreateVariables(scope); - CreateOperators(); - } - // should run InferShape first. - for (auto& op : runtime_table_.ops()) { - op->InferShape(scope); - } - } - - void Run(const framework::Scope& scope, - const platform::Place& place) const override { - PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first."); - for (auto& op : runtime_table_.ops()) { - op->Run(scope, place); - } - } - - void CreateVariables(const framework::Scope& scope); - void CreateOperators(); - - // some other necessary interfaces of NetOp are listed below - // ... - -private: - BlockDesc desc_; - bool symbols_ready_{false}; -}; - -} // namespace -``` - -## The Execution of Blocks - -Block inherits from OperatorBase, which has a Run method. -Block's Run method will run its operators sequentially. - -There is another important interface called `Eval`, which takes some arguments called targets, generates a minimal graph that treats the targets as end points, and creates a new Block from it. After `Run`, `Eval` will get the latest value and return the targets. - -The definition of Eval is as follows: - -```c++ -// clean a block description by targets using the corresponding dependency graph. -// return a new BlockDesc with minimal number of operators. -// NOTE: The return type is not a Block but the block's description so that this can be distributed -// to a cluster.
-BlockDesc Prune(const BlockDesc& desc, vector<string> targets); - -void Block::Eval(const vector<string>& targets, - const framework::Scope& scope, - const platform::DeviceContext& dev_ctx) { - BlockDesc min_desc = Prune(desc_, targets); - Block min_block(min_desc); - min_block.Run(scope, dev_ctx); -} -``` diff --git a/develop/doc_cn/_sources/design/build_system/README.md.txt b/develop/doc_cn/_sources/design/build_system/README.md.txt deleted file mode 100644 index bf0e4dddc1b640ecbce489f65820aaf8a4b3b1e7..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/build_system/README.md.txt +++ /dev/null @@ -1,152 +0,0 @@ -A few months ago when we were trying to replace CMake with Bazel, @emailweixu suggested that we rewrite those handy Bazel functions using CMake. Now it seems that it's the right time to get this done, as we are facing problems from the porting of Majel and the development of the new parameter server using Go and C++. - -Here are some initial thoughts. Your comments are welcome! - -### Required CMake Function - -I think we need only the following few CMake functions to make a project description lean and clean: - -| C++ | CUDA C++ | Go | -|---|---|---| -| cc_library | nv_library | go_library | -| cc_binary | nv_binary | go_binary | -| cc_test | nv_test | go_test | - -- The `_library` functions generate .a files from source code. -- The `_binary` functions generate executable binary files. -- The `_test` functions generate executable unit test files. They work like `_binary` but link `-lgtest` and `-lgtest_main`. - -The difference between `nv_` functions and `cc_` functions is that the former use `nvcc` instead of the system-default C++ compiler. - -Both `nv_` and `cc_` functions enable C++11 (-std=c++11). - -Also, - -- to describe external dependencies, we need `external_library`. -- to build shared libraries, we need `shared_library`.
- -### An Example Project - -Suppose that we have the aforementioned functions defined in our `/cmake` directory. The following example `CMakeLists.txt` describes a project including the following source files: - -- tensor.h -- tensor.cc -- tensor_test.cc -- ops.h -- ops.cu -- ops_test.cu -- api.go -- api_test.go - -Suppose that ops.cu depends on CUDNN. - -```cmake -# cc_binary parses tensor.cc and figures out that the target also depends -# on tensor.h. -cc_binary(tensor - SRCS - tensor.cc) - -# The dependency on target tensor implies that if any of -# tensor{.h,.cc,_test.cc} is changed, tensor_test needs to be re-built. -cc_test(tensor_test - SRCS - tensor_test.cc - DEPS - tensor) - -# I don't have a clear idea what parameters external_library needs to -# have. @gangliao as a CMake expert would have better ideas. -external_library(cudnn - ....) - -# Suppose that ops.cu depends on external target CUDNN. Also, ops.cu -# includes global functions that take Tensor as their parameters, so -# ops depends on tensor. This implies that if any of tensor.{h,cc}, -# ops.{h,cu} is changed, ops needs to be re-built. -nv_library(ops - SRCS - ops.cu - DEPS - tensor - cudnn) # cudnn is defined above. - -nv_test(ops_test - SRCS - ops_test.cu - DEPS - ops) - -# Because api.go defines a Go wrapper to ops and tensor, it depends on -# both. This implies that if any of tensor.{h,cc}, ops.{h,cu}, or -# api.go is changed, api needs to be re-built. -go_library(api - SRCS - api.go - DEPS - tensor # Because ops depends on tensor, this line is optional. - ops) - -go_test(api_test - SRCS - api_test.go - DEPS - api) - - -# This builds libapi.so. shared_library might use CMake target -# api_shared so as to distinguish it from the above target api. -shared_library(api - DEPS - api) - -``` - -### Implementation - -As the above example CMakeLists.txt executes, each function invocation adds "nodes" to a dependency graph.
It also uses this graph to generate CMake commands including `add_executable`, `add_dependencies`, `target_link_libraries`, and `add_test`. - -### Using Package Manager For Go - -Building Go binaries and libraries requires satisfying their dependencies. Generally -we can run `go get ./...` to download and compile all external dependencies. The -problems are: - -1. `go get` will always get the latest code from the default branch of the - remote repo, so changes in dependencies might break the build. This is very - different from what we already have in `cmake/external`, which downloads a - specific version or commit id of the dependency. -1. Some locations cannot access external dependencies through the internet, as mentioned - in https://github.com/PaddlePaddle/Paddle/issues/2605. Package management - tools can package the dependencies as a "vendor" package, which can be mirrored - on many cloud file hosting services, so users who want to compile paddle by themselves can - download this "vendor" package from a mirror site. - -#### Choose A Suitable Tool - -As mentioned by @wangkuiyi, [this page](https://github.com/golang/go/wiki/PackageManagementTools) -lists dozens of Go package managers. We choose the tool using the following principles: - -- Most "active" projects with more stars, more pull requests or commits -- Widely used project - -After comparing all these projects, we shall choose between the most popular -tools: Godep and Glide. - -Here's a brief comparison between Godep and Glide -: https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are -also many complaints about using `Godep`. A new "official" package -management tool has also been started at https://github.com/golang/dep to resolve -such problems, but it's currently at Alpha stage. So the best choice for now is -Glide. - -#### Manage Go Packages - -- Dependencies: `go/glide.yaml` will store the dependencies and their versions which - are directly imported by paddle.
`go/glide.lock` will store all dependencies recursively - with their commit id. Builds will "lock" to these packages if we don't `glide up` - them -- Vendor package: the `go/vendor` directory will be generated when running the `cmake` command. `cmake` - will download the code corresponding to `go/glide.lock`. If we put a vendor folder - under `go/`, cmake will just check the commit ids of the packages under the folder; - if the commit ids match, there will be no download at all. diff --git a/develop/doc_cn/_sources/design/cluster_train/README.md.txt b/develop/doc_cn/_sources/design/cluster_train/README.md.txt deleted file mode 100644 index 177a5f5d54bd924fab34795219ce1f7b270c8e25..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/README.md.txt +++ /dev/null @@ -1,182 +0,0 @@ -# Design Doc: Distributed Training - -## Objective - -In [these slides](https://www.slideshare.net/cxwangyi/paddlepaddle-a-complete-solution-for-businesses), we explained that we'd like PaddlePaddle to run on general-purpose clusters like those managed by Kubernetes, so as to address demands for AI from both Internet and non-Internet industries. - -This poses technical challenges to PaddlePaddle: - -1. Support fault-recovery. -1. Support both offline and online training. -1. [Serverless computing](https://en.wikipedia.org/wiki/Serverless_computing) of distributed training. - - -## Training Job - -A training job will be created once a user asks Paddle cloud to train a model. The training job is made up of different processes that collaboratively consume data and produce a trained model. There are three kinds of processes: - -1. the *master server process*, which dispatches tasks to -1. one or more *trainer processes*, which run distributed training and synchronize gradients/models via -1.
one or more *parameter server processes*, where each holds a shard of the global model and receives the uploaded gradients from every *trainer process*, so they can run the optimization functions to update their parameters. - -Their relation is illustrated in the following graph: - - - -By coordinating these processes, PaddlePaddle supports both Synchronous Stochastic Gradient Descent (sync SGD) and Asynchronous Stochastic Gradient Descent (async SGD) to train user-defined neural network topologies. - -When training with sync SGD, parameter servers wait for all trainers to finish their gradient updates and then send the updated parameters to the trainers; training cannot proceed until the trainers have received the updated parameters. This creates a synchronization point between trainers. When training with async SGD, each trainer uploads gradients and downloads new parameters individually, without synchronizing with other trainers. Using async SGD will be faster in terms of time per pass, but will have more noise in the gradient since trainers are likely to have a stale model. - -### Master Server Process - -The master server process will: - -- Partition a dataset into [tasks](#task) and dispatch tasks to trainers. -- Keep track of training progress on the dataset with the [task queue](#task-queue). A training job will iterate over the dataset for a full pass before it goes into the next pass. - - -#### Task - -A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size. - -#### Task Queue - -The master server has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master server. Each master server process has three task queues. - - - -- The todo queue holds tasks to be dispatched. When a job starts, the master server fills in the todo queue with all tasks.
-- The pending queue holds tasks that are currently being trained by trainers. -- The done queue holds tasks that are already trained. - -The life cycle of a single task is illustrated below: - - - -1. When a new pass of training starts, all tasks will be placed in the todo queue. -1. When a trainer requests a new task, the master server will dispatch a task from the todo queue to it, put the task in the pending queue and wait for completion. -1. The trainer will work on its task, tell the master server once the task is completed, and ask for a new task. The master server will dispatch a new task to that trainer. -1. If a task fails for any reason in a trainer, or takes longer than a specific period of time, the master server will move the task back to the todo queue. The timeout count for that task will increase by one. If the timeout count is above a threshold, the task is likely to cause a trainer to crash, so it will be discarded. -1. The master server will move completed tasks to the done queue. When the todo queue is empty, the master server will start a new pass by moving all tasks in the done queue to the todo queue and resetting the timeout counter of all tasks to zero. - -### Trainer Process - -The trainer process will: - -- Request tasks from the master. -- Work on the tasks. -- Upload gradients to parameter servers, and update the local model by downloading new parameters from parameter servers. - -### Parameter Server Process - -Parameter server processes hold the parameters collaboratively. The parameters are partitioned on different parameter servers. - -The parameter server will: - -- Receive gradients from the trainers, update its parameters, and give the trainers the latest parameters. -- Periodically save its parameters to the distributed file system by overriding the previous save.
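The task life cycle described under Task Queue above can be sketched as a small in-memory state machine. This is only an illustration: the class `TaskQueues`, its method names, and the `max_timeouts` default are hypothetical, and the sketch additionally assumes that a new pass starts only once no task is still pending.

```python
class TaskQueues:
    """In-memory sketch of the todo / pending / done queues."""

    def __init__(self, tasks, max_timeouts=3):
        self.todo = list(tasks)
        self.pending = {}                      # task id -> task
        self.done = []
        self.timeouts = {t: 0 for t in tasks}
        self.max_timeouts = max_timeouts       # assumed threshold

    def dispatch(self):
        """Hand out one task, starting a new pass when todo is empty."""
        # Assumption of this sketch: wait until nothing is pending
        # before recycling the done queue into a new pass.
        if not self.todo and not self.pending and self.done:
            self.todo, self.done = self.done, []
            self.timeouts = {t: 0 for t in self.todo}
        if not self.todo:
            return None
        task = self.todo.pop(0)
        self.pending[task] = task
        return task

    def finish(self, task):
        """A trainer reported the task as completed."""
        self.done.append(self.pending.pop(task))

    def fail(self, task):
        """Task failed or timed out: retry it unless it failed too often."""
        self.pending.pop(task)
        self.timeouts[task] += 1
        if self.timeouts[task] <= self.max_timeouts:
            self.todo.append(task)
        # Otherwise the task is discarded.


q = TaskQueues(["t0", "t1"], max_timeouts=1)
a = q.dispatch()        # "t0"
b = q.dispatch()        # "t1"
q.finish(a)
q.fail(b)               # first timeout: back to the todo queue
c = q.dispatch()        # "t1" again
q.fail(c)               # second timeout exceeds the threshold: discarded
print(q.todo, q.done)   # [] ['t0']
print(q.dispatch())     # a new pass starts: t0
```

Keeping the pending queue as a map from task id to task makes it cheap to find the entry when a trainer reports completion or a timeout fires.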
- -### Optimization Algorithms - -The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm: - -- Synchronous Stochastic Gradient Descent (sync-SGD) - - The parameter server will wait for all trainers to finish the n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. Every trainer will wait for the new parameters before starting the (n+1)-th mini-batch. - -- Asynchronous Stochastic Gradient Descent (async-SGD) - - There will be no synchronization between different trainers, and the parameter server updates its parameters as soon as it receives a new gradient: - - - Each trainer uploads its accumulated gradient every n mini-batches. - - Every m mini-batches, the trainer downloads new parameters from the parameter server. - - n and m do not have to be equal. - -## Fault Tolerant - -The training job will pause if the master server process is dead, or any of the parameter server processes is dead. They will be restarted by [Kubernetes](https://kubernetes.io/) and recover in a few minutes. Please refer to [fault recovery](#fault-recovery). - -The training job will continue to make progress if there is at least one training process running. The strategy depends on the type of optimization algorithm: - -- sync-SGD - - TODO - -- async-SGD - - Since async-SGD does not require synchronization between mini-batches, the system will by definition make progress if at least one trainer is running. - -## Fault Recovery - -PaddlePaddle uses [etcd](https://github.com/coreos/etcd) to keep track of the states of processes. Because etcd is a distributed reliable key-value store, a restarted process can recover its states from etcd. The model parameters are periodically saved into a distributed file system, so a restarted parameter server can recover its parameters from the saved file.
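The save-and-recover idea can be illustrated with a short Python sketch. It uses a local file as a stand-in for the distributed file system and JSON as the storage format; both choices, as well as the function and file names, are assumptions made for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, params):
    """Overwrite the previous checkpoint at `path` with `params`.

    Writing to a temporary file and renaming it means a crash during
    the save leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(params, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved parameters, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "pserver-0.json")   # hypothetical file name
    print(load_checkpoint(ckpt))               # None on first start
    save_checkpoint(ckpt, {"w": [0.1, 0.2]})
    save_checkpoint(ckpt, {"w": [0.3, 0.4]})   # overrides the previous save
    print(load_checkpoint(ckpt))               # {'w': [0.3, 0.4]}
```

A restarted process would call `load_checkpoint` first and fall back to fresh initialization only when it returns `None`.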
- -Now we will introduce how each process recovers from a failure. The graph below shows how etcd is used: - - - -### Master Server Process - -When the master is started by Kubernetes, it executes the following steps at startup: - -1. Grabs a unique *master* lock in etcd, which prevents concurrent master instantiations. -1. Recovers the task queues from etcd if they already exist; otherwise, the master will create them. -1. Writes its IP address to */master/addr* so that trainers can discover it. -1. Listens for trainers' task requests, dispatches a task upon request, and updates the task queues using an etcd transaction to ensure the lock is held during the update. - -When the master server process is dead for any reason, Kubernetes will restart it. It will be online again with all states recovered from etcd in a few minutes. - -### Trainer Process - -When the trainer is started by Kubernetes, it executes the following steps at startup: - -1. Watches the available parameter server prefix keys `/ps/` on etcd and waits until the count of parameter servers reaches the desired count */ps_desired*. -1. Finds and watches */master/addr* to get the master's address. -1. Requests tasks from the master to start training. - -When a trainer fails, Kubernetes will try to restart it. The recovered trainer will fetch tasks from the master and go on training. - -### Parameter Server Process - -When the parameter server is started by Kubernetes, it executes the following steps at startup: - -1. Read the desired total number of parameter servers from etcd `/ps_desired` -1. Search through etcd keys `/ps/` (`/ps/0`, `/ps/1`, ...) to find the first non-existent key whose index is smaller than the total number of parameter servers. Set the key using a transaction to avoid concurrent writes. The parameter server's index is inferred from the key name. - - The desired number of parameter servers is 3: - - - - The third parameter server joined: - - - -1.
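The parameter server can load parameters if there are already saved parameters in the save path (inferred from its index). -1. Now the parameter server is ready for the trainers' requests. - -If the parameter server's etcd lease expires, the parameter server will kill itself. - - -## Parameter Server Checkpointing -See [here](./checkpointing.md) - -## Storing and dispatching training data -See [here](./data_dispatch.md) - - -## Dynamic Scaling - -### Trainer Scaling - -TODO - -### Parameter Server Scaling - -Not planned for v1. - -## Training Dataset Format - -TODO - -## User Interface - -TODO diff --git a/develop/doc_cn/_sources/design/cluster_train/checkpointing.md.txt b/develop/doc_cn/_sources/design/cluster_train/checkpointing.md.txt deleted file mode 100644 index c87ef2c7d2636208866d05456d5d44316d0bb200..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/checkpointing.md.txt +++ /dev/null @@ -1,44 +0,0 @@ -## Model Parameter Checkpointing -Checkpointing the model data effectively guards against single-point or simultaneous multi-point failures of parameter servers. A checkpoint periodically saves to disk a complete image of the model data held in parameter server memory, so that training can restart from an intermediate state. In a training job that cannot be interrupted and has no backup, disaster recovery can be achieved by periodically saving a snapshot of each parameter server's data to a ***distributed storage service***, for example saving the latest snapshot every 10 minutes and deleting older ones. When a single-point failure occurs, the training job can be resumed simply by recovering that node, or by migrating it to another node and starting it there. - - - -### Snapshot saving is designed as follows: - -Notes: - -* After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves snapshots into it. -* ***Note: each parameter server saves its checkpoints independently. For now we do not consider having multiple parameter servers synchronously save a global checkpoint for one specific point in time, because doing so still could not guarantee that randomness is eliminated.*** - -Checkpoint saving procedure: - -1. When the "every 10 minutes" condition is met, the parameter server acquires the `read_lock` on the parameters memory and starts a new thread to save the checkpoint. If a checkpoint-saving thread is already running, the request is ignored. Because updating the parameters requires the `write_lock` on the parameters memory, the parameter server pauses parameter updates and waits while the snapshot is being written. -2. The parameter server generates a UUID and writes the snapshot data to a new file in the specified directory (the file name is this UUID). After the snapshot is written, it computes the MD5 sum of the file. It then writes the following JSON content to `/checkpoints/[pserver_id]` in etcd: `{"uuid": [UUID], "md5": "MD5 sum", "timestamp": xxxx}`. -3. Delete the snapshot files in the directory that do not belong to the current UUID. -4.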
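Release the lock on the parameters memory and stop the checkpoint-saving thread. - -Users should note that in a real environment a training job may saturate the network bandwidth between trainers and parameter servers. If a parameter server also has to reach distributed storage over the network to save a snapshot at that moment, the network may become congested and training may stall intermittently. - -### Recovering from a snapshot - -When a parameter server starts for the first time, or is restarted by Kubernetes after a failure at any time, it needs to roll back to the last checkpoint: - - 1. Read the node `/checkpoints/[pserver_id]` from etcd to get the file UUID of the latest checkpoint - 1. Load the checkpoint snapshot file named by that UUID from disk, and load the parameters in it - 1. If either of the two steps above fails, initialize the parameters with the initialization method defined by the startup arguments - 1. Start serving requests - -## TODO List -### Speculative execution / accelerated execution (TODO) -In a heterogeneous cluster, some trainers may run so slowly that they drag down the whole cluster (such as Trainer 1 in the figure). In that case the master is responsible for starting a new trainer (Accelerate Trainer 2) on the same training data block. Whichever trainer finishes training the block first wins, and the other, slower one is killed. - -### Dynamic scaling up / down -For now only dynamically scaling up the number of trainers is considered, which reduces system complexity. - -## Terminology -* model: all the parameters obtained after deep learning training; this neural network can then be used to make predictions on new data -* parameters: the parameters of a neural network, including weights w and biases b. A neural network model consists of a large number of parameters -* shard: one piece of a whole that has been split into multiple pieces. -* model shard: the parameters of a neural network are split into multiple pieces, and each shard is stored on one of the parameter servers -* parameter block: multiple parameter blocks make up one model shard -* single-point failure: at any moment at most one server is down. Because the probability of two machines in the cluster failing at the same time is extremely low ((mean failure rate * mean time to repair)^2), disaster recovery for two or more simultaneous failures is considered only for special online systems. diff --git a/develop/doc_cn/_sources/design/cluster_train/data_dispatch.md.txt b/develop/doc_cn/_sources/design/cluster_train/data_dispatch.md.txt deleted file mode 100644 index 1f5d22ff5e6abcb576d16cbe7391da1967a1ab8e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/data_dispatch.md.txt +++ /dev/null @@ -1,160 +0,0 @@ -## Storing and Dispatching Training Data - -### Concepts - -### Workflow -Training datasets in production are usually very large and are stored on distributed storage such as Hadoop HDFS, Ceph, or AWS S3. These distributed storage services usually split the data into multiple shards that are stored across multiple nodes. This makes it possible to run many kinds of data-related computing jobs in the cloud, including: - -* Data preprocessing jobs -* Paddle training jobs -* Online model inference services -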
                                - -
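- -The figure above shows the data flow of an application (face recognition) in a real production environment. Log data from production is stored both as a real-time stream (Kafka) and as offline data (HDFS). Several distributed data processing jobs run in the cluster, such as stream processing (online data process) and offline batch processing (offline data process), to preprocess the data and provide it to paddle as training data. Users can also upload labeled data to distributed storage to supplement the training data. The models produced by deep learning training on paddle are provided to the online face recognition application. - -### Training data storage -We choose [CephFS](http://docs.ceph.com/docs/master/cephfs/) as the storage system. - -- Whether from the point of view of the [PFSClient](../file_manager/README.md) or of a job running in a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/), users access their own data uniformly through `/pfs/$DATACENTER/home/$USER`. -- Public datasets are stored under `/pfs/$DATACENTER/common` - - mounted read-only -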
                                - -
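- -### File preprocessing - - -Before training starts, the dataset must be converted into [RecordIO](https://github.com/PaddlePaddle/Paddle/issues/1947), the storage format used by PaddlePaddle distributed training. We provide two ways to convert: - -1. The user converts the data locally and then uploads it -1. The user uploads the data and then runs the conversion program on the cluster - -The generated files are named in the following format: - -```text -name_prefix-aaaaa-of-bbbbb -``` - -"aaaaa" and "bbbbb" are both five-digit numbers. Each file is one shard of the dataset; "aaaaa" is the index of the shard and "bbbbb" is the largest shard index. - -For example, the ImageNet dataset might be split into 1000 shards, whose file names are: -```text -imagenet-00000-of-00999 -imagenet-00001-of-00999 -... -imagenet-00999-of-00999 -``` - -#### Conversion library - -Whether the conversion runs locally or in the cloud, we provide a Python conversion library with the following interface: -```python -def convert(output_path, reader, num_shards, name_prefix) -``` - -- `output_path`: directory in which output files will be saved. -- `reader`: a [data reader](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md#data-reader-interface), from which the convert program will read data instances. -- `num_shards`: the number of shards that the dataset will be partitioned into. -- `name_prefix`: the name prefix of generated files. - -Each call to `reader` yields one data instance, which can be a single value or multiple values represented as a tuple: - -```python -yield 1 # a single value -yield numpy.random.uniform(-1, 1, size=28*28) # a single value -yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values -``` - -Each value can be an integer, a floating-point number, a string, a list of these, or a numpy.ndarray. Values of any other type are serialized into a string with Pickle. - -### Example programs - -#### Using the conversion library - -The `reader` produced by the following `reader_creator` yields one data instance at a time, and each data instance contains two values: a numpy.ndarray and an integer: -```python -def reader_creator(): - def reader(): - for i in range(1000): - yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values - return reader -``` - -Pass the `reader` produced by `reader_creator` to the `convert` function to perform the conversion: -```python -convert("./", reader_creator(), 100, "random_images") -``` - -The command above generates 100 files in the current directory: -```text -random_images-00000-of-00099 -random_images-00001-of-00099 -...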
-random_images-00099-of-00099 -``` - -#### Training - - -PaddlePaddle provides a dedicated [data reader creator](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md#python-data-reader-design-doc) that produces the data reader for the given `RecordIO` files. **The reader is used in the same way locally and in the cloud**: - -```python -# ... -reader = paddle.reader.creator.RecordIO("/pfs/datacenter_name/home/user_name/random_images-*-of-*") -batch_reader = paddle.batch(reader, 128) -trainer.train(batch_reader, ...) -``` - -The data instances yielded by the reader in the code above are exactly the same as those yielded by the reader when the dataset was generated. - -### Uploading training files - -Use the following command to upload local data to the storage cluster. - -```bash -paddle pfs cp filename /pfs/$DATACENTER/home/$USER/folder/ -``` - -For example, to upload the random_images dataset converted in the earlier example to `/home/` in the cloud, use the following command: - -```bash -paddle pfs cp random_images-*-of-* /pfs/$DATACENTER/home/$USER/folder/ -``` - -The configuration for `$DATACENTER` must be written in a config file, for example: - -``` -# config file -[datacenter_1] -username=user -usercert=user.pem -userkey=user-key.pem -endpoint=datacenter1.paddlepaddle.org - -[datacenter_2] -username=user -usercert=user.pem -userkey=user-key.pem -endpoint=datacenter2.paddlepaddle.org -``` -## TODO -### File access permissions -Control user permissions - -- Users can share their own data with others - -### File access methods -Access data remotely through an API directly, instead of through a mount - -For example: - -``` -f = open('/pfs/datacenter_name/home/user_name/test1.dat') -``` - - -### Support user-defined data preprocessing jobs diff --git a/develop/doc_cn/_sources/design/cluster_train/large_model_dist_train.md.txt b/develop/doc_cn/_sources/design/cluster_train/large_model_dist_train.md.txt deleted file mode 100644 index 0c4b5bc24c854b7062d509249bea9c50d42bd5f1..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/large_model_dist_train.md.txt +++ /dev/null @@ -1,101 +0,0 @@ -# Analysis of large model distributed training in Paddle - -***NOTE: These are only notes on how we implemented this scheme in V1, not a new design.*** - -## What is it - -We often encounter cases where the embedding layer parameters (sparse) are so large that we cannot store them in the trainer's
memory when training. So we need to put them on several servers, and fetch them row by row instead of fetching all of the parameters. - -## How to use - -Specify command-line arguments like `--loadsave_parameters_in_pserver=true --ports_num_for_sparse=1 --use_old_updater=1` when starting the paddle trainer. Also add something like `--ports_num_for_sparse=1 --pserver_num_threads=5` when starting pserver processes. - -Accordingly, configure your embedding layers like: - -```python -SPARSE_REMOTE=True - -w1 = data_layer(name="w1", size=dict_size) -emb1 = embedding_layer(input=w1, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE)) -w2 = data_layer(name="w2", size=dict_size) -emb2 = embedding_layer(input=w2, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE)) -... -``` - -## Implementation details - -```c++ -enum MatType { - MAT_NORMAL, - MAT_NORMAL_SHARED, - MAT_VALUE_SHARED, - MAT_SPARSE_ROW_IDS, - MAT_SPARSE_ROW_AUTO_GROW, - MAT_CACHE_ROW, - MAT_SPARSE_ROW, - MAT_SPARSE_ROW_PREFETCH, - MAT_SPARSE_ROW_PREFETCH_FULL_SIZE, -}; -``` - -`MAT_SPARSE_ROW_PREFETCH` is what we use when configured to fetch only rows of the matrix when training. - -In `trainer_internal.cpp:L93 trainOneBatch`: - -```c++ - if (config_->getOptConfig().use_sparse_remote_updater()) { - REGISTER_TIMER("prefetch"); - gradientMachine_->prefetch(inArgs); - parameterUpdater_->getParametersRemote(); - } -``` - -When doing the actual network forward and backward, at the beginning of each batch, the trainer will try to download one row of data from the pserver. - -In `trainer/RemoteParameterUpdater.cpp`: `parameterUpdater_->getParametersRemote();`: - -```c++ -if (fullSize) { - ...
-} else { -getParams = [&] { - parameterClient_->getParameterSparse( - /* recvParameterType= */ PARAMETER_VALUE, sendBackParameterType); -}; -applyL1 = [](Parameter& para, real decayRate) { - para.getMat(PARAMETER_VALUE)->applyL1(/*lr=*/1.0f, decayRate); -}; -} -``` - -Calling `parameterClient_->getParameterSparse` makes a remote call to the pserver's `getParameterSparse`: - -```c++ -void ParameterServer2::getParameterSparse(const SendParameterRequest& request, - std::vector<Buffer>& inputBuffers, - SendParameterResponse* response, - std::vector<Buffer>* outputBuffers) { - (void)inputBuffers; - auto& buffer = *readWriteBuffer_; - size_t numReals = 0; - for (const auto& block : request.blocks()) { - numReals += getParameterConfig(block).dims(1); - } - buffer.resize(numReals); - - VLOG(3) << "pserver: getParameterSparse, numReals=" << numReals; - - ReadLockGuard guard(parameterMutex_); - size_t offset = 0; - for (const auto& block : request.blocks()) { - size_t width = getParameterConfig(block).dims(1); - Buffer buf = {buffer.data() + offset, width}; - int type = request.send_back_parameter_type(); - sendBackParameterSparse(block, type, response, &buf, width, outputBuffers); - offset += width; - } -} -``` - -`getParameterConfig(block).dims(1)` returns the width of the current "parameter block" (a shard of the parameter object), -so the `getParameterSparse` remote call returns only one row of data to the client. diff --git a/develop/doc_cn/_sources/design/cluster_train/master_server.md.txt b/develop/doc_cn/_sources/design/cluster_train/master_server.md.txt deleted file mode 100644 index 4bf3c506f101361875043f8bfd97972b8c981a22..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/master_server.md.txt +++ /dev/null @@ -1,91 +0,0 @@ -# Design Doc: Master Server - -For an overview of the master server's role, please refer to the [distributed training design doc](./README.md). In this design doc we will discuss the master server in more detail.
The master will be implemented in [Go](https://golang.org/). - -## Dataset - - - -A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks, whereas each chunk consists some records. - -## Task Queue - -As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress. - -### Task Queue Creation - -1. Each trainer will make an RPC call (using Go's [rpc](https://golang.org/pkg/net/rpc/) package) to the master server, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master server the same dataset, only the first RPC call will be honored. - - The RPC interface is: - ```go - func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error { - } - ``` -1. The master server will scan through each RecordIO file to generate the *chunk index* and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file. - - The definition of the chunk is: - ```go - type Chunk struct { - Idx int // index of the chunk within the file - Path string - Index recordio.Index // chunk index - } - ``` -1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element. 
- - The definition of the task is: - ```go - type Task struct { - Index int - Chunks []Chunk - } - ``` - - The elements in the tasks queues is of type `TaskEntry`, containing a timeout counter (described in [task retry logic](#task-retry-logic)), and a task: - ```go - type TaskEntry struct { - NumTimeout int - Task Task - } - ``` - - The definition of task queues is: - ```go - type TaskQueues struct { - Todo []TaskEntry - Pending map[int]TaskEntry // map from task index to task entry - Done []TaskEntry - } - ``` - -### Task Queue Persistence - -The task queues need to be persisted on [etcd](https://github.com/coreos/etcd) for fault recovery. Since the task queues only change once a task is completed or timed out, which is not very frequent, we can afford to synchronize with etcd every time the task queues change. - -We will serialize the task queues data structure with [gob encoding](https://golang.org/pkg/encoding/gob/), compress with gzip, and save into etcd synchronously under key `/task_queues`. - -### Task Dispatch - -The trainer will make an RPC call to master to get a new task when: - -- the trainer first started, or -- the trainer finishes a task. - -The RPC interface is: -```go -func (m *RPCServer) GetTask(finished *Task, result *Task) error { -} -``` -Argument `finished` will be `nil` when the trainer is just started. - -During the RPC call the master will do the following: - -- Make a copy of the task queues, and update the copy reflecting the finished tasks and the new pending tasks. -- Synchronize the copy of task queues with etcd using a transaction conditioned on holding the master lock. -- Replace the task queues with the copy and report to the trainer with the new tasks if succeeded, or discard the copy and report the error to the trainer if failed. - -### Task Retry Logic - -When a task is dispatched to the trainer, the master will schedule a function for execution after the timeout duration (based on the moving average of task completion time). 
If the task entry is still in the pending queue, its timeout counter will increase by one, and the task will be moved back to the todo queue. If the timeout counter is above the threshold, the master will log the error and discard the task. - -Please note that a timed-out task could still be completed after it has been dispatched for retry, so it is possible for a task to be processed multiple times. We do not try to prevent this, since it is fine to train on the same task multiple times given the stochastic nature of the stochastic gradient descent algorithm. diff --git a/develop/doc_cn/_sources/design/cluster_train/pserver_client.md.txt b/develop/doc_cn/_sources/design/cluster_train/pserver_client.md.txt deleted file mode 100644 index 474b8c572cd92fc87e9f7f3f2b19d12cccd158de..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/pserver_client.md.txt +++ /dev/null @@ -1,171 +0,0 @@ -# Design Doc: The Client Library of Parameter Server - -For an overview of the trainer's role, please refer to [distributed training design doc](README.md). In this design doc, we will discuss the parameter server's client library, which will manage communication with parameter servers. The library will be implemented in [Go](https://golang.org/) and made available as a static or dynamic library with a C header file. - -## Parameter Partition - -Each parameter will be partitioned into parameter blocks so that the parameters are evenly distributed across parameter servers. The partition is done automatically by the client library. The *sparse parameter* requires slightly different treatment: - -### Sparse Parameter - -The sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation; it has the same representation as a dense vector. - -Because a sparse parameter is updated sparsely, the trainer will have to partition the sparse parameter.
Because the parameter server will merge all sparse parameter shard into the same file when saving the parameter. It needs special naming convention: - -If a sparse parameter is partitioned into n shards, they should be named as: - -```text -name:sparse-0 -name:sparse-1 -... -name:sparse-n-1 -``` - -The library is unaware of the partition, and treat each parameter independently. Only when saving parameters, the parameter servers will merge the sparse parameters according to the naming convention. - -## Model Optimization Using Gradients - -There are two ways to perform model optimization using gradients: - -- On Client - - The client does multiple steps of forward and backward update. In each step, the gradients are calculated and a new model is generated. After some steps, the client will calculate the difference between the newest model and the old model at step 0. The difference will be updated to parameter servers. Parameter servers will just update parameters using the difference without any optimization using gradients (such as Adam and L1 regularization). - -- On Parameter Server - - The client will send accumulated gradients to parameter servers, the parameter server will do the optimization using gradients. - -## L1 and L2 Regularization - -PaddlePaddle allows L1 or L2 regularizations to be specified per parameter, so when the trainer initializes the parameter it needs include a parameter configuration when L1 or L2 regularization is necessary. - -## Parameter Initialization - -The parameters on parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization, the other trainers will wait for the completion of initialization and get the parameters from the parameter servers. - -### Trainer Selection - -To select the trainer for initialization, every trainer will try to get a distributed lock, whoever owns the lock will do the initialization. 
As illustrated below: - - - -### Trainer Selection Process - -The trainer select process is encapsulated in the C API function: -```c -int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto); -``` -The selected trainer's call to `paddle_begin_init_params` will return with 1, and the other trainers' call to `paddle_begin_init_params` will return 0. `paddle_get_params` will be blocked until initialization is completed. As illustrated below: - - - -## C Interface - -```c -typedef enum { - PADDLE_ELEMENT_TYPE_INT32 = 0, - PADDLE_ELEMENT_TYPE_UINT32 = 1, - PADDLE_ELEMENT_TYPE_INT64 = 2, - PADDLE_ELEMENT_TYPE_UINT64 = 3, - PADDLE_ELEMENT_TYPE_FLOAT32 = 4, - PADDLE_ELEMENT_TYPE_FLOAT64 = 5, -} paddle_element_type; - -typedef struct { - char* name; - paddle_element_type element_type; - unsigned char* content; - int content_len; -} paddle_parameter, paddle_gradient; - -typedef int paddle_pserver_client; - -/** - * @brief creates a pserver client that talks to etcd for coordination. - */ -paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr); - -/** - * @brief creates a pserver client given pserver addresses. - * - * @param pserver_addrs comma-separated pserver addresses. - * @param selected if current pserver client is selected to initialize all parameter servers. - */ -paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected); -void paddle_pserver_client_release(paddle_pserver_client c); - -/** - * @brief paddle_begin_init_params begins to initialize parameters on - * parameter servers. - * - * paddle_begin_init_params will be called from multiple trainers, - * only one trainer will be selected to initialize the parameters on - * parameter servers. Other trainers need to get the initialized - * parameters from parameter servers using @paddle_get_params. - * - * @return 1 if the trainer is selected to initialize parameter - * servers, otherwise 0. 
- */ -int paddle_begin_init_params(paddle_pserver_client client); - -/** - * @brief paddle_init_param initializes the parameter on parameter - * servers. - * - * @param param the parameter to initialize. - * @param param_config_proto the configuration for the parameter. - * @param config_len the length of param_config_proto - * @return 0 if successful, otherwise -1. On failure, the trainer - * needs to restart the entire initialization process (starting from - * @paddle_begin_init_param). Or simply exit the program and wait for - * the cluster management system to restart the trainer. - */ -int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len); - -/** - * @brief paddle_finish_init_params tells parameter servers client has - * sent all parameters to parameter servers as initialization. - * - * @return 0 if successful, otherwise -1. On failure, the trainer - * needs to restart the entire initialization process (starting from - * @paddle_begin_init_param). Or simply exit the program and wait for - * the cluster management system to restart the trainer. - */ -int paddle_finish_init_params(paddle_pserver_client client); - -/** - * @brief paddle_send_grads sends gradients to parameter servers for - * updating parameters. - * - * @param grads the array of gradients to send. - * @param len the length of the gradient array. - * @param learning_rate the learning rate for the gradients. - * @return 0 if successful, otherwise -1. - */ -int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len); - -/** - * @brief paddle_get_params gets parameters from parameter servers. - * - * paddle_get_params will block until parameters are initialized on - * the parameter servers. - * - * @param dst the destination array of parameter pointers to save to. 
- * Each parameter pointer must be pre-populated with the required parameter name, - * and the content of each parameter must be pre-allocated to the size of the - * corresponding parameter on the pserver. - * @param len the length of the names array and the paddle_parameter - * array. - * @return 0 if successful, otherwise -1. - */ -int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len); - -/** - * @brief paddle_save_model tells the parameter servers to save the - * parameters to the given path. - * - * @param path the path to save parameters. - * @return 0 if successful, otherwise -1. - */ -int paddle_save_model(paddle_pserver_client client, const char* path); -``` diff --git a/develop/doc_cn/_sources/design/cluster_train/remote_parameter_updater.md.txt b/develop/doc_cn/_sources/design/cluster_train/remote_parameter_updater.md.txt deleted file mode 100644 index 6e8e5938455b869e0f3367794c41250340b37f77..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/remote_parameter_updater.md.txt +++ /dev/null @@ -1,21 +0,0 @@ -# Design Doc: Remote Parameter Updater for Cluster Train - -For an overview of distributed training, please refer to [distributed training design doc](README.md). In this design doc, we will discuss the parameter updater that will use the parameter server client ([The Client Library of Parameter Server Design Doc](pserver_client.md)) to manage and update parameters. - -## Parameter Updater - -The Parameter Updater is used by the trainer to manage and update parameters. There are mainly two kinds of parameter updater: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.
- -### Remote Parameter Updater - -The Remote Parameter Updater manages parameters on remote parameter servers through the client that communicates with the pservers ([The Client Library of Parameter Server Design Doc](pserver_client.md)). - -In the PaddlePaddle Python V2 API, the trainer is implemented in Python, and the trainer will hold an instance of the parameter updater and call its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG. - -#### Sparse Remote Parameter Updater - -Since we will only implement dense parameter management now, the mechanism for sparse parameters will be discussed in the next stage. - -### Interface Design - -TBD diff --git a/develop/doc_cn/_sources/design/cluster_train/save_model.md.txt b/develop/doc_cn/_sources/design/cluster_train/save_model.md.txt deleted file mode 100644 index b755185c81ad617b9c85c47de0f5f65d2201c658..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/save_model.md.txt +++ /dev/null @@ -1,111 +0,0 @@ -# Design Doc: Save Model - -## Overview - -The model is the output of the training process. There are two -ways in which a user can obtain a model: - -- Save model triggered by user code: user code asks PaddlePaddle to - save a model. -- Convert model from the checkpoint: the model is converted from - the pservers' periodic checkpoint. In this way, the user can cancel a - job at any time, and still have a relatively fresh model (we - checkpoint around every 5 minutes). - -### Trainer Saving Model vs. Pservers Saving Model - -Both trainers and pservers have access to the model, so the model can -be saved from a trainer or from the pservers. We need to decide where the model -is saved from. - -#### Dense Update vs. Sparse Update - -There are two types of model update methods: dense update and sparse -update (when the model parameter is configured to be sparse). - -- Dense update - - Every trainer has its own full copy of the model.
Every model - update will update the entire model. - -- Sparse update - - The training input is sparse, and the trainer does not have the - entire model. It will only download the sub-model necessary related - to the input. When updating the model, only the sub-model related to - the training input is updated. - - -#### Pservers Saving Model - -The benefit of letting pservers save model is they have the entire -model all the time. However, since pservers are on different nodes, it -requires a merging process to merge model shards into the same -model. Thus requires the pservers to write models to a distributed -filesystem, making the checkpoint shards visible to the merge program. - -#### Trainer Saving Model - -The benefit of letting one trainer to save the model is it does not -require a distributed filesystem. And it's reusing the same save model -logic when training locally - except when doing sparse update, the -trainer needs to download the entire model during the saving process. - -#### Conclusion - -Given trainer saving model does not require a distributed filesystem, -and is an intuitive extension to trainer saving model when training -locally, we decide to let the trainer save the model when doing -distributed training. - - -### Convert Model from Checkpoint - -TODO - - -## Timeline - -We first implement trainer save the model. Converting the latest -snapshot to a model will be a TODO for future. - - -## Trainer Save Model - -### Trainer Election - -One trainer will be elected as the one to save the model. When using -etcd, trainer ID is a randomly generated UUID, the trainer will -contact the master server requesting to save the model, and find out -if itself is elected. When the master server is not used, unique -trainer IDs will be given by the administrator, the trainer whose ID -is "0" is elected to save the model. - -### Model Save Path - -Each trainer will be given the directory to save the model. 
The -elected trainer will save the model to -`given-directory/trainerID`. Since the trainer ID is unique, this -prevents concurrent saves to the same file if multiple trainers -are elected to save the model when a split-brain problem happens. - -### What Happens When Model Is Saving - -It takes some time to save the model, so we need to define what happens -while the model is being saved. - -When doing dense update, the trainer uses the local model. Pservers -do not need to pause model update. - -When doing sparse update, the trainer needs to download the entire -model while saving. To get the most accurate model, the model update -needs to be paused before the download starts and resumed after the -download finishes. Otherwise, the trainer gets a model that is -"polluted": some part of the model is old, some part of the model is -new. - -It's unclear whether the "polluted" model will be inferior, due to the -stochastic nature of deep learning, and pausing the model update will -add more complexity to the system. Since supporting sparse update is a -TODO item, we defer to the future the evaluation of whether to pause the model update -during model saving. diff --git a/develop/doc_cn/_sources/design/cluster_train/submit-job.md.txt b/develop/doc_cn/_sources/design/cluster_train/submit-job.md.txt deleted file mode 100644 index 8377d5489dc64bd2fdc5bb4f7bc737e7b489000d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cluster_train/submit-job.md.txt +++ /dev/null @@ -1,127 +0,0 @@ -# Submit a Distributed Training Job - -The user can submit a distributed training job with Python code, rather than with a command-line interface. - -## Runtime Environment On Kubernetes - -For a distributed training job, there are two Docker images, called the *runtime Docker image* and the *base Docker image*. The runtime Docker image is the Docker image that gets scheduled by Kubernetes to run during training. The base Docker image is for building the runtime Docker image.
- -### Base Docker Image - -Usually, the base Docker image is PaddlePaddle product Docker image including paddle binary files and python package. And of course, users can specify any image name hosted on any docker registry which users have the access right. - -### Runtime Docker Image - -The trainer package which user upload and some Python dependencies are packaged into a runtime Docker image based on base Docker image. - -- Handle Python Dependencies - - You need to provide requirements.txt file in your `trainer-package` folder. Example: - - ```txt - pillow - protobuf==3.1.0 - ``` - More [details](https://pip.readthedocs.io/en/1.1/requirements.html) about requirements, an example project looks like: - ```bash - paddle_example - |-quick_start - |-trainer.py - |-dataset.py - |-requirements.txt - ``` - -## Submit Distributed Training Job With Python Code - - -- `paddle.job.dist_train()` will call the Job Server API `/v1/packages` to upload the trainer package and save them on CephFS, and then call `/v1/trainer/job` to submit the PaddlePaddle distributed job. -- `/v1/trainer/job` will start a building job for preparing the runtime Docker image. When the building job is finished, Job Server will submit the PaddlePaddle distributed job to Kubernetes. -- *NOTE*: For the first version, we will not prepare the runtime Docker image, instead, the package is uploaded to Paddle Cloud, and Paddle Cloud will mount the package in a temporary folder into the base Docker image. We will not support custom Python dependencies in the first version as well. 
- -You can call `paddle.job.dist_train` and provide distributed training configuration as the parameters: -```python -paddle.job.dist_train( - trainer=dist_trainer(), - paddle_job=PaddleJob( - job_name = "paddle-cloud", - entry_point = "python %s"%__file__, - trainer_package = "/example/word2vec", - image = "yancey1989/paddle-job", - trainers = 10, - pservers = 3, - trainer_cpu = 1, - trainer_gpu = 1, - trainer_mem = "10G", - pserver_cpu = 1, - pserver_mem = "2G" - )) -``` - -The parameter `trainer` of `paddle.job.dist_train` is a function and you can implement it as follows: -```python -def dist_trainer(): - def trainer_creator(): - trainer = paddle.v2.trainer.SGD(...) - trainer.train(...) - return trainer_creator -``` - -The pseudo code of `paddle.job.dist_train` is as follows: -```python -def dist_train(trainer, paddle_job): - # if the code is running on cloud, set PADDLE_ON_CLOUD=YES - if os.getenv("RUNNING_ON_CLOUD", "NO") == "NO": - #submit the paddle job - paddle_job.submit() - else: - #start the training - trainer() -``` -### PaddleJob Parameters -parameter | type | explanation - --- | --- | --- -job_name | str | the unique name for the training job -entry_point | str | entry point for startup trainer process -trainer_package | str | trainer package file path which user have the access right -image|str|the [base image](#base-docker-image) for building the [runtime image](#runtime-docker-image) -pservers|int| Parameter Server process count -trainers|int| Trainer process count -pserver_cpu|int| CPU count for each Parameter Server process -pserver_mem|str| memory allocated for each Parameter Server process, a plain integer using one of these suffixes: E, P, T, G, M, K -trainer_cpu|int| CPU count for each Trainer process -trainer_mem|str| memory allocated for each Trainer process, a plain integer using one of these suffixes: E, P, T, G, M, K -trainer_gpu|int| GPU count for each Trainer process, if you only want CPU, do not set this parameter - -### Deploy 
Parameter Server, Trainer and Master Process - - Deploy PaddlePaddle Parameter Server processes, it's a Kubernetes ReplicaSet. - - Deploy PaddlePaddle Trainer processes, it's a Kubernetes Job. - - Deploy PaddlePaddle Master processes, it's a Kubernetes ReplicaSet. - -## Job Server - -- RESTful API - - Job server provides RESTful HTTP API for receiving the trainer package and displaying - PaddlePaddle job related informations. - - `POST /v1/package` receive the trainer package and save them on CephFS - - `POST /v1/trainer/job` submit a trainer job - - `GET /v1/jobs/` list all jobs - - `GET /v1/jobs/` the status of a job - - `DELETE /v1/jobs/` delete a job - - `GET /v1/version` job server version - -- Build Runtime Docker Image on Kubernetes - - `paddle.job.dist_train` will upload the trainer package to Job Server, save them on the distributed filesystem, and then start up a job for building the runtime Docker image that gets scheduled by Kubernetes to run during training. - - There are some benefits for building runtime Docker image on JobServer: - - On Paddle Cloud, users will run the trainer code in a Jupyter Notebook which is a Kubernetes Pod, if we want to execute `docker build` in the Pod, we should mount the host's `docker.sock` to the Pod, user's code will connect the host's Docker Engine directly, it's not safe. - - Users only need to upload the training package files, does not need to install docker engine, docker registry as dependencies. - - If we want to change another image type, such as RKT, users do not need to care about it. - -- Deploy Parameter Server, Trainer and Master Processes - - `POST /v1/trainer/job` receives the distributed training parameters, and deploy the job as follows: - - Deploy PaddlePaddle Parameter Server processes, it's a Kubernetes ReplicaSet. - - Deploy PaddlePaddle Trainer processes, it's a Kubernetes Job. - - Deploy PaddlePaddle Master processes, it's a Kubernetes ReplicaSet. 
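The deployment step listed above can be sketched as a plain mapping from the PaddleJob settings to three workload descriptions. This is only an illustration under stated assumptions: `plan_deployment` and the dictionary layout are hypothetical and not part of the Job Server, and the number of master processes is left unset because this document does not specify it.

```python
from typing import Any, Dict, List


def plan_deployment(job: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one workload description per process type, in deployment order.

    Hypothetical helper: the field names follow the PaddleJob parameter table,
    but the returned layout is an assumption made for this sketch.
    """
    return [
        {
            "role": "pserver",
            "kind": "ReplicaSet",
            "replicas": job["pservers"],
            "cpu": job.get("pserver_cpu"),
            "mem": job.get("pserver_mem"),
        },
        {
            "role": "trainer",
            "kind": "Job",
            "replicas": job["trainers"],
            "cpu": job.get("trainer_cpu"),
            "mem": job.get("trainer_mem"),
            "gpu": job.get("trainer_gpu"),  # None means a CPU-only trainer
        },
        {
            # The master count is not given in this document, so it stays unset.
            "role": "master",
            "kind": "ReplicaSet",
            "replicas": None,
        },
    ]


job = {
    "pservers": 3,
    "trainers": 10,
    "trainer_cpu": 1,
    "trainer_gpu": 1,
    "trainer_mem": "10G",
    "pserver_cpu": 1,
    "pserver_mem": "2G",
}
for workload in plan_deployment(job):
    print(workload["role"], workload["kind"], workload["replicas"])
```

Keeping the plan as data, separate from any Kubernetes client call, would let the Job Server validate or log a request before anything is created on the cluster.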
diff --git a/develop/doc_cn/_sources/design/concurrent_programming.md.txt b/develop/doc_cn/_sources/design/concurrent_programming.md.txt deleted file mode 100644 index f022e67fd3a048cd7e53c91d9a1fd0506487b665..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/concurrent_programming.md.txt +++ /dev/null @@ -1,163 +0,0 @@ -# Design Doc: Concurrent Programming with Fluid - -With PaddlePaddle Fluid, users describe a program rather than a model. The program is a [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto) protobuf message. TensorFlow/MXNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, not the program that trains/uses the model. - -Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create a concurrent/parallel AI application. An interesting question is **how does a `ProgramDesc` represent a concurrent program?** - -The answer relies on the fact that a `ProgramDesc` is similar to an abstract syntax tree (AST) that describes a program. So users write a concurrent program just as they would with any concurrent programming language, e.g., [Go](https://golang.org).
- -## An Analogy - -The following table compares concepts in Fluid and Go - -| Go | Fluid | -|----|-------| -|user-defined functions | [layers](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid) | -| control-flow and built-in functions | [intrinsics/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators) | -| goroutines, channels | [class ThreadPool](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework/thread_pool.h) | -| runtime | [class Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h) | - -## An Example Concurrent Program - -To review all above concepts in an example, let us take a simple program and writes its distributed version. - -Suppose that we want to parallelize a naive Fluid program (written in Go and calling Fluid's Go binding) that multiplies two tensors. - -```go -import "fluid" - -func paddlepaddle() { - X = fluid.read(...) - W = fluid.Tensor(...) - Y = fluid.mult(X, W) -} -``` - -Please be aware that the Fluid's Go binding provides the default `main` function, which calls the `paddlepaddle` function, which, in this case, is defined in above program and creates the following `ProgramDesc` message. - -```protobuf -message ProgramDesc { - block[0] = Block { - vars = [X, W, Y], - ops = [ - read(output = X) - assign(input = ..., output = W) - mult(input = {X, W}, output = Y) - ], - } -} -``` - -Then, the default `main` function calls `fluid.run()`, which creates an instance of the [`class Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h) and calls `Executor.Run(block[0])`, where `block[0]` is the first and only block defined in above `ProgramDesc` message. 
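The call chain described above, where `fluid.run()` creates an executor that runs the ops of `block[0]` in order, can be sketched in Python. This is a minimal illustration only: `Op`, `Block`, and `Executor` here are hypothetical stand-ins rather than Fluid's actual classes, and `mult` is simplified to scaling a list by a scalar.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Op:
    """One operator: reads `inputs` from the scope and writes `output` back."""
    name: str
    inputs: List[str]
    output: str
    fn: Callable[..., object]


@dataclass
class Block:
    """Hypothetical stand-in for a ProgramDesc block: variable names plus ops."""
    vars: List[str]
    ops: List[Op] = field(default_factory=list)


class Executor:
    """Runs a block's ops one by one, in order, on the calling thread."""

    def run(self, block: Block, scope: Dict[str, object]) -> Dict[str, object]:
        for op in block.ops:
            args = [scope[name] for name in op.inputs]
            scope[op.output] = op.fn(*args)
        return scope


# Mirror of the example program: read X, assign W, then Y = mult(X, W).
block0 = Block(
    vars=["X", "W", "Y"],
    ops=[
        Op("read", [], "X", lambda: [1.0, 2.0, 3.0]),
        Op("assign", [], "W", lambda: 2.0),
        Op("mult", ["X", "W"], "Y", lambda x, w: [v * w for v in x]),
    ],
)

result = Executor().run(block0, scope={})
print(result["Y"])
```

The executor in this sketch does no planning and creates no threads; any concurrency would have to come from an individual op's own function.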
- -The default `main` function is defined as follows: - -```go -func main() { - paddlepaddle() - fluid.run() -} -``` - -## The Concurrent Version - -By parallelizing the above program, we could support very big tensor X by splitting into small pieces {x_1, x_2, ...} and sent each piece to worker process/node for parallel multiplication. - -In this case, we can write a transpiler that takes a `ProgramDesc` message that represents the above example program and outputs two `ProgramDesc` messages, one for running on the master process/node, and the other one for worker processes/nodes. - -### The Master Program - -The master program could look like the following: - -```protobuf -message ProgramDesc { - block[0] = Block { - vars = [X, L, Y], - ops = [ - read(output = X) - kube_get_workers_addrs(output = L) - Y = tensor_array(len(L)) - parallel_for(input = X, output = Y, - attrs = {L, block_id(1)}) # referring to block 1 - ] - } - - block[1] = Block { - parent = 0, - vars = [x, y, index], - ops = [ - slice(input = [X, index], output = x) # index is initialized by parallel_for - send(input = x, attrs = L[index]) - recv(outputs = y, attrs = L[index]) - assign(input = y, output = Y[index]) - ] - } -} -``` - -The equivalent Fluid program (calling the Go binding) is: - -```go -func main() { //// block 0 - X = fluid.read(...) - L = fluid.k8s.get_worker_addrs() - Y = fluid.tensor_array(len(L)) - fluid.parallel_for(X, L, - func(index int) { //// block 1 - x = X[index] - fluid.send(L[index], x) - y = fluid.recv(L[index]) - Y[index] = y - }) -} -``` - -An explanation of the above program: - -- `fluid.k8s` is a package that provides access to Kubernetes API. -- `fluid.k8s.get_worker_addrs` returns the list of IP and ports of all pods of the current job except for the current one (the master pod). -- `fluid.tensor_array` creates a [tensor array](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor_array.h). 
`fluid.parallel_for` creates a `ParallelFor` intrinsic, which, when executed, - - 1. creates `len(L)` scopes, each for the concurrent running of the sub-block (block 1 in this case), and initializes a variable named "index" in the scope to an integer value in the range `[0, len(L)-1]`, and - 2. creates `len(L)` threads by calling into the `ThreadPool` singleton, each thread - 1. creates an Executor instance, and - 2. calls `Executor.Run(block)`, where `block` is block 1 as explained above. -1. Please be aware that block 1 is a sub-block of block 0, so ops in block 1 could refer to variables defined in block 0. - -### The Worker Program - -The worker program looks like - -```go -func main() { - W = Tensor(...) - x = fluid.listen_and_do( - fluid.k8s.self_addr(), - func(input Tensor) { - output = fluid.mult(input, W) - }) -} -``` - -where - -- `fluid.listen_and_do` creates a `ListenAndDo` intrinsic, which, when executed, - 1. listens on the current pod's IP address, as returned by `fliud.k8s.self_addr()`, - 2. once a connection is established, - 1. creates a scope of two parameters, "input" and "output", - 2. reads a [Fluid variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h) and saves it into "input", - 3. creates an Executor instance and calls `Executor.Run(block)`, where the block is generated by running the lambda specified as the second parameter of `fluid.listen_and_do`. - -## Summarization - -From the above example, we see that: - -1. Fluid enables the imperative programming paradigm by: - 1. letting users describe a program, but not a model (a sequence of layers, or a graph of operators), and - 2. call the `fluid.run` function that runs the program implicitly. -1. The program is described as a `ProgramDesc` protobuf message. -2. Function `Executor.Run` takes a block, instead of a `ProgramDesc`, as its parameter. -3. `fluid.run` calls `Executor.Run` to run the first block in the `ProgramDesc` message. -4. 
`Executor.Run`'s implementation is extremely simple -- it doesn't plan the execution nor create threads; instead, it runs on the current thread and execute intrinsics/operators' `Run` method sequentially as they appear in the `Block.ops` array. -5. Intrinsics/operators' `Run` method might create threads. For example, the `ListenAndDo` operator creates a thread to handle each incoming request. -6. Threads are not necessarily OS thread; instead, they could be [green threads](https://en.wikipedia.org/wiki/Green_threads) managed by ThreadPool. Multiple green threads might run on the same OS thread. An example green threads is Go's [goroutines](https://tour.golang.org/concurrency/1). diff --git a/develop/doc_cn/_sources/design/cpp_data_feeding.md.txt b/develop/doc_cn/_sources/design/cpp_data_feeding.md.txt deleted file mode 100644 index 40205350f99722f0b71bfa6f390fe9d01d831966..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/cpp_data_feeding.md.txt +++ /dev/null @@ -1,79 +0,0 @@ -# C++ Data Feeding - -In training with Paddle V2 API, data feeding wholly dependents on Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required. - -In this document we show the fundamental design of C++ data feeding process, which includes the data reading, shuffling and batching. - -## Reader - -A new concept named 'Reader' is introduced. `Reader` is a series of inherited classes which can be hold by our `Variable` and they are used to read or process file data. - - -### `ReaderBase` - -`ReaderBase` is the abstract base class of all readers. It defines the all readers' interfaces. - -```cpp -class ReaderBase { - public: - explicit ReaderBase(const std::vector& shapes) : shapes_(shapes) { - PADDLE_ENFORCE(!shapes_.empty()); - } - // Read the next batch of data. 
(A 'batch' can be only one instance) - virtual void ReadNext(std::vector<LoDTensor>* out) = 0; - // Show whether the next batch exists. - virtual bool HasNext() const = 0; - - // Reinitialize the reader and read the file from the beginning. - virtual void ReInit() = 0; - - // Get the shape of a certain read-in data item. - DDim shape(size_t idx) const; - // Get shapes of all read-in data. - std::vector<DDim> shapes() const { return shapes_; } - // Set shapes of read-in data. - void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; } - - virtual ~ReaderBase() {} - - protected: - std::vector<DDim> shapes_; -}; -``` - -### `FileReader` and `DecoratedReader` - -These two classes are derived from `ReaderBase` and will in turn be derived by the respective specific readers. That is to say, in our design, there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format and yields only one instance of data at a time, e.g., a RecordIO reader or a jpg reader. A decorated reader takes another reader (either a file reader or a decorated reader) as its 'underlying reader'. It gets data from its underlying reader, processes them (shuffling or batching), then yields the processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers. - -All the readers share exactly the same interfaces defined in `ReaderBase`, so they can be decorated more than once: we can **shuffle** a reader's outputs and then **batch** the shuffled outputs. The interface consistency also allows related ops to use readers without knowing exactly what they are. - - -### `ReaderHolder` - -Different readers belong to different class types. This leads to a problem: how can we drop them into `Variable`s and fetch them out by a unified method?
For example, if a Variable holds a `BatchReader`, we can not get it by the following code: - -```cpp -var->Get<ReaderBase>("batch_reader"); -``` - -we have to write: - -```cpp -var->Get<BatchReader>("batch_reader"); -``` - -This means that every time we get a reader from a variable, we must know the reader's exact type, which is nearly impossible. - -To solve this problem, we introduce `ReaderHolder` as a wrapper. It acts as an empty decorator of `ReaderBase`, which erases the reader's type. With `ReaderHolder` we are able to fetch all types of readers by `var->Get<ReaderHolder>("...")` and regard the obtained object as a reader. - -## Related Operators - -To create and invoke readers, some new ops are introduced: - -### `CreateReaderOp` - -Each reader has its creating op. File readers' creating ops have no input and yield the created file reader as their output. Decorated readers' creating ops take the underlying readers as inputs and then yield new decorated readers. - -### `ReadOp` - -A reader is only a Variable. It cannot trigger the reading process by itself, so we add the `ReadOp` to execute it. A `ReadOp` takes a reader Variable as its input. Each time it runs, it invokes the reader's `ReadNext()` function and gets a new batch of data (or only one instance of data, if we use a file reader directly). The output data of a reader are in the form of `std::vector<LoDTensor>`, so the `ReadOp` also needs to split the vector and move the LoDTensors to their respective output Variables. diff --git a/develop/doc_cn/_sources/design/csp.md.txt b/develop/doc_cn/_sources/design/csp.md.txt deleted file mode 100644 index 10d936860fab7e09241e968a63526c7d86d3e568..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/csp.md.txt +++ /dev/null @@ -1,224 +0,0 @@ -# Design Doc: CSP in PaddlePaddle Fluid - -## Motivation - -Concurrent programming is important for deep learning. A few example applications are: - -1. The main thread keeps reading the next mini-batch while another thread uses the GPU for computing. -2.
The main thread performs the computation while another thread uploads the local gradients from each trainer to the parameter server. - -Most DL systems, including TensorFlow, Caffe2, and MxNet, can asynchronously execute operators in a graph. However, Fluid doesn't have the concept of a graph at all, as the design goal of Fluid is that of a programming language. - -## Concurrent Programming Models - -There were many concurrent programming models, implemented in various forms: - -| concurrent programming model | implementation | -|-----|-----| -| mutex | types and functions in standard libraries | -| semaphore | types and functions in standard libraries | -| communicating sequential processes (CSP) | Go programming language | -| actor model | Erlang programming language | -| message passing | MPI | -| bulk synchronous parallel (BSP) | Pregel distributed programming framework | - -Since Fluid was designed to be a programming language, we would like to implement CSP in Fluid. - -### CSP v.s. Actor Model - -A well-known implementation of Actor Model is the Erlang programming language. In Actor Model, *processes* could send messages to another process and receive messages from another process given the process IDs. We can find the three ingredients, process with ID, send, and recv, in MPI too. Indeed, we can rewrite Erlang programs in Python + MPI with possibly fewer lines of code. Our concern with Actor Model is that it doesn't seem reasonable to implement process management in a programming language's runtime library; instead, it should be the operating systems' responsibility to manage processes and libraries like MPI for send/recv. - -## CSP in Fluid - -Fluid has two fundamental control-flows: *if-else* and *while*. If we are to implement CSP, we need the following: - -1. a new data type: *channel* and operators *send* and *recv*, -1. *goroutine* or thread, and -1. a new control-flow: select. - -We also need Python wrappers for the above components. 
- -The type *channel* is conceptually a blocking queue. In Go, it is implemented as a [blocking circular queue](https://github.com/golang/go/blob/68ce117cf17b8debf5754bfd476345779b5b6616/src/runtime/chan.go#L31-L50), which supports send and recv. - -The `select` operation has been in OS kernels long before the Go language. All Unix kernels implement system calls *poll* and *select*. They monitor multiple file descriptors to see if I/O is possible on any of them. This takes O(N) time. Since Linux 2.6, a new system call, *epoll*, can do the same in O(1) time. In BSD systems, there is a similar system call *kqueue*. Go's Linux implementation uses epoll. - -It might be a good idea to implement Fluid's select using epoll too. In this design doc, we start from the O(N) way so that we could focus on Python binding and the syntax. - -### Type Channel - -Fluid supports many data types: - -1. Tensor, -1. Row-sparse Tensor, -1. LoD Tensor, -1. Tensor array, etc. - -Each data type is registered in the [`framework.proto`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L117-L127) as an enum value. To add a new type channel, we need to add a new type enum. - -To expose a C++ type to Python, we need to edit the [`pybind.cc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc) file. [Here](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc#L120-L164) is an example of how we expose the C++ class LoDTensor. - -## Syntax Design - -### Create Channel - -In Go, we create a channel by specifying the element type and buffer size: - -```go -ch := make(chan int) // a channel without buffer -ch1 := make(chan int, 100) // a channel that can buffer 100 ints.
-``` - -In Fluid, we should be able to do the same: - -```python -ch = fluid.make_channel(dtype=INT) -ch1 = fluid.make_channel(dtype=INT, 100) -``` - -In addition to that, we want channels that can hold more complex element types, e.g., Tensors of float16: - -```python -ch = fluid.make_channel(dtype=Tensor, etype=float16) -``` - -or Tensors of Tensors of float16 etc. - -The point here is that we need a consistent way to compose types, like in C++ we can have `Tensor<Tensor<...> >`. - -### Send and Recv - -Go's CSP implementation depends on the data type *channel*. There are two types of channels: - -1. The unblocked channel, or buffered channel, is a blocking queue with a non-zero sized buffer. Sending to a buffered channel blocks if the buffer is full, and the receive operation blocks if the buffer is empty. -1. The blocked channel, or unbuffered channel, is a blocking queue with no buffer. Both sending and receiving block with unbuffered channels. - -There are four types of actions with a channel: - -1. Create a channel - - ```go - ch := make(chan int) // this is an unbuffered channel - ch := make(chan int, 100) // this is a buffered channel of 100 ints. - ``` - -1. Send - - ```go - ch <- 111 - ``` - -1. Recv - - ```go - y, ok := <-ch - ``` - -1. Close - - ```go - close(ch) - ``` - - Please be aware that a closed channel is not a nil channel, which is `var ch chan int`. - -There are some [axioms with channels](https://dave.cheney.net/2014/03/19/channel-axioms): - -1. A send to a nil channel blocks forever - -1. A receive from a nil channel blocks forever - -1. A send to a closed channel panics - -1. A receive from a closed channel returns the residual values and then zeros.
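
The buffered-channel behavior and the closed-channel axioms above can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not Go's or Fluid's implementation: `ToyChannel`, `ChannelClosed`, and the `zero` parameter are hypothetical names introduced only for illustration, and the sketch models buffered channels only (the rendezvous behavior of unbuffered channels and nil channels is not modeled).

```python
import threading
from collections import deque


class ChannelClosed(Exception):
    """Raised when sending to a closed channel (Go would panic here)."""


class ToyChannel:
    """A toy buffered channel; hypothetical, for illustration only."""

    def __init__(self, capacity, zero=0):
        if capacity < 1:
            raise ValueError("this sketch only models buffered channels")
        self._capacity = capacity
        self._zero = zero  # the "zero value" returned after close
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def send(self, value):
        with self._cond:
            # Block while the buffer is full and the channel is still open.
            while len(self._items) >= self._capacity and not self._closed:
                self._cond.wait()
            if self._closed:
                raise ChannelClosed("send on closed channel")
            self._items.append(value)
            self._cond.notify_all()

    def recv(self):
        """Return (value, ok), mirroring Go's `y, ok := <-ch`."""
        with self._cond:
            # Block while the buffer is empty and the channel is still open.
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                value = self._items.popleft()
                self._cond.notify_all()
                return value, True
            # Closed and drained: return the zero value with ok == False.
            return self._zero, False

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
```

A single `threading.Condition` guards both the "buffer full" and "buffer empty" waits; this keeps the sketch short at the cost of some spurious wake-ups.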
- -In Fluid, we have [buffered channels](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/details/buffered_channel.h) and [unbuffered channels](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/details/unbuffered_channel.h) - -The following program illustrates the Python syntax for accessing Fluid buffers. - -```python -import fluid - -buffer_size = 10 -ch = fluid.make_channel(dtype=INT, buffer_size) - -# Now write three elements to the channel -with fluid.while(steps=buffer_size): - fluid.send(ch, step) - -fluid.close_channel(ch) - -with fluid.while(steps=buffer_size): - fluid.print(fluid.recv(ch)) -``` - -The following example shows that to avoid the always-blocking behavior of unbuffered channels, we need to use Fluid's goroutines. - -```python -import fluid - -ch = fluid.make_channel(dtype=INT) - -with fluid.go(): - fluid.send(ch) - -y = fluid.recv(ch) - -fluid.close_channel(ch) -``` - -### Select - -In Go, the `select` statement lets a goroutine wait on multiple communication operations. A `select` blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready. - -```go - -ch1 := make(chan int) -ch2 := make(chan int, 100) - -x := 0 - -for { - select { - case ch1 <- x: - x := x + 1 - case y <- ch2: - fmt.Println("Received on channel") - default: - fmt.Println("Default") - } - } - -``` - -In Fluid, we should be able to do the same: - -```python -ch1 = fluid.make_chan(dtype=INT) -ch2 = fluid.make_chan(dtype=INT, 100) - -sel = fluid.select() - -with sel.case(ch1, 'w', X): - fluid.layers.increment(X) - -with sel.case(ch2, 'r', Y): - fluid.print("Received on Channel") - -with sel.default(): - fluid.print("Default") - -``` - -In the above code snippet, `X` and `Y` are variables. Now let us look at each of these statements one by one. - -- `sel.case(ch1, 'w', X)` : This specifies that we are writing to `ch1` and we want to write the integer in variable `X` to the channel. 
The character `w` is used here to make the syntax familiar to write syntax in Python I/O. - -- `sel.case(ch2, 'r', Y)` : This specifies that we would like to read the result from `ch2` into variable `Y`. The character `r` is used here to make the syntax familiar to read syntax in Python I/O. - -- `sel.default()` : This is equivalent to the default in Go `select`. If none of the channels are ready for read or write, then the fluid code in the default block will be executed. - -## Example Programs - -### 1. RPC between Trainers and Parameter Servers - -### 2. Concurrent Minibatch Loading diff --git a/develop/doc_cn/_sources/design/dist_refactor/distributed_architecture.md.txt b/develop/doc_cn/_sources/design/dist_refactor/distributed_architecture.md.txt deleted file mode 100644 index 9368c5780dc922953f38bf0f86d9f797a4a8a6fe..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/dist_refactor/distributed_architecture.md.txt +++ /dev/null @@ -1,197 +0,0 @@ -# Design Doc: Distributed Training Architecture - -## Abstract - -PaddlePaddle version 0.10.0 uses the "trainer-parameter server" architecture. We run multiple instances of trainers (where each trainer runs the same model) and parameter servers for distributed training. This architecture serves well, but has few limitations: - -1. There is a need to write special code that handles tasks which should only be run on a single trainer. E.g., initializing the model, saving the model etc. - -2. Model parallelism is hard: It would need all the if-else branches conditioned on the trainer ID to partition the model onto the trainers, and eventually manually writing out the inter-model-shard communication code to communicate between different trainers. - -3. The user can not directly specify the parameter update rule: This would need to modify the parameter server code and compile a new binary. This makes things more complicated for researchers: A lot of extra effort is required to make this work. 
Besides, the training job submission program may not allow running arbitrary binaries. - -This design doc discusses PaddlePaddle's new distributed training architecture that addresses the above mentioned limitations. - -## Analysis - -The assumption is that the user writes the trainer program in either Python or C++. - -### Limitation 1 - -There are two basic functionalities in the trainer program: - -1. The training logic such as loading / saving the model and printing out the logs. -2. The neural network definition such as the definition of the data layer, the fully connected layer, the cost function and the - optimizer. - -When we train using PaddlePaddle v0.10.0 in a distributed fashion, multiple instances of the same Python code are run on different nodes, hence both: the -training logic as well as the neural network computation logic, is replicated. - -The tasks that only need to be run once belong to the training logic. Hence if we only replicate the neural network computation part, and do **not** -replicate the training logic, the limitation mentioned above can be avoided. - -### Limitation 2 - -Model parallelism means that a single model is partitioned into different components and each node runs one of the component separately. This comes at the extra cost of managing the -inter-model-shard communication between nodes. - -PaddlePaddle should ideally be able to modify the neural network computation and figure out the support for model parallelism automatically. However, the -computation is only specified in Python code which sits outside of PaddlePaddle, hence PaddlePaddle can not support the feature in this setup. - -Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. 
The compiler optimizes the IR as follows: - - - -PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component: - - - -The IR for PaddlePaddle after refactoring is called a `Block`, it specifies the computation dependency graph and the variables used in the computation. - -### Limitation 3 - -The user can not directly specify the parameter update rule for the parameter server in the Python module, since the parameter server does not use the same computation definition as the trainer. Instead, the update rule is baked inside the parameter server. The user can not specify the update rule explicitly. - -This could be fixed by making the parameter server also run an IR, which can be different to the trainer side -For a detailed explanation, refer to this document - -[Design Doc: Parameter Server](./parameter_server.md) - -## Distributed Training Architecture - -The revamped distributed training architecture can address the above discussed limitations. Below is the illustration of how it does so: - - - -The major components are: *Python API*, *Distribute Transpiler* and *Remote Executor*. - -### Python API - -Python API is the Python library that user's Python code invokes, to read the data, build the neural network topology, and start training, etc. - -```Python -images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32') -label = fluid.layers.data(name='label', shape=[1], dtype='int64') -... 
-predict = fluid.layers.fc(input=conv_pool_2, size=10, act="softmax") -cost = fluid.layers.cross_entropy(input=predict, label=label) -avg_cost = fluid.layers.mean(x=cost) -optimizer = fluid.optimizer.Adam(learning_rate=0.01) -optimizer.minimize(avg_cost) - -train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=500), - batch_size=BATCH_SIZE) - -place = fluid.CPUPlace() -exe = fluid.Executor(place) - -for pass_id in range(10): - for data in train_reader(): - loss, acc = exe.run(trainer_prog, - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -The code above is a typical local training program, the "Training Program" is built using helper functions such as -`fluid.layer.fc`. The training is done by calling `Executor.run` -iteratively. - -For more details, the implementation of IR is [Program](../program.md), and `ProgramDesc` is the protobuf type. - -[Executor](../executor.md) simply runs the `ProgramDesc`. For local training you generally use -`Executor` to run the program locally. For any kind of distributed training, you can use -`RemoteExecutor` to specify desired distributed training method with some optional arguments. - -### Distributed Transpiler - -The Distributed Transpiler automatically converts the IR (in protobuf format) to partitioned IRs. Then -the Remote Executor dispatches the new IRs to Remote Executors across the cluster. -Below are the steps that are followed : - -1. User only need to change `Executor` to `RemoteExecutor` to change local program to distributed program. -1. `RemoteExecutor` calls `Distributed Transpiler` to "transpile" user's program to several IRs representing a - distributed training program: - 1. Parse configurations from `RemoteExecutor`. - 1. Determine the type of distributed program, can be DataParallelism, ModelParallelism or Streaming. - 1. Partition the `ProgramDesc` according to type and add `send` / `recv` OP pair on the boundaries. 
Take - DataParallelism type for example, it removes the optimization operators and adds a `send` OP to the - "trainer" role, then adds the optimization operators to the parameter server role within the `recv` OP. -1. Dispatch the partitioned graph to different `RemoteExecutor` in the cluster. -1. `RemoteExecutor` on each node runs the received `ProgramDesc` until the end. - - -### RemoteExecutor - -As shown in the graph, `RemoteExecutor.run` sends the IR to the cluster for execution. -You can also use the parameter `fetch_list` to interactively fetch variables back to local for -log printing. - -The Python `RemoteExecutor` is derived from the `Executor` class. - -```python -exe = RemoteExecutor( - feed=feeder.feed(data), - fetch_list=[avg_cost], - job_desc=JobDesc( - jobname, - num_trainer, - num_pserver, - cpu_per_trainer, - gpu_per_trainer, - mem_per_trainer, - cpu_per_pserver, - mem_per_pserver - )) -for data in train_reader(): - loss, acc = exe.run(trainer_prog, - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -The `JobDesc` object describes the distributed job resource specification to run on -the cluster environment. - - - -`RemoteExecutor.run` sends the `ProgramDesc` and -[TrainingJob](https://github.com/PaddlePaddle/cloud/blob/develop/doc/autoscale/README.md#training-job-resource) -to a server in the cluster which executes `RemoteExecutor.listen`. This server is responsible -for starting the final Kubernetes Jobs to run the different roles of `ProgramDesc` from `ConfigMap`. - - -### Placement Algorithm - -Our first implementation will only support "trainer-parameter server" placement: the parameters, initializers, and optimizers are all placed on the PaddlePaddle runtimes with the parameter server role. Everything else will be placed on the PaddlePaddle runtimes with the trainer role. This has the same functionality as the "trainer-parameter server" architecture of PaddlePaddle v0.10.0, but is more generic and flexible.
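
The "trainer-parameter server" placement rule described above can be sketched as a simple partition. This is a hedged illustration only: the op records, the `kind` field, and the `place_ops` helper are hypothetical and are not part of the PaddlePaddle API.

```python
# Kinds that the first placement rule assigns to the parameter server role.
# The "kind" labels are assumptions made for this sketch.
PSERVER_KINDS = {"parameter", "initializer", "optimizer"}


def place_ops(items):
    """Split program items into pserver and trainer roles by their kind."""
    placement = {"pserver": [], "trainer": []}
    for item in items:
        role = "pserver" if item["kind"] in PSERVER_KINDS else "trainer"
        placement[role].append(item["name"])
    return placement


program_items = [
    {"name": "W", "kind": "parameter"},
    {"name": "uniform_init", "kind": "initializer"},
    {"name": "fc", "kind": "layer"},
    {"name": "cross_entropy", "kind": "loss"},
    {"name": "adam", "kind": "optimizer"},
]
placement = place_ops(program_items)
```

A more general placement algorithm would replace the fixed `PSERVER_KINDS` rule with a cost model over the IR.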
- -In the future, a more general placement algorithm should be implemented, which makes placements according to the input IR, and a model of device computation time and device communication time. Model parallelism requires the generic placement algorithm. - - -### Local Training Architecture - -The local training architecture will be the same as the distributed training architecture; the difference is that everything runs locally, and there is just one PaddlePaddle runtime: - - - - -### Training Data - -In PaddlePaddle v0.10.0, training data is typically read -with [data reader](../reader/README.md) from Python. This approach is -no longer efficient when training in a distributed fashion, since the Python -process no longer runs on the same node as the trainer processes: -the Python reader will need to read from the distributed filesystem -(assuming it has the access) and send to the trainers, doubling the -network traffic. - -When doing distributed training, the user can still use the Python data -reader: the training data are sent with `Executor.run`. However, this should -be used for debugging purposes only. The users are encouraged to use -the read data OPs.
- - -## References: - -[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf) - -[2] [TensorFlow: A System for Large-Scale Machine Learning](https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf) diff --git a/develop/doc_cn/_sources/design/dist_refactor/multi_cpu.md.txt b/develop/doc_cn/_sources/design/dist_refactor/multi_cpu.md.txt deleted file mode 100644 index a8d8ee0422acc84835170a44eb83f9b5f0c6bb40..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/dist_refactor/multi_cpu.md.txt +++ /dev/null @@ -1,43 +0,0 @@ -# Design Doc: Execute the Program with Multi CPU - -## Abstract - -This Design Doc proposes an approach to run the user-defined Op graph -on multiple CPUs: we will use an auto transpiler to convert the user-defined -Op graph to a multi-CPU Op graph, and use the `ParallelDo` Op to run the graph. - -## Transpiler - - - -After conversion: - - - -## Implement - -- `Multi-CPU Transpiler` will convert the graph to a multi-CPU graph - which would be executed with multiple threads. -- `BlockingCounter` will `Init/Decrement` an atomic counter, and block in `Wait` - until the atomic counter becomes `0`: - ```cpp - BlockingCounter bc(thread_count); - for (int i = 0; i < thread_count; ++i) { - thread_pool->Start([&bc] {bc.DecrementCount(); }); - } - bc.Wait(); - ``` -- `ParallelDo` Operator - - Initialize a thread pool which is a Singleton. - - Use a block id as the input, and run the specified Block in independent scopes - with multiple threads. - - Initialize a `BlockingCounter` instance and wait until all threads are done. -- `Split` Operator will split the Input Tensor into a TensorArray. -- `Merge` merges all the gradients which are calculated in different threads - with a `mean/sum/max/min...` method, and then runs the Optimizer Op to optimize `W`.
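
The `BlockingCounter`, `ParallelDo`, and `Merge` steps listed above can be sketched with Python's standard `threading` module. This is a minimal sketch under assumptions, not the Paddle implementation: the Python `BlockingCounter` class, `parallel_do`, and `merge_mean` are hypothetical helpers, plain threads stand in for the singleton thread pool, and plain lists stand in for tensors.

```python
import threading


class BlockingCounter:
    """Counts down from `count`; wait() blocks until the count reaches 0."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement_count(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()


def parallel_do(block, shards):
    """Run `block` on each shard in its own thread and collect the results."""
    results = [None] * len(shards)
    counter = BlockingCounter(len(shards))

    def worker(index):
        try:
            results[index] = block(shards[index])
        finally:
            # Always count down, so wait() cannot hang if `block` raises.
            counter.decrement_count()

    for index in range(len(shards)):
        threading.Thread(target=worker, args=(index,)).start()
    counter.wait()
    return results


def merge_mean(values):
    """Merge per-thread results with the `mean` method."""
    return sum(values) / len(values)


# "Split" the input into two shards, run the block on each, then "Merge".
shards = [[1.0, 2.0], [3.0, 4.0]]
partial = parallel_do(sum, shards)
merged = merge_mean(partial)
```

Each worker writes to its own slot of `results`, so no extra lock is needed around the result list.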
- -## TODO - -- Improve the optimizer stage with multi-threads, since we could - assign the parameters to the different threads and execute - optimizer with multi-threads. diff --git a/develop/doc_cn/_sources/design/dist_refactor/parameter_server.md.txt b/develop/doc_cn/_sources/design/dist_refactor/parameter_server.md.txt deleted file mode 100644 index 805dd13048d41b995d2a01cda52b2ea33e4bbe1d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/dist_refactor/parameter_server.md.txt +++ /dev/null @@ -1,96 +0,0 @@ -# Design Doc: Parameter Server - -## Abstract - -We propose an approach to implement the parameter server. In this -approach, there is no fundamental difference between the trainer and -the parameter server: they both run subgraphs, but subgraphs of -different purposes. - -## Background - -The previous implementations of the parameter server do not run a -fluid sub-program. Parameter initialization, optimizer computation, network -communication and checkpointing are implemented twice on both the -trainer as well as the parameter server. - -It would be great if we can write code once and use them on both: the -trainer and the parameter server, since this reduces code duplication and -improves extensibility. Given that after the current refactoring, we are -representing everything as a computation graph on the -trainer. Representing everything as a computation graph on the parameter -server becomes a natural extension. - -## Design - -### Distributed Transpiler - -The *Distributed Transpiler* converts the user-defined fluid program -into sub-programs to be scheduled on different nodes with the following -steps: - -1. OP placement: the OPs will be placed on different nodes according - to a heuristic that minimizes the estimated total computation - time. Currently we will use a simple heuristic that puts parameter - variable on parameter server workers and everything else on trainer - workers. -1. 
Add communication OPs to enable the communication between nodes. - -We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*. - -Below is an example of converting the user defined graph to the -subgraphs for the trainer and the parameter server: - - - -After converting: - - - -1. The parameter variable W and its optimizer program are placed on the parameter server. -1. Operators are added to the program. - - *Send* sends data to the connected *Recv* operator. The - scheduler on the receive node will only schedule the *Recv* operator - to run when the *Send* operator has run (the *Send* OP will mark - the *Recv* OP runnable automatically). - - *Enqueue* enqueues the input variable; it can block until space - becomes available in the queue. - - *Dequeue* outputs a configurable number of tensors from the - queue. It will block until the queue has the required number of - tensors. - - -### Benefits - -- Model parallelism becomes easier to implement: it is an extension to - the trainer - parameter server approach. We can have several "Transpilers" - to achieve different goals. -- User-defined optimizer is easier to add - user can now express it as - a sub-program. -- No more duplication logic inside the trainer and the parameter - server mentioned in the background section. - -### Challenges - -- It is important to balance the parameter shards on multiple - parameter servers. If a single parameter is very big (for example: some - word-embedding, fully connected, softmax layer), we need to - automatically partition the single parameter onto different - parameter servers when possible (only element-wise optimizer depends - on the parameter variable). -- In the "Async SGD" figure, the "W" variable on the parameter server - could be read and written concurrently. See - [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more - details about concurrent programs in Fluid.
- -### Discussion - -- Can the Enqueue OP be implemented under our current tensor design - (put the input tensor into the queue tensor)? -- *Dequeue* OP will have variable numbers of output (depending on the - `min_count` attribute), does our current design support it? (similar - question for the *Add* OP) - - -### References: -[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf) diff --git a/develop/doc_cn/_sources/design/error_clip.md.txt b/develop/doc_cn/_sources/design/error_clip.md.txt deleted file mode 100644 index 58aa73b8cd38d01e2426278a3479714e4fb6a3b0..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/error_clip.md.txt +++ /dev/null @@ -1,92 +0,0 @@ -# Error Clip - -## Overview - -Error clip is widely used in model training to prevent gradient exploding. It takes some specific rules to adjust variables' gradients and prevent them from being too large. With it, values of a gradient will be checked before they are taken by the next `grad_op` and be shrunk if necessary. -## Usage - -Users are allowed to assign different error clip methods or attributes to different `Variable`s. Users can specify it as a parameter of `Variable`'s constructor: - -```python -var = framework.Variable(..., error_clip=myErrorClip, ...) -``` - -The default value of `error_clip` is `None`, which means no error clip is employed. When it's not `None`, it should take an object of `BaseErrorClipAttr`'s derived class. So far, `BaseErrorClipAttr` has only one derived class: `ErrorClipByValue`, whose constructor is: - -```python -ErrorClipByValue(max, min=None) -``` - -`max` and `min` represent the maximal and minimal clip threshold respectively. In backward pass, all values of `var`'s gradient greater than `max` or less than `min` will be clipped to `max` and `min` respectively. 
When `min` is `None`, the minimal threshold will be set to `-max` automatically. - -So we can enable the error clip with threshold `[-5.0, 5.0]` for variable `var` by: - -```python -var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...) -``` - -## Implementation - -The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*. - -```python -class BaseErrorClipAttr(object): - def append_clip_op(self, block, grad_name): - raise NotImplementedError() - - -class ErrorClipByValue(BaseErrorClipAttr): - def __init__(self, max, min=None): - max = float(max) - if min is None: - min = -max - else: - min = float(min) - self.max = max - self.min = min - - def append_clip_op(self, block, grad_name): - clip_op_desc = block.desc.append_op() - clip_op_desc.set_type("clip") - clip_op_desc.set_input("X", [grad_name]) - clip_op_desc.set_output("Out", [grad_name]) - clip_op_desc.set_attr("min", self.min) - clip_op_desc.set_attr("max", self.max) -``` - -The `BaseErrorClipAttr` has one main member function: `append_clip_op(self, block, grad_name)`. - -This function is used to create a `clip_op` and append it to the end of the given `block`. Because different error clip algorithms require different `clip_op`s, the function is left unimplemented in the base class (it raises `NotImplementedError`). All derived classes must implement their own versions of this function. - -These `clip_op`s should be inserted after `grad_op`s whose output gradients need to be clipped. It is equivalent to appending some `clip_op`s to the end of the target block every time a new `grad_op` is added. - -```python -for op_desc in grad_op_descs: - new_op_desc = target_block.desc.append_op() - new_op_desc.copy_from(op_desc) - callback(block=target_block, context=grad_to_var) -``` - -Here we employ a callback function to complete this kind of job. In the `_append_backward_ops_` function, each time after a `grad_op` is added to the `target_block`, a callback function is invoked.
The logic of `clip_op` appending can be implemented inside the callback function. - -The callback function for `clip_op` appending is defined in *clip.py*: - -```python -def error_clip_callback(block, context): - # the context is a grad_to_var map - grad_to_var = context - op_desc = block.desc.op(block.desc.op_size() - 1) - for grad_n in filter(lambda n: n in grad_to_var, - op_desc.output_arg_names()): - fwd_var = block.var_recursive(grad_to_var[grad_n]) - error_clip = getattr(fwd_var, "error_clip", None) - if not (error_clip is None or isinstance(error_clip, - BaseErrorClipAttr)): - raise TypeError( - "Variable's error_clip should be an instance of BaseErrorClipAttr or None." - ) - if error_clip is not None: - error_clip.append_clip_op(block, grad_n) -``` - -This function takes a `block` and a `context` (which is actually a grad\_to\_var map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op` and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an attribute of `error_clip`, `error_clip_callback` will call the `error_clip`'s `append_clip_op` function to append the required `clip_op` into the `block`. diff --git a/develop/doc_cn/_sources/design/evaluator.md.txt b/develop/doc_cn/_sources/design/evaluator.md.txt deleted file mode 100644 index 11cc129d56905a9ee666da92fbe6f8559c6d325a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/evaluator.md.txt +++ /dev/null @@ -1,58 +0,0 @@ -## Evaluator Design - -### Problem Statement - -During training or inference, we provide an evaluation function to measure the model performance, for example, accuracy, precision, etc. In the operator based framework design, the data passes through the network pipeline batch by batch. As a result, inside the operator, we only calculate the metrics for one minibatch.
Thus, we need to provide a mechanism to calculate the metrics over every N passes or batches, as the user requires. - -### Evaluator Design -Currently, every operation is expressed in the graph. We divide the evaluator process into three steps. - -1. Initialize the metric state and add it into the block. - -2. Calculate the concerned metrics for every mini-batch. A single evaluator operator is only responsible for calculating the necessary statistics for one mini-batch. For example, the accuracy operator only calculates the accuracy for one mini-batch of data each time it runs. - - -3. Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. When it comes to distributed training/Multi-GPU training, aggregate the value from different devices. - -### Implementation -This design is shown in the Python API. -Each metric operator needs to calculate the metric statistic and return the batch-aware states. The Python side is responsible for accumulating the states for each pass. - - -```python -class Evaluator(object): - """ - Evaluator Base class. - """ - def __init__(self, name, **kwargs): - """ - Different evaluators may have different metric states. E.g., Accuracy needs two variables, total and right sample counts. - Auc needs four variables, `true_positives`, - `true_negatives`, `false_positives` and `false_negatives`. So every evaluator should create its needed variables and append them to main_program - - The initialization of Evaluator should be responsible for: - create metric states and append to the main_program - """ - pass - - def _update_ops(self, input, label, **kwargs): - """ - Add mini-batch evaluator calculation operators to the main_program. - Add increment operator to accumulate the metric states. - """ - - - def reset(self, executor, reset_program=None): - """ - Reset metric states at the beginning of each pass/user specified batch number. - Execute the reset_program to reset the states.
- """ - - - def eval(self, executor, eval_program=None): - """ - Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. - Execute the eval_program and return the result. - """ - return eval_result -``` diff --git a/develop/doc_cn/_sources/design/executor.md.txt b/develop/doc_cn/_sources/design/executor.md.txt deleted file mode 100644 index 2d4b371cc56db82ce5747da6db07f05aa7f7e6c1..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/executor.md.txt +++ /dev/null @@ -1,29 +0,0 @@ -# Executor Design Doc - -## Motivation -In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message -[`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). - -The executor runs the `ProgramDesc` like an interpreter. `ProgramDesc` contains the intrinsics (operators in this case) and variables which will be used, executor explicitly executes the stored precompiled code. - -## Overview - -An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators in the block. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instances, which is persistent throughout different runs. - -## Executor - -The `Executor` explicitly executes all the intrinsics (operators here) in the `block_id`th block of a `ProgramDesc`. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence one-by-one. 
-It is very similar to how a push stack frame works when entering a block, following which it cleans up all the temporary variables when a mini-batch is finished. It does not, however, have the stack frame pop process. - -### The interface -```c++ - Executor(places); -``` -An executor does not own any computing resources; a user can only construct an executor using the specified places. - -### Running an Executor - -``` - void Run(ProgramDesc, Scope, block_id, create_local_scope); -``` -An `Executor` only provides a unified way to execute `ProgramDesc`. `ProgramDesc` is the target that will be executed, the `Scope` specifies the variable container, the `block_id` indicates the entrance block, and `create_local_scope` is a boolean that states whether it will destroy the temporary variables after the execution is finished. diff --git a/develop/doc_cn/_sources/design/file_manager/README.md.txt b/develop/doc_cn/_sources/design/file_manager/README.md.txt deleted file mode 100644 index 3df10d801e568834729f902aace483d033340e2d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/file_manager/README.md.txt +++ /dev/null @@ -1,87 +0,0 @@ -# FileManager Design Doc -## Goals -In this document we describe the design of a system named FileManager, which lets users upload their own training data for distributed training. - -Its main features are: - -- Common command-line commands for managing files and directories -- Resumable upload and download of large files - -## Terminology -- PFS: short for `Paddlepaddle cloud File System`, an abstraction of the user's file storage space, as opposed to the local filesystem. We currently build it on CephFS. -- [CephFS](http://docs.ceph.com/docs/master/cephfs/): a POSIX-compatible file system. -- Chunk: the logical unit into which a file is divided. - -## Modules -### Architecture Diagram - - -### PFSClient -- Features: see the detailed design [link](./pfs/pfsclient.md) - - Provides commands for users to manage files - - Must run across platforms - -- Mutual authentication - PFSClient and Ingress must authenticate each other ([tls](#tls)), so a user first has to register on `cloud.paddlepaddle.org`, apply for user space, and download the system-generated CA (certificate authority), Key, and CRT (CA signed certificate) to the local machine before using PFSClient. - -### [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) -- Features: - Provides a layer-7 reverse proxy and load balancing based on sticky sessions. - -- Passing the user identity through -
Ingress must pass PFSClient's identity information on to PFSServer; for how to configure this, see [link](http://www.integralist.co.uk/posts/clientcertauth.html#3) - -### PFSServer -PFSServer exposes a RESTful API that receives and handles file-management requests from PFSClient and returns the results to PFSClient. - -RESTful API - -- /api/v1/files - - `GET /api/v1/files`: Get metadata of files or directories. - - `POST /api/v1/files`: Create files or directories. - - `PATCH /api/v1/files`: Update files or directories. - - `DELETE /api/v1/files`: Delete files or directories. - -- /api/v1/file/chunks - - `GET /api/v1/file/chunks`: Get the metadata of a file's chunks. - -- /api/v1/storage/files - - `GET /api/v1/storage/files`: Download files or directories. - - `POST /api/v1/storage/files`: Upload files or directories. - -- /api/v1/storage/file/chunks - - `GET /api/v1/storage/file/chunks`: Download chunk data. - - `POST /api/v1/storage/file/chunks`: Upload chunk data. - -## File Transfer Optimization - -### Chunked File Transfer -User files can be large, so uploading them to the cloud or downloading them to a local machine can take a long time, and the network may be unstable during the transfer. To cope with these problems we introduce the concept of a Chunk, which consists of its offset in the file, the data, the data length, and a checksum. File uploads and downloads are both implemented as operations on Chunks. Because a Chunk is small (256K by default), a single transfer completes quickly and is less error-prone. After transferring the last Chunk, PFSClient must check that the MD5 of the destination file matches that of the source file. - -A typical Chunk looks like this: - -``` -type Chunk struct { - fileOffset int64 - checksum uint32 - len uint32 - data []byte -} -``` - -### Creating Sparse Files -When the destination file does not exist, or its size differs from that of the source file, [Fallocate](https://Go.org/pkg/syscall/#Fallocate) can be used to create a sparse file, after which multiple Chunks can be written concurrently. - -### Overwriting the Inconsistent Parts -The key to file transfer is that PFSClient compares the checksums of the source and destination files' Chunks; for any Chunk whose checksums differ, PFSClient downloads or uploads that Chunk. This way, the parts that have already been transferred successfully need not be transferred again. - -## User Workflow -See [link](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/data_dispatch.md) - -## Framework Generation -Use [swagger](https://github.com/swagger-api/swagger-codegen) to generate the skeleton code of PFSClient and PFSServer, so that we can spend more effort on the logic itself. - -## References -- [TLS complete guide](https://github.com/k8sp/tls/blob/master/tls.md) -- [aws.s3](http://docs.aws.amazon.com/cli/latest/reference/s3/) -- [linux man document](https://linux.die.net/man/) diff --git
a/develop/doc_cn/_sources/design/file_manager/pfs/pfsclient.md.txt b/develop/doc_cn/_sources/design/file_manager/pfs/pfsclient.md.txt deleted file mode 100644 index 56bc70c54bbc92b78d66e04fb495b1300cf8ebe0..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/file_manager/pfs/pfsclient.md.txt +++ /dev/null @@ -1,129 +0,0 @@ -# PFSClient - -## Description -The `pfs` command is a command-line interface for managing your files on PaddlePaddle Cloud. - -## Synopsis -``` -paddle [options] pfs [parameters] -``` - -## Options -``` ---profile (string) - Use a specific profile from your credential file. - ---help (string) - Display more information about the command - ---version - Output version information and exit - ---debug - Show detailed debugging log - ---only-show-errors (boolean) - Only errors and warnings are displayed. All other output is suppressed. -``` - -## Path Arguments -When using a command, we need to specify path arguments. There are two path argument types: `localpath` and `pfspath`. - -A `pfspath` begins with `/pfs`, e.g., `/pfs/$DATACENTER/home/$USER/folder`. - -[Here](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/data_dispatch.md#上传训练文件) is how to configure datacenters. - -## Order of Path Arguments -Commonly, if there are two path arguments, the first is the source, and the second is the destination. - -## Subcommands -- rm - remove files or directories - -``` -Synopsis: - rm [-r] [-v] ... - -Options: - -r - Remove directories and their contents recursively - -v - Cause rm to be verbose, showing files after they are removed. - -Examples: - paddle pfs rm /pfs/$DATACENTER/home/$USER/file - paddle pfs rm -r /pfs/$DATACENTER/home/$USER/folder -``` -- mv - move (rename) files - -``` -Synopsis: - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - mv [-f | -n] [-v] - mv [-f | -n] [-v] ... - -Options: - -f - Do not prompt for confirmation before overwriting the destination path.
(The -f option overrides previous -n options.) - -n - Do not overwrite an existing file. (The -n option overrides previous -f options.) - -v - Cause mv to be verbose, showing files after they are moved. - -Examples: - paddle pfs mv ./text1.txt /pfs/$DATACENTER/home/$USER/text1.txt -``` -- cp - copy files or directories - -``` -Synopsis: - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - cp [-r] [-f | -n] [-v] [--preserve--links] - cp [-r] [-f | -n] [-v] [--preserve--links] ... - -Options: - -r - Copy directories recursively - -f - Do not prompt for confirmation before overwriting the destination path. (The -f option overrides previous -n options.) - -n - Do not overwrite an existing file. (The -n option overrides previous -f options.) - -v - Cause cp to be verbose, showing files after they are copied. - --preserve--links - Preserve links when copying links - -Examples: - paddle pfs cp ./file /pfs/$DATACENTER/home/$USER/file - paddle pfs cp /pfs/$DATACENTER/home/$USER/file ./file -``` -- ls - list files - -``` -Synopsis: - ls [-r] ... - -Options: - -R - List directory(ies) recursively - -Examples: - paddle pfs ls /pfs/$DATACENTER/home/$USER/file - paddle pfs ls /pfs/$DATACENTER/home/$USER/folder -``` - -- mkdir - make directory(ies) -Create intermediate directory(ies) as required. - -``` -Synopsis: - mkdir ... - -Examples: - paddle pfs mkdir /pfs/$DATACENTER/home/$USER/folder -``` diff --git a/develop/doc_cn/_sources/design/float16.md.txt b/develop/doc_cn/_sources/design/float16.md.txt deleted file mode 100644 index 1ea95ed6b5d6792171569b6ff76d09be92fcb13e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/float16.md.txt +++ /dev/null @@ -1,105 +0,0 @@ -# Design Doc: float16 - -## Why float16 -Half precision (float16) is a binary floating-point format that occupies 16 bits in memory.
float16 is half the size of the traditional 32-bit single precision format (float) and has lower precision and a smaller range. - -When high precision computation is not required, using the float16 data type could potentially - -- reduce storage space, memory bandwidth, and power usage; -- increase the chance of data fitting into a smaller cache of lower latency; -- provide arithmetic speed up if supported by hardware. - -## Survey of current float16 support -A brief survey of float16 support on different compilers, hardware, and libraries can be found below. Interested readers can refer to [link1](https://github.com/PaddlePaddle/Paddle/issues/4853) and [link2](https://github.com/Xreki/Xreki.github.io/blob/master/multi_data_types_in_dl_framework/ppt/float16_and_quantized_type.md) for more info. - -The goal of float16 is to serve as a key for the executor to find and run the correct version of the compute method specialized for float16 in an operator kernel. It should be compatible with various natively supported float16 implementations including `__half` for cuda, `float16_t` for ARM, and `Eigen::half` for Eigen to make writing customized float16 kernels easier. - -### Compiler -- nvcc supports the `__half` data type starting with CUDA 7.5. -- `__fp16` or `float16_t` is supported as a storage type for gcc >= 6.1 and clang >= 3.4. -- `__fp16` or `float16_t` is supported as an arithmetic type for gcc >= 7.1 and clang >= 3.9. - -### Hardware -- `__half` is supported on GPUs with compute capability >= 5.3. -- `__fp16` is supported as a storage type for ARMv7-A, ARMv8-A, and above. -- `__fp16` is supported as an arithmetic type starting with ARMv8.2-A (currently, the only microarchitecture implementing ARMv8.2-A is ARM Cortex-A75, which was announced in May 2017. There seem to be no application processors currently available on the market that adopt this architecture. It is reported that Qualcomm Snapdragon 845 uses the Cortex-A75 design and will be available in mobile devices in early 2018).
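
The 16-bit storage format surveyed above can be explored without any special hardware. The following is a minimal sketch, assuming Python 3.6 or later, whose standard `struct` module supports the IEEE 754 binary16 format code `e`. The helper names `float_to_half_bits` and `half_bits_to_float` are hypothetical, chosen only for this illustration, and are not part of PaddlePaddle.

```python
import struct


def float_to_half_bits(f):
    """Round a Python float to binary16 and return its 16-bit pattern."""
    # '<e' packs an IEEE 754 half-precision value; '<H' reads the same
    # two bytes back as an unsigned 16-bit integer.
    return struct.unpack("<H", struct.pack("<e", f))[0]


def half_bits_to_float(bits):
    """Expand a 16-bit binary16 pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


one = float_to_half_bits(1.0)
tenth = half_bits_to_float(float_to_half_bits(0.1))

print(hex(one))  # bit pattern of 1.0 in half precision
print(tenth)     # 0.1 after a round trip through 16 bits (precision is lost)
try:
    float_to_half_bits(70000.0)  # beyond the largest finite half value, 65504
except OverflowError as err:
    print("out of range:", err)
```

The round trip of `0.1` shows the reduced precision, and the rejected `70000.0` shows the reduced range, which are the two costs weighed against the storage and bandwidth savings.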
- -### Libraries -- [Eigen](https://github.com/RLovelett/eigen) >= 3.3 supports float16 calculation on both GPU and CPU using the `Eigen::half` class. It is mostly useful for Nvidia GPUs because of the overloaded arithmetic operators using cuda intrinsics. It falls back to software emulation on CPU for calculation, and there is no special treatment for ARM processors. -- [ARM compute library](https://github.com/ARM-software/ComputeLibrary) >= 17.02.01 supports NEON FP16 kernels (requires ARMv8.2-A CPU). - -### CUDA version issue -There are currently three versions of CUDA that support the `__half` data type, namely, CUDA 7.5, 8.0, and 9.0. -CUDA 7.5 and 8.0 define `__half` as a simple struct that has a single 16-bit data member (see [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/9212ab5a3ddbe48f30ef373f9c1fb546804c7a8c/include/isaac/external/CUDA/cuda_fp16.h)) as follows: -``` -typedef struct __align__(2) { - unsigned short x; -} __half; - -typedef __half half; -``` -This struct does not define any overloaded arithmetic operators. So you have to directly use `__hadd` instead of `+` to correctly add two half types: -``` -__global__ void Add() { - half a, b, c; - c = __hadd(a, b); // correct - c = a + b; // compiler error: no operator "+" matches these operands -} -``` -CUDA 9.0 provides a major update to the half data type. The related code can be found in the updated [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.h) and the newly added [`cuda_fp16.hpp`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.hpp).
- -Essentially, CUDA 9.0 renames the original `__half` type in 7.5 and 8.0 as `__half_raw`, and defines a new `__half` class type that has constructors and conversion operators and provides overloaded arithmetic operators, as follows: -``` -typedef struct __CUDA_ALIGN__(2) { - unsigned short x; -} __half_raw; - - -struct __CUDA_ALIGN__(2) __half { -protected: - unsigned short __x; -public: - // constructors and conversion operators from/to - // __half_raw and other built-in data types -}; - -typedef __half half; - -__device__ __forceinline__ -__half operator+(const __half &lh, const __half &rh) { - return __hadd(lh, rh); -} - -// Other overloaded operators -``` -This new design makes `c = a + b` work correctly for the CUDA half data type. - -## Implementation -The float16 class holds a 16-bit `uint16_t` data member internally. -``` -struct float16 { - uint16_t x; -}; -``` - -float16 supports the following features: - - constructors / assignment operators that take input from primitive data types including bool, integers of various lengths, float, and double. - - constructors / assignment operators that take input from `__half` on cuda, `float16_t` on ARM, and `Eigen::half` on Eigen. - - conversion operators to primitive data types and half precision data types on cuda, ARM and Eigen. - - overloaded arithmetic operators for cuda, arm, and non-arm cpu, respectively. These operators will take advantage of the cuda and ARM intrinsics on the corresponding hardware. - -To support the above features, two fundamental conversion functions are provided: -``` -float16 float_to_half_rn(float f); // convert to half precision in round-to-nearest-even mode -float half_to_float(float16 h); -``` -which provide one-to-one conversion between float32 and float16. These two functions will use different conversion routines based on the current hardware. CUDA/ARM intrinsics will be used when the corresponding hardware is available.
If the hardware or compiler does not support float32 to float16 conversion, the conversion is done by software emulation. - -## To do -Once the float16 class is available, the remaining work items are: - -- Update pybind/tensor_py.h to bind c++ float16 with numpy float16. - -- Modify the `GetKernelType()` method in `framework/operator.h` to make it compatible with float16. - -- Create a type-casting operator that can convert the data type in a tensor between float16 and other types. diff --git a/develop/doc_cn/_sources/design/fluid.md.txt b/develop/doc_cn/_sources/design/fluid.md.txt deleted file mode 100644 index f78fa8c1914124f33b9730f918c8887ced4f8d9d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/fluid.md.txt +++ /dev/null @@ -1,114 +0,0 @@ -# Design Doc: PaddlePaddle Fluid - -## Why Fluid - -When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system at the time was Caffe. However, when PaddlePaddle was open-sourced in 2016, many other choices were available. There was a challenge -- what is the need for open sourcing yet another deep learning framework? - -Fluid is the answer. Fluid is similar to PyTorch and TensorFlow Eager Execution, which describe the "process" of training or inference instead of using the concept of a model. In fact, in PyTorch, TensorFlow Eager Execution and Fluid, there is no concept of a model at all. The details are covered in the sections below. Fluid currently takes this idea further than PyTorch and Eager Execution, and we are trying to push Fluid towards the direction of a compiler and a new programming language for deep learning. - -## The Evolution of Deep Learning Systems - -Deep learning infrastructure is one of the fastest evolving technologies. Within four years, three generations of technologies have already been invented.
- -| Existed since | model as sequence of layers | model as graph of operators | No model | -|--|--|--|--| -| 2013 | Caffe, Theano, Torch, PaddlePaddle | | | -| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | | -| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid | - -From the above table, we see that the deep learning technology is evolving towards getting rid of the concept of a model. To understand the reasons behind this direction, a comparison of the *programming paradigms* or the ways to program deep learning applications using these systems, would be helpful. The following section goes over these. - -## Deep Learning Programming Paradigms - -With the systems listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following: - -```python -x = layer.data("image") -l = layer.data("label") -f = layer.fc(x, W) -s = layer.softmax(f) -c = layer.mse(l, s) - -for i in xrange(1000): # train for 1000 iterations - m = read_minibatch() - forward({input=x, data=m}, minimize=c) - backward(...) - -print W # print the trained model parameters. -``` - -The above program includes two parts: - -1. The first part describes the model, and -2. The second part describes the training process (or inference process) for the model. - -This paradigm has a well-known problem that limits the productivity of programmers. If the programmer made a mistake in configuring the model, the error messages wouldn't show up until the second part is executed and `forward` and `backward` propagations are performed. This makes it difficult for the programmer to debug and locate a mistake that is located blocks away from the actual error prompt. - -This problem of being hard to debug and re-iterate fast on a program is the primary reason that programmers, in general, prefer PyTorch over the older systems. Using PyTorch, we would write the above program as following: - -```python -W = tensor(...) 
- -for i in xrange(1000): # train for 1000 iterations - m = read_minibatch() - x = m["image"] - l = m["label"] - f = layer.fc(x, W) - s = layer.softmax(f) - c = layer.mse(l, s) - backward() - -print W # print the trained model parameters. -``` - -We can see that the main difference is that the model configuration part (the first step) moves into the training loop. This change allows mistakes in the model configuration to be reported where they actually appear in the program. It also represents the model, or rather its forward pass, more faithfully by keeping the configuration process in the training loop. - -## Describe Arbitrary Models for the Future - -Describing the process instead of the model also gives Fluid the flexibility to define non-standard models that haven't been invented yet. - -As we write out the program for the process, we can write an RNN as a loop, instead of an RNN as a layer or as an operator. A PyTorch example would look like the following: - -```python -for i in xrange(1000): - m = read_minibatch() - x = m["sentence"] - for t in xrange(x.len()): - h[t] = the_step(x[t]) -``` - -With Fluid, the training loop and the RNN in the above program are not really Python loops, but just a "loop structure" provided by Fluid and implemented in C++, as follows: - -```python -train_loop = layers.While(cond) -with train_loop.block(): - m = read_minibatch() - x = m["sentence"] - rnn = layers.While(...) - with rnn.block(): - h[t] = the_step(x[t]) -``` - -An actual Fluid example is described [here](https://github.com/PaddlePaddle/Paddle/blob/bde090a97564b9c61a6aaa38b72ccc4889d102d9/python/paddle/fluid/tests/unittests/test_while_op.py#L50-L58). - -From the example, the Fluid programs look very similar to their PyTorch equivalents, except that Fluid's loop structure, wrapped with Python's `with` statement, could run much faster than a plain Python loop.
- -We have more examples of the [`if-then-else`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md) structure of Fluid. - -## Turing Completeness - -In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine. For a programming language, if it provides if-then-else and loop, it is Turing complete. From the above examples, Fluid seems to be Turing complete; however, it is worth noting that there is a slight difference between the `if-then-else` of Fluid and that of a programming language. The difference is that the former runs both of its branches and splits the input mini-batch into two -- one for the True condition and another for the False condition. Whether this is equivalent to the `if-then-else` that makes programming languages Turing complete has not been researched in depth. Based on a conversation with [Yuang Yu](https://research.google.com/pubs/104812.html), it seems to be the case, but this needs to be looked into in depth. - -## The Execution of a Fluid Program - -There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). - -There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program. - -Fluid is moving towards the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).
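
The interpreter analogy can be made concrete with a toy sketch. Everything in it is hypothetical: a plain dictionary stands in for the `ProgramDesc` protobuf message, the `OPERATORS` table stands in for registered operators, and `run_block` stands in for the C++ `Executor`. It only illustrates the idea of running the operators of one block in sequence against a scope, not the actual implementation.

```python
# Hypothetical stand-ins: a dict plays the role of the `ProgramDesc`
# protobuf message and `run_block` plays the role of the C++ `Executor`.

# Operator implementations, looked up by type name.
OPERATORS = {
    "fill": lambda value: list(value),
    "scale": lambda x, factor: [xi * factor for xi in x],
    "add": lambda x, y: [xi + yi for xi, yi in zip(x, y)],
}

# A program is a list of blocks; each block lists its operators in order.
program_desc = {
    "blocks": [
        {
            "ops": [
                {"type": "fill", "inputs": [],
                 "attrs": {"value": [1.0, 2.0, 3.0]}, "output": "X"},
                {"type": "scale", "inputs": ["X"],
                 "attrs": {"factor": 2.0}, "output": "Y"},
                {"type": "add", "inputs": ["X", "Y"],
                 "attrs": {}, "output": "Z"},
            ],
        }
    ]
}


def run_block(program, block_id, scope):
    """Run every operator of one block in order, storing results in scope."""
    for op in program["blocks"][block_id]["ops"]:
        kernel = OPERATORS[op["type"]]
        inputs = [scope[name] for name in op["inputs"]]
        scope[op["output"]] = kernel(*inputs, **op["attrs"])
    return scope


scope = run_block(program_desc, block_id=0, scope={})
print(scope["Z"])
```

Because the program is data rather than Python control flow, the same description could in principle be handed to a different backend, which is the property the compiler direction relies on.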
- -## Backward Compatibility of Fluid - -Given all the advantages from the removal of the concept of a *model*, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph). Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known [ONNX](https://github.com/onnx/onnx) is also a file format of graphs of operators. - -For Fluid, we can write a converter that extracts the parts in the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format. diff --git a/develop/doc_cn/_sources/design/fluid_compiler.md.txt b/develop/doc_cn/_sources/design/fluid_compiler.md.txt deleted file mode 100644 index 2a6beafc52e815fa067b273bb5887ddcf6ab15ae..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/fluid_compiler.md.txt +++ /dev/null @@ -1,110 +0,0 @@ -# PaddlePaddle Fluid: Towards a Compiled Programming Language - -As described in [fluid.md](fluid.md), when a Fluid application program -runs, it generates a `ProgramDesc` protobuf message as an intermediate -representation of itself. The C++ class `Executor` can run this -protobuf message as an interpreter. This article describes the Fluid -compiler. - -![](fluid-compiler.png) - -## ProgramDesc - -Before we go deeper into the idea of compiled language, let us take a -look at a simple example Fluid application. - -```python -import "fluid" - -func paddlepaddle() { - X = fluid.read(...) - W = fluid.Tensor(...) 
- Y = fluid.mult(X, W) -} -``` - -This program consists of a [block](block.md) of three operators -- -`read`, `assign`, and `mult`. Its `ProgramDesc` message looks like -the following - -```protobuf -message ProgramDesc { - block[0] = Block { - vars = [X, W, Y], - ops = [ - read(output = X) - assign(input = ..., output = W) - mult(input = {X, W}, output = Y) - ], - } -} -``` - -## Transpilers - -We can write a transpiler program that takes a `ProgramDesc`, e.g., -the above one, and outputs another `ProgramDesc`. Let us take some -examples: - -1. *Memory optimization transpiler*: We can write a transpiler that - inserts some `FreeMemoryOp`s in the above example `ProgramDesc` so - as to free memory early, before the end of an iteration, and keep a - small memory footprint. - -1. *Distributed training transpiler*: We can write a transpiler that - converts a `ProgramDesc` into its distributed version of two - `ProgramDesc`s -- one to be run by the trainer processes and the - other by the parameter server. - -In the rest of this article, we talk about a special kind of -transpiler, *Native code generator*, which takes a `ProgramDesc` and -generates a `.cu` (or `.cc`) file, which could be built by C++ -compilers (gcc, nvcc, icc) into binaries. - -## Native Code Generator - -For the above example, the native code generator transpiler, say, the -CUDA code generator, should generate a `main` function: - -```c++ -int main() { - auto X = fluid_cuda_read(...); - auto W = fluid_cuda_create_tensor(...); - auto Y = fluid_cuda_mult(X, W); -} -``` - -and the definitions of functions `fluid_cuda_read`, -`fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware -that each function could just define a C++ instance of an operator and -run it. For example - -```c++ -paddle::Tensor fluid_cuda_read(...)
{ - paddle::Tensor t; - paddle::operator::Read r(&t, ...); - r.Run(); - return t; -} -``` - -For computational operators that have multiple *kernels*, each for a -specific hardware platform, for example, the `mult` operator, the -generated code should call its CUDA kernel: - -```c++ -paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a, - const paddle::Tensor& b) { - paddle::Tensor t; - paddle::operator::Mult m(a, b, ...); - m.Run(cuda_context); - return t; -} -``` - -where `cuda_context` could be a global variable of type -`paddle::CUDADeviceContext`. - -## Multi-Block Code Generation - -Most Fluid application programs may have more than one block. To -execute them, we need to trace [scopes](scope.md). diff --git a/develop/doc_cn/_sources/design/functions_operators_layers.md.txt b/develop/doc_cn/_sources/design/functions_operators_layers.md.txt deleted file mode 100644 index 984b59f4c6971dfb6f46dfe342f2751f392c0e88..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/functions_operators_layers.md.txt +++ /dev/null @@ -1,100 +0,0 @@ -# Design Doc: Functions, Operators, and Layers - -In a DL system, we can compose one or more fine grained operators into a coarse grained one. For example, the FC layer can be composed of a multiplication operator and an add operator. - -Historically, some fine grained operations are known as operators, and some coarse level ones are known as layers. But we need a well-defined separation. - -In general, operators are those very fine grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions: - -```c++ -template <typename T> T add(T x, T y) { return x + y; } -template <typename T> T mul(T x, T y) { return x * y; } -``` - -Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this.
For example, the following macro invocation - -```c++ -MAKE_FUNCTION_OPERATOR(mul); -``` - -generates - -```c++ -template <typename T> class mulOp : public OperatorBase {...}; -REGISTER_OP(mulOp, "mul"); -``` - -so that in Python we can create operator mul by: - -```python -X1 = Var() -X2 = Var() -Y = Var() -paddle.cpp.create_operator("mul", input=[X1, X2], output=Y) -``` - -Also, at the same time, we can compose a coarse level C++ operator class by composing functions `mul` and `add`: - -```c++ -template <typename T> -class FCOp : public OperatorBase { - public: - void Run(...) { - add(mul(Input("X"), Input("W")), Input("b")); - } -}; -REGISTER_OP(FCOp, "fc"); -``` - -We need to support such composition in Python as well. To do so, we need a higher level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher level operator API should be compatible with the layer API. - -Let's explain using an example. Suppose that we are going to compose the FC using mul and add in Python; we'd like to have Python functions `mul` and `add` defined in module `operator`: - -```python -def operator.mul(X1, X2): - O = Var() - paddle.cpp.create_operator("mul", input=[X1, X2], output=O) - return O - -def operator.add(X1, X2): - O = Var() - paddle.cpp.create_operator("add", input=[X1, X2], output=O) - return O -``` - -The above code snippets are automatically generated. Given them, users can define - -```python -def layer.fc(X): - W = Var() - b = Var() - return operator.add(operator.mul(X, W), b) -``` - -If we don't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated: - -```python -def layer.fc(X): - W = Var() - b = Var() - O1 = Var() - paddle.cpp.create_operator("mul", input=[X, W], output=O1) - O2 = Var() - paddle.cpp.create_operator("add", input=[O1, b], output=O2) - return O2 -``` - -We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`.
So we have the following concepts in above illustrative example: - - -| C++ functions/functors | mul | add | | | -|------------------------|--------------|--------------|-------------|----------| -| C++ operator class | mulOp | addOp | FCOp | | -| Python binding | operator.mul | operator.add | operator.fc | | -| Python function | | | | layer.fc | - - -This is how we differentiate layer and operators in PaddlePaddle: - -- those defined in C++ and have a lightweighted Python wrapper in module `operators` are operators; whereas -- those who don't have C++ implementations but a Python implementation that compose C++ operators are known as layers. diff --git a/develop/doc_cn/_sources/design/gan_api.md.txt b/develop/doc_cn/_sources/design/gan_api.md.txt deleted file mode 100644 index fb41df8615f73d9fd4c32995eab265833eac1a55..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/gan_api.md.txt +++ /dev/null @@ -1,253 +0,0 @@ -# Design for GAN - -GAN (General Adversarial Net [https://arxiv.org/abs/1406.2661]) is an important model for unsupervised learning and widely used in many areas. - -It applies several important concepts in machine learning system design, including building and running subgraphs, dependency tracing, different optimizers in one executor and so forth. - -In our GAN design, we wrap it as a user-friendly easily customized python API to design different models. We take the conditional DC-GAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [https://arxiv.org/abs/1511.06434]) as an example due to its good performance on image generation. - -

                                -
                                -Figure 1. The overall running logic of GAN. The black solid arrows indicate the forward pass; the green dashed arrows indicate the backward pass of generator training; the red dashed arrows indicate the backward pass of the discriminator training. The BP pass of the green (red) arrow should only update the parameters in the green (red) boxes. The diamonds indicate the data providers. d\_loss and g\_loss marked in red and green are the two targets we would like to run. -

                                - -The operators, layers and functions required/optional to build a GAN demo are summarized in https://github.com/PaddlePaddle/Paddle/issues/4563. - -

                                -
                                -Figure 2. Photo borrowed from the original DC-GAN paper. -

                                - -## The Conditional-GAN might be a class. -This design we adopt the popular open source design in https://github.com/carpedm20/DCGAN-tensorflow and https://github.com/rajathkmp/DCGAN. It contains following data structure: - -- DCGAN(object): which contains everything required to build a GAN model. It provides following member functions methods as API: - -- __init__(...): Initialize hyper-parameters (like conv dimension and so forth), and declare model parameters of discriminator and generator as well. - -- generator(z, y=None): Generate a fake image from input noise z. If the label y is provided, the conditional GAN model will be chosen. -Returns a generated image. - -- discriminator(image): -Given an image, decide if it is from a real source or a fake one. -Returns a 0/1 binary label. - -- build_model(self): -build the whole GAN model, define training loss for both generator and discrimator. - -## Discussion on Engine Functions required to build GAN -- Trace the tensor and variable dependency in the engine executor. (Very critical, otherwise GAN can'be be trained correctly) -- Different optimizers responsible for optimizing different loss. - -To be more detailed, we introduce our design of DCGAN as following: - -### Class member Function: Initializer -- Set up hyper-parameters, including condtional dimension, noise dimension, batch size and so forth. -- Declare and define all the model variables. All the discriminator parameters are included in the list self.theta_D and all the generator parameters are included in the list self.theta_G. 
-```python -class DCGAN(object): - def __init__(self, y_dim=None, z_dim=None): - - # hyper parameters - self.y_dim = y_dim # conditional gan or not - self.batch_size = 100 - self.z_dim = z_dim # input noise dimension - - # define parameters of discriminators - self.D_W0 = pd.Variable(shape=[3,3, 1, 128], data=pd.gaussian_normal_randomizer()) - self.D_b0 = pd.Variable(np.zeros(128)) # Variable also supports initialization from numpy data - self.D_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.D_b1 = pd.Variable(np.zeros(128)) # Variable also supports initialization from numpy data - self.D_W2 = pd.Variable(np.random.rand(128, 1)) - self.D_b2 = pd.Variable(np.zeros(128)) - self.theta_D = [self.D_W0, self.D_b0, self.D_W1, self.D_b1, self.D_W2, self.D_b2] - - # define parameters of generators - self.G_W0 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.G_b0 = pd.Variable(np.zeros(128)) # Variable also supports initialization from numpy data - self.G_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer()) - self.G_b1 = pd.Variable(np.zeros(128)) # Variable also supports initialization from numpy data - self.G_W2 = pd.Variable(np.random.rand(128, 1)) - self.G_b2 = pd.Variable(np.zeros(128)) - self.theta_G = [self.G_W0, self.G_b0, self.G_W1, self.G_b1, self.G_W2, self.G_b2] -``` - -### Class member Function: Generator -- Given a noise input z, returns a fake image. -- Concatenation, batch-norm, FC operations required; -- Deconv layer required, which is missing now... 
-```python -class DCGAN(object): - def generator(self, z, y = None): - # input z: the random noise - # input y: input data label (optional) - # output G_im: generated fake images - - if self.y_dim: # conditional GAN: append the label to the noise - z = pd.layer.concat(1, [z, y]) - - G_h0 = pd.layer.fc(z, self.G_W0, self.G_b0) - G_h0_bn = pd.layer.batch_norm(G_h0) - G_h0_relu = pd.layer.relu(G_h0_bn) - - G_h1 = pd.layer.deconv(G_h0_relu, self.G_W1, self.G_b1) - G_h1_bn = pd.layer.batch_norm(G_h1) - G_h1_relu = pd.layer.relu(G_h1_bn) - - G_h2 = pd.layer.deconv(G_h1_relu, self.G_W2, self.G_b2) - G_im = pd.layer.tanh(G_h2) - return G_im -``` - -### Class member function: Discriminator -- Given an image, returns a logit indicating whether it is real or fake. -- Concatenation, Convolution, batch-norm, FC, Leaky-ReLU operations required; -```python -class DCGAN(object): - def discriminator(self, image): - # input image: either generated images or real ones - # output D_h2: binary logit of the label - - D_h0 = pd.layer.conv2d(image, w=self.D_W0, b=self.D_b0) - D_h0_bn = pd.layer.batch_norm(D_h0) - D_h0_relu = pd.layer.lrelu(D_h0_bn) - - D_h1 = pd.layer.conv2d(D_h0_relu, w=self.D_W1, b=self.D_b1) - D_h1_bn = pd.layer.batch_norm(D_h1) - D_h1_relu = pd.layer.lrelu(D_h1_bn) - - D_h2 = pd.layer.fc(D_h1_relu, w=self.D_W2, b=self.D_b2) - return D_h2 -``` - -### Class member function: Build the model -- Define data readers as placeholders to hold the data; -- Build generator and discriminators; -- Define two training losses for discriminator and generator, respectively. 
-If we have execution dependency engine to back-trace all tensors, the module building our GAN model will be like this: -```python -class DCGAN(object): - def build_model(self): - if self.y_dim: - self.y = pd.data(pd.float32, [self.batch_size, self.y_dim]) - self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.z = pd.data(tf.float32, [None, self.z_size]) - - # step 1: generate images by generator, classify real/fake images with discriminator - if self.y_dim: # if conditional GAN, includes label - self.G = self.generator(self.z, self.y) - self.D_t = self.discriminator(self.images) - # generated fake images - self.sampled = self.sampler(self.z, self.y) - self.D_f = self.discriminator(self.G) - else: # original version of GAN - self.G = self.generator(self.z) - self.D_t = self.discriminator(self.images) - # generate fake images - self.sampled = self.sampler(self.z) - self.D_f = self.discriminator(self.images) - - # step 2: define the two losses - self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)) - self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)) - self.d_loss = self.d_loss_real + self.d_loss_fake - - self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_f, np.ones(self.batch_szie)) -``` - -If we do not have dependency engine but blocks, the module building our GAN model will be like this: -```python -class DCGAN(object): - def build_model(self, default_block): - # input data in the default block - if self.y_dim: - self.y = pd.data(pd.float32, [self.batch_size, self.y_dim]) - self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - # self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size]) - self.z = pd.data(tf.float32, [None, self.z_size]) - - # step 1: generate images by generator, classify real/fake images with 
discriminator - with pd.default_block().g_block(): - if self.y_dim: # if conditional GAN, includes label - self.G = self.generator(self.z, self.y) - self.D_g = self.discriminator(self.G, self.y) - else: # original version of GAN - self.G = self.generator(self.z) - self.D_g = self.discriminator(self.G, self.y) - self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_g, np.ones(self.batch_szie)) - - with pd.default_block().d_block(): - if self.y_dim: # if conditional GAN, includes label - self.D_t = self.discriminator(self.images, self.y) - self.D_f = self.discriminator(self.G, self.y) - else: # original version of GAN - self.D_t = self.discriminator(self.images) - self.D_f = self.discriminator(self.G) - - # step 2: define the two losses - self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)) - self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)) - self.d_loss = self.d_loss_real + self.d_loss_fake -``` -Some small confusion and problems with this design: -- D\_g and D\_f are actually the same thing, but has to be written twice; i.e., if we want to run two sub-graphs conceptually, the same codes have to be written twice if they are shared by the graph. -- Requires ability to create a block anytime, rather than in if-else or rnn only; - -## Main function for the demo: -Generally, the user of GAN just need to the following things: -- Define an object as DCGAN class; -- Build the DCGAN model; -- Specify two optimizers for two different losses with respect to different parameters. -```python -# pd for short, should be more concise. 
-import paddle.v2 as pd -import numpy as np -import logging - -if __name__ == "__main__": - # dcgan class in the default graph/block - # if we use a dependency engine as in tensorflow - # the code will be slightly different, like: - # dcgan = DCGAN() - # dcgan.build_model() - with pd.block() as def_block: - dcgan = DCGAN() - dcgan.build_model(def_block) - - # load mnist data - data_X, data_y = load_mnist() - - # Two subgraphs required!!! - with pd.block().d_block(): - d_optim = pd.train.Adam(lr = .001, beta= .1) - d_step = d_optim.minimize(dcgan.d_loss, dcgan.theta_D) - with pd.block().g_block(): - g_optim = pd.train.Adam(lr = .001, beta= .1) - g_step = g_optim.minimize(dcgan.g_loss, dcgan.theta_G) - - # executor - sess = pd.executor() - - # training - for epoch in range(10000): - for batch_id in range(N // batch_size): - idx = ... - # sample a batch - batch_im, batch_label = data_X[idx:idx+batch_size], data_y[idx:idx+batch_size] - # sample z - batch_z = np.random.uniform(-1., 1., [batch_size, z_dim]) - - if batch_id % 2 == 0: - sess.run(d_step, - feed_dict = {dcgan.images: batch_im, - dcgan.y: batch_label, - dcgan.z: batch_z}) - else: - sess.run(g_step, - feed_dict = {dcgan.z: batch_z}) -``` - -# More thinking about dependency engine vs. block design: -- What if we just want to run an intermediate result? Do we need to run the whole block/graph? -- Should we call eval() to get the fake images in the first stage? And then train the discriminator in the second stage? 
diff --git a/develop/doc_cn/_sources/design/graph.md.txt b/develop/doc_cn/_sources/design/graph.md.txt deleted file mode 100644 index 7519a65df835a39fe14f6ef45530afff170191ff..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/graph.md.txt +++ /dev/null @@ -1,70 +0,0 @@ -# Design Doc: Computations as a Graph - -A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before. - -This document explains that the construction of a graph as three steps: - -- construct the forward part -- construct the backward part -- construct the optimization part - -## The Construction of a Graph - -Let us take the problem of image classification as a simple example. The application program that trains the model looks like: - -```python -x = layer.data("images") -l = layer.data("label") -y = layer.fc(x) -cost = layer.mse(y, l) -optimize(cost) -train(cost, reader=mnist.train()) -``` - -### Forward Part - -The first four lines of above program build the forward part of the graph. - -![](images/graph_construction_example_forward_only.png) - -In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators. - -Initialization operators are kind of "run-once" operators -- the `Run` method increments a class data member counter so to run at most once. By doing so, a parameter wouldn't be initialized repeatedly, say, in every minibatch. - -In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`. These protobuf messages are saved in a `BlockDesc` protobuf message. 
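The run-once behaviour of initialization operators described above can be sketched in plain Python. This is only an illustration under stated assumptions, not PaddlePaddle code: `RunOnceOp`, `init_w`, and `params` are hypothetical names introduced here, and the counter stands in for the class data member mentioned in the text.

```python
class RunOnceOp:
    """Hypothetical operator wrapper whose body executes at most once."""

    def __init__(self, body):
        self.body = body      # callable that performs the initialization
        self.run_count = 0    # counter incremented by run()

    def run(self):
        # Skip the work if this operator has already run.
        if self.run_count > 0:
            return False
        self.body()
        self.run_count += 1
        return True


params = {}


def init_w():
    # Stand-in for a parameter initializer such as the one created by layer.fc.
    params["W"] = [0.0, 0.0, 0.0]


init_op = RunOnceOp(init_w)
# Simulate three minibatches; only the first call initializes W.
ran = [init_op.run() for _ in range(3)]
```

Because the guard lives inside the operator, the executor can run the whole block on every minibatch without re-initializing parameters.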
- -### Backward Part - -The fifth line `optimize(cost)` calls two functions, `ConstructBackwardGraph` and `ConstructOptimizationGraph`. - -`ConstructBackwardGraph` traverses the forward graph in the `BlockDesc` protobuf message and builds the backward part. - -![](images/graph_construction_example_forward_backward.png) - -According to the chain rule of gradient computation, `ConstructBackwardGraph` would - -1. create a gradient operator G for each operator F, -1. make all inputs, outputs, and outputs' gradient of F as inputs of G, -1. create gradients for all inputs of F, except for those who don't have gradients, like x and l, and -1. make all these gradients as outputs of G. - -### Optimization Part - -For each parameter, like W and b created by `layer.fc`, marked as double circles in above graphs, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient. Here results in the complete graph: - -![](images/graph_construction_example_all.png) - -## Block and Graph - -The word block and graph are interchangable in the desgin of PaddlePaddle. A [Block](https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphore of the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block. - -A Block keeps operators in an array `BlockDesc::ops` - -```protobuf -message BlockDesc { - repeated OpDesc ops = 1; - repeated VarDesc vars = 2; -} -``` - -in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that in `ops`, we have some forward operators, followed by some gradient operators, and then some optimization operators. 
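The four steps that `ConstructBackwardGraph` follows can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: `Op` is a hypothetical stand-in for `OpDesc`, the `@GRAD` suffix and `_grad` type names are naming conventions invented for this example, and visiting the forward operators in reverse order is an assumption the text does not spell out.

```python
from collections import namedtuple

# Hypothetical stand-in for OpDesc: a type name plus input/output variable names.
Op = namedtuple("Op", ["type", "inputs", "outputs"])


def grad(name):
    # Assumed naming convention for the gradient of a variable.
    return name + "@GRAD"


def construct_backward(forward_ops, no_grad):
    backward_ops = []
    # Gradients flow from the cost back to the inputs, so walk in reverse.
    for f in reversed(forward_ops):
        # Steps 1-2: G takes F's inputs, outputs, and output gradients.
        g_inputs = list(f.inputs) + list(f.outputs) + [grad(o) for o in f.outputs]
        # Steps 3-4: G produces gradients for F's inputs, except excluded ones.
        g_outputs = [grad(i) for i in f.inputs if i not in no_grad]
        backward_ops.append(Op(f.type + "_grad", g_inputs, g_outputs))
    return backward_ops


forward = [
    Op("fc", ["x", "W", "b"], ["y"]),
    Op("mse", ["y", "l"], ["cost"]),
]
# x and l are data inputs and have no gradients.
backward = construct_backward(forward, no_grad={"x", "l"})
for op in backward:
    print(op.type, op.inputs, "->", op.outputs)
```

Appending `backward` to `forward` gives the ordering the text describes for `BlockDesc::ops`: forward operators first, then gradient operators.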
diff --git a/develop/doc_cn/_sources/design/graph_survey.md.txt b/develop/doc_cn/_sources/design/graph_survey.md.txt deleted file mode 100644 index 6c6db08f463ae0a2b94fc4546f123a1d7c151870..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/graph_survey.md.txt +++ /dev/null @@ -1,232 +0,0 @@ -## Survey on Graph - -Neural network framework often provides symbolic API for users to write network topology conveniently. This doc manily focus on symbolic API in most popular neural network frameworks, and try to find out how to parse symbolic configuration to a portable file, such as protobuf or json. - -### Mxnet - -The core concept of symbolic API is `Symbol`. Mxnet implements `Symbol` class in C++, and export to Python using C-API. Please refer to the comments in Mxnet: - - -`Symbol` is help class used to represent the operator node in Graph. -`Symbol` acts as an interface for building graphs from different components like Variable, Functor and Group. `Symbol` is also exported to python front-end (while Graph is not) to enable quick test and deployment. Conceptually, symbol is the final operation of a graph and thus including all the information required (the graph) to evaluate its output value. - - -A simple network topology wrote by Symbol is as follows: - -```python -def get_symbol(num_classes=10, **kwargs): - data = mx.symbol.Variable('data') - data = mx.symbol.Flatten(data=data) - fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128) - act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu") - fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64) - act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu") - fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes) - mlp = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax') - return mlp -``` - - - -Varible here is actually a Symbol. 
Every basic Symbol will correspond to one Node, and every Node has its own NodeAttr. There is a op field in NodeAttr class, when a Symbol represents Variable(often input data), the op field is null. - -Symbol contains a data member, std::vector outputs, and NodeEntry cantains a poniter to Node. We can follow the Node pointer to get all the Graph. - -And Symbol can be saved to a Json file. - -Here is a detailed example: - -``` ->>> import mxnet as mx ->>> data = mx.symbol.Variable('data') ->>> print data.debug_str() -Variable:data - ->>> data = mx.symbol.Flatten(data=data) ->>> print data.debug_str() -Symbol Outputs: - output[0]=flatten0(0) -Variable:data --------------------- -Op:Flatten, Name=flatten0 -Inputs: - arg[0]=data(0) version=0 - ->>> fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128) ->>> print fc1.debug_str() -Symbol Outputs: - output[0]=fc1(0) -Variable:data --------------------- -Op:Flatten, Name=flatten0 -Inputs: - arg[0]=data(0) version=0 -Variable:fc1_weight -Variable:fc1_bias --------------------- -Op:FullyConnected, Name=fc1 -Inputs: - arg[0]=flatten0(0) - arg[1]=fc1_weight(0) version=0 - arg[2]=fc1_bias(0) version=0 -Attrs: - num_hidden=128 - -``` - - -### TensorFlow - - -The core concept of symbolic API is `Tensor`. Tensorflow defines `Tensor` in Python. Please refer to the comments in TensorFlow: - -A `Tensor` is a symbolic handle to one of the outputs of an `Operation`. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow [Session](https://www.tensorflow.org/api_docs/python/tf/Session). - -A simple example is as follows: - -```python - # Build a dataflow graph. - c = tf.constant([[1.0, 2.0], [3.0, 4.0]]) - d = tf.constant([[1.0, 1.0], [0.0, 1.0]]) - e = tf.matmul(c, d) - - # Construct a `Session` to execute the graph. - sess = tf.Session() - - # Execute the graph and store the value that `e` represents in `result`. 
- result = sess.run(e) -``` - - -The main method of `Tensor` is as follows: - - -```python -@property -def op(self): - """The `Operation` that produces this tensor as an output.""" - return self._op - -@property -def dtype(self): - """The `DType` of elements in this tensor.""" - return self._dtype - -@property -def graph(self): - """The `Graph` that contains this tensor.""" - return self._op.graph - -@property -def name(self): - """The string name of this tensor.""" - if not self._op.name: - raise ValueError("Operation was not named: %s" % self._op) - return "%s:%d" % (self._op.name, self._value_index) - -@property -def device(self): - """The name of the device on which this tensor will be produced, or None.""" - return self._op.device -``` - - -Tensor can be taken as target to run by session. Tensor contains all the information of Graph, and tracks data dependency. - - -Here is a detailed example: - - -``` ->>> import tensorflow as tf ->>> c = tf.constant([[1.0, 2.0], [3.0, 4.0]]) ->>> print c.graph - ->>> d = tf.constant([[1.0, 1.0], [0.0, 1.0]]) ->>> print d.graph - ->>> e = tf.matmul(c, d) ->>> print e.graph - -``` - -### Dynet - - -The core concept of symbolic API is `Expression`, and Dynet defines `Expression` class in C++. - - -A simple example is as follows: - -```cpp -ComputationGraph cg; -Expression W = parameter(cg, pW); - -Expression in = input(cg, xs[i]); -Expression label = input(cg, ys[i]); -Expression pred = W * in; -Expression loss = square(pred - label); -``` - -The input data and parameter are also represented by Expression. Every basci Expression corresponds to a Node. And input data is also a Node. - -Expression has a data member ComputationGraph, and ComputationGraph will be modified in users' configuring process. Expression can be a running target, beacuse Expression contains all dependency. 
- - -Here is a detailed example: - -write topology in C++ - -``` -ComputationGraph cg; -Expression W = parameter(cg, pW); -cg.print_graphviz(); - -Expression pred = W * xs[i]; -cg.print_graphviz(); - -Expression loss = square(pred - ys[i]); -cg.print_graphviz(); -``` - -compile and print - -``` -# first print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; -} -# second print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; - N1 [label="v1 = v0 * -0.98"]; - N0 -> N1; -} -# third print -digraph G { - rankdir=LR; - nodesep=.05; - N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"]; - N1 [label="v1 = v0 * -0.98"]; - N0 -> N1; - N2 [label="v2 = -1.88387 - v1"]; - N1 -> N2; - N3 [label="v3 = -v2"]; - N2 -> N3; - N4 [label="v4 = square(v3)"]; - N3 -> N4; -} -``` - -### Conclusion - - -Actually, Symbol/Tensor/Expression in Mxnet/TensorFlow/Dynet are the same level concepts. We use a unified name Expression here, this level concept has following features: - -- Users wirte topoloy with symbolic API, and all return value is Expression, including input data and parameter. -- Expression corresponds with a global Graph, and Expression can also be composed. 
-- Expression tracks all dependency and can be taken as a run target diff --git a/develop/doc_cn/_sources/design/if_else_op.md.txt b/develop/doc_cn/_sources/design/if_else_op.md.txt deleted file mode 100644 index 26d140f06db4ecefa86be015eaa731ffddc6910c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/if_else_op.md.txt +++ /dev/null @@ -1,51 +0,0 @@ -# The `IfElse` Operator - -PaddlePaddle's `IfElse` operator differs from TensorFlow's: - -- the TensorFlow version takes a scalar boolean value as the condition so that the whole mini-batch goes to either the true or the false branch, whereas -- the PaddlePaddle version takes a vector of boolean value as the condition, and instances corresponding to true values go to the true branch, those corresponding to false values go to the false branch. - -## Example - -The following PaddlePaddle program shows the usage of the IfElse operator: - -```python -import paddle as pd - -x = minibatch([10, 20, 30]) # shape=[None, 1] -y = var(1) # shape=[1], value=1 -z = minibatch([10, 20, 30]) # shape=[None, 1] -cond = larger_than(x, 15) # [false, true, true] - -ie = pd.ifelse() -with ie.true_block(): - d = pd.layer.add(x, y) - ie.output(d, pd.layer.softmax(d)) -with ie.false_block(): - d = pd.layer.fc(z) - ie.output(d, d+1) -o1, o2 = ie(cond) -``` - -A challenge to implement the `IfElse` operator is to infer those variables to be split, or, say, to identify the variable of the mini-batch or those derived from the mini-batch. 
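The per-instance semantics described above (a vector condition that routes each instance to one branch) can be illustrated with ordinary Python lists. This is a sketch of the idea only; `if_else`, `true_fn`, and `false_fn` are hypothetical names, and the real operator works on tensors and must infer which variables to split.

```python
def if_else(cond, xs, true_fn, false_fn):
    """Route each instance by its own condition, then merge results in order."""
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]
    # Each branch only sees the sub-batch of instances assigned to it.
    true_out = true_fn([xs[i] for i in true_idx])
    false_out = false_fn([xs[i] for i in false_idx])
    # Scatter the branch outputs back to their original positions.
    merged = [None] * len(xs)
    for i, v in zip(true_idx, true_out):
        merged[i] = v
    for i, v in zip(false_idx, false_out):
        merged[i] = v
    return merged


x = [10, 20, 30]
cond = [v > 15 for v in x]  # [False, True, True]
out = if_else(
    cond,
    x,
    true_fn=lambda batch: [v + 1 for v in batch],
    false_fn=lambda batch: [v * 2 for v in batch],
)
```

With a scalar condition, as in the TensorFlow version, the split step would disappear and the whole mini-batch would go to a single branch.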
 - - An equivalent C++ program is as follows: - -```c++ -namespace pd = paddle; - -int x = 10; -int y = 1; -int z = 10; -bool cond = false; -int o1, o2; -if (cond) { - int d = x + y; - o1 = d; - o2 = pd::layer::softmax(d); -} else { - int d = pd::layer::fc(z); - o1 = d; - o2 = d+1; -} -``` diff --git a/develop/doc_cn/_sources/design/infer_var_type.md.txt b/develop/doc_cn/_sources/design/infer_var_type.md.txt deleted file mode 100644 index d9d5397becba2ef1806d9341cd49cd9aabbf4a6a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/infer_var_type.md.txt +++ /dev/null @@ -1,78 +0,0 @@ -# Design Doc: InferVarType - -## The Problem Posed - -The variable in our design can hold different types, such as `LoDTensor` and `SelectedRows`. An operator should be able to infer the variable types of its output. - -For example, a `lookup table` operator takes two `LoDTensor`; one is a float tensor as the embedding table, the other is an int tensor as word ID. The gradient operator of `lookup table` will generate a `SelectedRows` as its output. A `sum` operator can take both `LoDTensor` and `SelectedRows` as its inputs and will generate a `LoDTensor` if any of its inputs is `LoDTensor`, otherwise, the `sum` operator will generate `SelectedRows` as its output. - -The variable type will be constant at runtime. Every variable's type can either be set by the user (input data and parameter) or be inferred by the operator at compile time. - -## Proposed Solution - -The `InferVarType` is a compile-time function which is registered to each operator. The interface of that function is: - - -```c++ -using InferVarTypeFN = std::function< - void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>; -``` - -It takes an operator description as its input and writes the output variable types into the block description. - -The `InferVarTypeFN` will be registered in `OpInfo`, to replace `infer_var_type_` field. 
The `OpInfo` should be - -```cpp -struct OpInfo { - InferVarTypeFN infer_var_type_; - ... -}; -``` - -The default `InferVarType` will set output type as `LoDTensor`. It can be done by `GetInferVarType()`. - -```cpp -void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) { - // set the output type of variable as `LoDTensor`. - // ... -} - -struct OpInfo { - InferVarTypeFN infer_var_type_; - InferVarTypeFN GetInferVarType() const { - if (infer_var_type_) { - return infer_var_type_; - } else { - return DefaultInferVarType; - } - } -}; -``` - -## Register InferVarType - -We provide a thin base class for registering an `InferVarTypeFN`. To use a base class will ease the implementation of registry since we can detect the registry entry is an `InferVarTypeFN` or not. - -```cpp -class VarTypeInferer { -public: - virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0; -} -``` - -Operator developers can write the specialize `VarTypeInferer` as follow. - -```cpp -class SpecialVarTypeInferer : public VarTypeInferer { -public: - virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const { - // .. own logic - } -} -``` - -Then user can register the `InferVarType` just like `GradOpDescMaker` and `OpInfoMaker`. - -``` -REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...); -``` diff --git a/develop/doc_cn/_sources/design/kernel_hint_design.md.txt b/develop/doc_cn/_sources/design/kernel_hint_design.md.txt deleted file mode 100644 index a54b7da045e1a362626ef066f9ebb56af2c3181a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/kernel_hint_design.md.txt +++ /dev/null @@ -1,57 +0,0 @@ -## Problem -In PaddlePaddle's [Design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md), one Operator may have multiple kernels. 
Users may have some personal preference to choose a certain type of kernel for an operator, such as `force_cpu` to choose a CPU kernel, `use_cudnn` to choose a CUDNN kernel, we need to provide a way for users to do this. - -In the current design, we use KernelType to describe one kernel. - -```cpp -struct KernelType { - Place place_; - DataType data_type_; - LayoutType layout_; -}; -``` - `place_` `data_type_` and `layout_` can be got from the input tensors of the operator, `GetActualKernelType(inputs)` use inputs to infer the proper kernel key that fit the incoming data, but users can not directly configure it. - -The [design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md) also provides a virtual method `GetExpectedKernelType` that user can overload and use to choose the KernelType they want to use. - -So we should send the information user defined in proto to `GetExpectedKernelType` for choosing a kernel. - -The problem is, how should we define and send the information for `GetExpectedKernelType` to use? - -## Solution - -### Potential choice -1. Do nothing, let the user add the information they want to operator‘s attribute and get them inside `GetExpectedKernelType`, this can work properly. But there is a little problem that users may define many kinds of hints for the same purpose, such as `force_cpu`, `use_cpu`, `cpu_kernel` to choose CPU kernel, and `use_cudnn`, `force_cudnn`, `cudnn_kernel` to choose CUDNN kernel. - -2. Pre-define all the needed option and use a single attr key such as `kernel_hint` for the user, this is not so flexible if the user wants to define some more kind of hint. - -### Final choice -To provide enough flexibility while avoiding confusion definition, we can define some global constants for these attribute names, such as `force_cpu`, `use_cudnn`, `use_mkldnn` for a user to choose. 
- -In C++ - -```cpp -const std::string kForceCPU = "force_cpu"; -const std::string kUseCUDNN = "use_cudnn"; -const std::string kUseMKLDNN = "use_mkldnn"; - -KernelType GetExpectedKernelType() { - if (Attr(kForceCPU)) { - return KernelType(CPUPlace, ...) - } else { - ... - } -} -``` - -In Python code - -```python -FORCE_CPU = core.kForceCPU() - -def xx_layer(..., force_cpu=false): - layer_helper = LayerHelper(...) - layer_helper.append_op( - type="xx", - attr={FORCE_CPU: force_cpu}) -``` diff --git a/develop/doc_cn/_sources/design/kernel_selection.md.txt b/develop/doc_cn/_sources/design/kernel_selection.md.txt deleted file mode 100644 index 9719e031c70979cd95400701efd30879662e19bc..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/kernel_selection.md.txt +++ /dev/null @@ -1,99 +0,0 @@ -## Background -Every operator has many kernels because there are multiple data types, places, data layout, library type that Fluid supports. We use the `OpKernelType ` to describe kernel types that operators can hold. - -The `OpKernelType ` is as follows: - -```cpp -struct OpKernelType { - Place place_; - DataType data_type_; - DataLayout data_layout_; - LibraryType library_type_; -}; -``` - -- The `place_` is a descriptor of the device, e.g., CPUPlace, CUDAPlace. - -- The `data_type_` is the data type that this kernel performs on, e.g., `FP32`, `INT64`. Note that one kernel may have inputs with different data types. However, it will be a major `data_type`. For example, the `cross_entropy` takes `int64` as it label, and `double`/`float` as its input logit and output cost. The major `data_type` of `cross_entropy` is `float` or `double`. - -- The `data_layout_ ` is useful for some computational library. One example is that MKLDNN uses many kinds of layout, such as `nChw8c`. Each kind of layout will invoke the different kernel. - -- The `library_type_` describes the computational library, e.g., `MKLDNN`, `CUDNN`. 
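To make the role of the four `OpKernelType` fields concrete, the following sketch uses them as a lookup key into a kernel table. It is an illustration under stated assumptions rather than Fluid's implementation: `register_kernel`, `find_kernel`, and the `kernels` dictionary are hypothetical, and the layout and library values `"NCHW"` and `"PLAIN"` are placeholders chosen for the example.

```python
from collections import namedtuple

# Mirrors the four fields of the OpKernelType struct described above.
OpKernelType = namedtuple(
    "OpKernelType", ["place", "data_type", "data_layout", "library_type"]
)

# Hypothetical registry: (operator name, kernel type) -> kernel function.
kernels = {}


def register_kernel(op_type, kernel_type, fn):
    kernels[(op_type, kernel_type)] = fn


def find_kernel(op_type, kernel_type):
    # Returns None when no kernel was registered for this exact kernel type.
    return kernels.get((op_type, kernel_type))


cpu_fp32 = OpKernelType("CPUPlace", "FP32", "NCHW", "PLAIN")
gpu_fp32 = OpKernelType("CUDAPlace", "FP32", "NCHW", "CUDNN")

# Only a CPU kernel is registered for this toy "mul" operator.
register_kernel("mul", cpu_fp32, lambda a, b: a * b)

cpu_result = find_kernel("mul", cpu_fp32)(3, 4)
gpu_kernel = find_kernel("mul", gpu_fp32)
```

Any field that differs yields a different key, so an operator registered only for `CPUPlace` has no match when a `CUDAPlace` kernel is requested.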
- -## Problem - -Ideally, we register a kernel for every operator and every kernel type. However, this is impractical in the following situations. - -1. Some operators, like CRF, are complicated and inefficient to implement on GPU. The CRF operator will only have a CPU kernel. -2. Some operators take too much memory. It is better to force them onto CPU. However, the rest of the operators in this neural network will run on GPU, i.e., a model parallel problem. -3. Some layouts and places are particular. One example is that MKLDNN uses `nChw8c` and no other library uses `nChw8c`. - -To explain one situation in detail, suppose we have two operators, OP1 and OP2. OP1 has one output `op1_2_op2`, and `op1_2_op2` is the input of OP2. - -If OP1 and OP2 run on the same place (for example CPUPlace), then `op1_2_op2` can be used directly by OP2. - -``` -OP1(CPUPlace) - | - op1_2_op2 - | -OP2(CPUPlace) -``` - -If OP1 and OP2 run on different places, then OP2 cannot use `op1_2_op2` directly. - -Problems under these situations are similar. We can formalize the problem as follows. - -We register kernels with types $KT = \{kt_1, kt_2, kt_3, ...\}$ for one operator. The inputs of this operator are produced with kernel type $kt_{?}$, where $kt_{?} \notin KT$. How do we cast the input of this operator from $kt_{?}$ to one of the kernel types in $KT$? - -## Solution: data transform - -It is clear that transforming the inputs of an operator to fit another kernel type is not related to the particular operator. So we should register these transformation methods as global methods. - -We can infer a kernel type for each input of an operator. We call this kernel type the `actual kernel type for var`, which means it is the kernel type that can process this input variable. - -We can get a kernel type by 1) The configuration of operator description. (Users may want to force use `MKL` for `conv` operator). 2) The place of the current executor.
(Executor is running on GPU). This kernel type is what we expect the operator will be performed on. We let this kernel type as `expect kernel type`. - -We transform the input data from `actual` to `expect` if the actual kernel type is not as same as expect kernel type. - -The algorithm is described as following - -```cpp -void OperatorWithKernel::Run( - const Scope& scope, - const platform::Place& place) const { - ExecutionContext ctx(...); - auto expected_kernel_key = this->GetExpectedKernelType(ctx); - - Scope& new_scope = scope.NewScope(); - - for (auto& var_name : this->Inputs()) { - auto* tensor_in = GetTensor(var_name); - auto kernel_type_for_var = this->GetKernelTypeForVar(...); - if (kernel_type_for_var.place_ != expected_kernel_key.place_) { - auto* trans_var = new_scope.Var(var_name); - auto* out = DataTransform(expected_kernel_key, - kernel_type_for_var, - *tensor_in); - CopyVariableWithTensor(...); - } - } - - auto kernel = kernels.find(expected_kernel_key); - kernel->Compute(ExecutionContext(...)); -} -``` - -then the actual process for the multi-device above will be: - -``` -OP1(CPUPlace) - | -op1_2_op2(on CPU) - | -[transform](from CPU to GPU) - | -op1_2_op2(on GPU) - | -OP2(CUDAPlace) -``` diff --git a/develop/doc_cn/_sources/design/memory_optimization.md.txt b/develop/doc_cn/_sources/design/memory_optimization.md.txt deleted file mode 100644 index 285464ada728d8f7a086a26beca6cfa4418e98e4..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/memory_optimization.md.txt +++ /dev/null @@ -1,217 +0,0 @@ -# Memory Optimization - - -## Problem - -In a lecture from Andrew Ng, he attributes the recent sucess of AI due to a combination of these: - -- Availability of Big Data -- Supercomputing power to process this Big Data over very large neural networks -- Modern algorithms - -Following graph shows the details: - -![](images/deep_learning.png) - -Larger model usually bring better performance. However, GPU memory is limited. 
For example, the memory size of a GTX TITAN X is only 12GB. To train complex and large models, we have to take care of memory usage. Besides, memory optimization is also necessary in both online/mobile inference. - -## Solution - -### Basic Strategy - -There are some basic strategies to improve memory usage, including in-place operations and memory sharing. - -#### In-place Operation -In a relu activation operator: - -$y = \max(x, 0)$ - -If the variable x is not used in any other operator, we can make an in-place operation. In other words, the memory block of variable y and variable x will be the same. In-place operations will save 50% memory occupancy immediately. - -#### Memory Sharing - -Not all operators support in-place operations. Memory sharing is a more general strategy. - -Following is an example: - -``` -a = op1(b, c); -d = op2(a) -e = op3(d, f) -``` - -In this case, variable a is no longer used, and op2 does not support in-place operation. After op2 finishes, we can put the memory of variable a to a memory pool. Then, variable e can share the memory of variable a from the pool. - - -### Live Variable Analysis - -It's not enough to only have some basic strategies. The pre-requisite of memory optimization is to know if a variable is still "live" after an operation. - -In our design, the neural network topology is defined as a program. Luckily, [live variable analysis](https://en.wikipedia.org/wiki/Live_variable_analysis) is a classic problem in compilers which can be used in many stages, such as register allocation. - -In compilers, the front end of the compiler translates programs into an intermediate language with an unbounded number of temporary variables. This program must run on a machine with a bounded number of registers. Two temporary variables a and b can fit into the same register, if a and b are never "in use" at the same time. 
Thus, many temporary variables can fit in few registers; if they don't all fit, the excess temporary variables can be kept in memory. - -Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporary variables are in use at the same time. We say a variable is "live" if it holds a value that may be needed in the future, so this analysis is called liveness analysis. - -We can learn these techniques from compilers. Live variable analysis has two main stages: - -- construct a control flow graph -- solve the dataflow equations - - -#### Control Flow Graph -To perform analysis on a program, it is often useful to make a control flow graph. A [control flow graph](https://en.wikipedia.org/wiki/Control_flow_graph) (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statement x can be followed by statement y, there is an edge from x to y. - -Following is the flow graph for a simple loop. - -![](images/control_flow_graph.png) - -#### Dataflow Analysis - -Liveness of a variable "flows" around the edges of the control flow graph; determining the live range of each variable is an example of a dataflow problem. [Dataflow analysis](https://en.wikipedia.org/wiki/Data-flow_analysis) is a technique for gathering information about the possible set of values calculated at various points in a computer program. - -A simple way to perform dataflow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes. - -- Flow Graph Terminology - -A flow graph node has out-edges that lead to successor nodes, and in-edges that come from predecessor nodes.
The set *pred[n]* is all the predecessors of node n, and *succ[n]* is the set of successors. -In the control flow graph above, the out-edges of node 5 are 5 --> 6 and 5 --> 2, and *succ[5]* = {2, 6}. The in-edges of 2 are 5 --> 2 and 1 --> 2, and *pred[2]* = {1, 5}. - -- Uses and Defs - -An assignment to a variable or temporary defines that variable. An occurrence of a variable on the right-hand side of an assignment (or in other expressions) uses the variable. We can define the *def* of a variable as the set of graph nodes that define it, or the *def* of a graph node as the set of variables that it defines, and similarly for the *use* of a variable or graph node. In the control flow graph above, *def(3)* = {c}, *use(3)* = {b, c}. - -- Liveness - -A variable is *live* on an edge if there is a directed path from that edge to a *use* of the variable that does not go through any *def*. A variable is *live-in* at a node if it is live on any of the in-edges of that node; it is *live-out* at a node if it is live on any of the out-edges of the node. - - -The calculation of liveness can be solved by iterating until a fixed point is reached. The recursive formula is as follows (in the standard textbook formulation, $in[n] = use[n] \cup (out[n] - def[n])$ and $out[n] = \bigcup_{s \in succ[n]} in[s]$): - -![](images/dataflow_equations.png) - -### Memory optimization transpiler - -Finally, we combine the basic strategies with the liveness analysis techniques learned from compilers to implement our memory optimization transpiler. - -#### add in-place attribute - -In-place is a built-in attribute of an operator. Since we treat in-place operators and other operators differently, we have to add an in-place attribute for every operator. - - -#### construct control flow graph - -Following is the ProgramDesc protobuf of the [machine translation](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_machine_translation.py) example. - -- Block0: - -``` -lookup_table -mul -... -while(sub-block idx 1) -... -array_to_lod_tensor -cross_entropy -... 
-while_grad(sub-block idx 2) -read_from_array -array_to_lod_tensor -... -``` - -- Block1 - -``` -read_from_array -read_from_array -... -write_to_array -increment -write_to_array -less_than -``` - -- Block2 - -``` -read_from_array -increment -... -write_to_array -write_to_array -``` - -We can traverse all the operators and variables in ProgramDesc to build a control flow graph. - -```python -from collections import defaultdict - -class ControlFlowGraph(object): - def __init__(self, Program): - self._successors = defaultdict(set) - self._predecessors = defaultdict(set) - self._uses = defaultdict(set) - self._defs = defaultdict(set) - self._live_in = defaultdict(set) - self._live_out = defaultdict(set) - self._program = Program - - def build(self): - pass - - def dataflow_analysis(self): - pass - - def memory_optimization(self): - pass - - def get_program(self): - return self._program -``` - -#### Make dataflow analysis - -We follow the approach used in compilers and solve the dataflow equations to get the liveness of every variable. A variable that is in the live-in set of an operator node but not in its live-out set is dead after that operator, so its memory can be shared. - -For example: - -``` -a = op1(b, c); -d = op2(a) -e = op3(d, f) -``` - -The dataflow analysis result is: - -``` -live_in(op1) = {b, c, f} -live_out(op1) = {a, f} - -live_in(op2) = {a, f} -live_out(op2) = {d, f} - -live_in(op3) = {d, f} -live_out(op3) = {} -``` - -After op1, we can recycle variable b and variable c. After op2, we can recycle variable a. After op3, we can recycle variable d and variable f. - -#### memory sharing policy - -A memory pool is maintained during the memory optimization stage. Each operator node is scanned to determine whether memory optimization can be applied. If an operator satisfies the requirement, the following policy is used to handle its input/output variables.
- -``` -if op.support_inplace(): - i --> pool - pool --> o -else: - pool --> o - i --> pool -``` - - - -## Reference - -- [Lecture Notes From Artificial Intelligence Is The New Electricity By Andrew Ng](https://manavsehgal.com/lecture-notes-from-artificial-intelligence-is-the-new-electricity-by-andrew-ng-4712dcbf26e5) -- Modern compiler implementation in ML, by Andrew W. Appel -- [Optimizing Memory Consumption in Deep learning](https://mxnet.incubator.apache.org/architecture/note_memory.html) diff --git a/develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt b/develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt deleted file mode 100644 index 0123315ad4368e68b377f66119949bfd6c1c7860..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/mkl/mkl_packed.md.txt +++ /dev/null @@ -1,108 +0,0 @@ -# Intel® MKL Packed on PaddlePaddle: Design Doc - - -## Contents - -- [Overview](#overview) -- [Key Points](#key-points) - - [Background](#background) - - [Solution](#solution) -- [Actions](#actions) - - [CMake](#cmake) - - [Layers](#layers) - - [Unit Tests](#unit-tests) - - [Python API](#python-api) - - [Benchmarking](#benchmarking) - - -## Overview -We plan to integrate the GEMM Packed APIs\[[1](#references)\] introduced in Intel® MKL into PaddlePaddle, to take full advantage of Intel platforms and effectively improve PaddlePaddle's performance on Intel architecture. -The current optimization mainly targets the Recurrent Neural Network (RNN) related layers (including `RecurrentLayer`, `GatedRecurrentLayer` and `LstmLayer`) and the PaddlePaddle V1 API. - -## Key Points - -### Background -PaddlePaddle currently uses the [cblas_?gemm](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm) function of the Intel® MKL library. Before computing, this function converts the original data into an internal format that is better suited to Intel platforms. - -1. Conversion cost \ -This data format conversion (Packing) is relatively expensive when the computation itself is small. For example, in the Vanilla RNN part of DeepSpeech2 \[[2](#references)\], the matrix size is `batch_size * 2048`. -2. Redundant conversion \ -In some existing cases (such as RNN), multiple calls to cblas_?gemm use the same original data, so repeating the Packing of that data on every call is redundant. - -To minimize the time spent on Packing across multiple cblas_?gemm calls, Intel® MKL introduced the following four APIs: - * [cblas_?gemm_alloc](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-alloc) - * [cblas_?gemm_pack](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-pack) - * [cblas_?gemm_compute](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-compute) - * [cblas_?gemm_free](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm-free) - -With these APIs, we can pack the original data first and then pass the packed data to every gemm_compute call that reuses it, which avoids redundant Packing. - -### Solution -For an RNN, all time steps within one forward/backward pass share the same weight. In inference-only mode, every forward pass also uses the same weight, so there is no need to repack the weight at every time step of every forward pass. - -Using the newly introduced GEMM Packed APIs, we pack the weight once when the layer is initialized, reuse the packed weight in forward and backward, and after each weight update pack the new weight for the next iteration. - -* Before optimization, for a model with sequence length `T`, the number of conversions over `N` iterations is: - - `inference`: `N * T` - - `training`: `2 * N * T` -* After optimization, for the same model, the number of conversions drops to: - - `inference`: `1` - - `training`: `2 * N` - -## Actions - -The added files and directory structure are as follows: - -```txt -PaddlePaddle/Paddle -├── ... -└── paddle/ - ├── ... - └── gserver/ - ├── ... - ├── layers/ - │ ├── ... - │ ├── MKLPackedRecurrentLayer.* - | ├── MKLPackedGatedRecurrentLayer.* - | ├── MKLPackedLstmLayer.* - | └── MKLPackedGemm.h - └── tests/ - ├── ... - └── test_MKLPacked.cpp -``` - -### CMake -In the corresponding `CMakeLists.txt`, the MKL Packed features are enabled or disabled depending on whether `WITH_MKL` is turned on. - -### Layers -All `MKLPacked*Layer` classes inherit from PaddlePaddle's base class `Layer` and include the header file `MKLPackedGemm.h`, which wraps the relevant GEMM Packed APIs. - -### Unit Tests -We will add `test_MKLPacked.cpp` to test the layers optimized with MKL Packed. -For each newly added RNN layer, we will compare the following two aspects: -1. For the optimized layer itself, the results of sequence mode (`rnn_use_batch=false`) against batch mode (`rnn_use_batch=true`). -2. The results of the optimized layer against the corresponding original PaddlePaddle layer in batch mode. - -### Python API -We plan to add a `use_mkl_packed` flag in `paddle/utils.Flags` to choose whether to use this feature; when compiled with `WITH_MKL=ON`, it defaults to `true`. - -In addition, at the corresponding layers in `python/paddle/trainer/config_parser.py`, we add the `use_mkl_packed` option so that users can choose on the Python side whether to enable this feature. - -The implementation could look like this: - -```python -use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0))) -if use_mkl_packed: - self.layer_type = mkl_packed_* -``` - -All related `layer_type` values start with *mkl_packed_*, which is guaranteed when `MKLPacked*Layer` registers the layer, to tell them apart. - - -### Benchmarking -Scripts will be added to test and compare network performance before and after using the MKL Packed recurrent layers. - -## References -1. [Introducing the new Packed APIs for GEMM](https://software.intel.com/en-us/articles/introducing-the-new-packed-apis-for-gemm) -2. [DeepSpeech2 on PaddlePaddle](https://github.com/PaddlePaddle/DeepSpeech#deepspeech2-on-paddlepaddle) - diff --git a/develop/doc_cn/_sources/design/mkl/mkldnn.md.txt b/develop/doc_cn/_sources/design/mkl/mkldnn.md.txt deleted file mode 100644 index e2fe1e6b26ffa73fda81863abfadf697c0acbfcf..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/mkl/mkldnn.md.txt +++ /dev/null @@ -1,210 +0,0 @@ -# Intel® MKL-DNN on PaddlePaddle: Design Doc - -We plan to integrate [Intel MKL-DNN](https://github.com/01org/mkl-dnn) -(Intel Math Kernel Library for Deep Neural Networks) into PaddlePaddle, -to take full advantage of Intel platforms and effectively improve PaddlePaddle's performance on Intel architecture. - -
                                -
                                -Figure 1. PaddlePaddle on IA -
- -Near-term goals - -- Implement the commonly used layers with MKL-DNN. -- Implement the common deep neural networks VGG, GoogLeNet and ResNet with MKL-DNN. - -The current optimization mainly targets the PaddlePaddle code base from before the refactoring and the V1 API. -See [here](https://github.com/PaddlePaddle/Paddle/projects/21) for the detailed status. - -## Contents - -- [Overview](#overview) -- [Actions](#actions) - - [CMake](#cmake) - - [Matrix](#matrix) - - [Layers](#layers) - - [Activations](#activations) - - [Parameters](#parameters) - - [Gradients](#gradients) - - [Unit Tests](#unit-tests) - - [Python API](#python-api) - - [Benchmarking](#benchmarking) - - [Others](#others) -- [Design Concerns](#design-concerns) - -## Overview - -We will integrate MKL-DNN into PaddlePaddle as a third-party library. Like the other third-party libraries, MKL-DNN is downloaded and built when PaddlePaddle is compiled. - -In addition, to further speed up PaddlePaddle's basic math operations, we will integrate MKLML (the MKL small library\[[1](#references)\]) -as another third-party library. It only contains prebuilt dynamic libraries and header files. - -The relationship among MKL, MKLML and MKL-DNN is shown in the table below: - -| Name | Open Source | License | Descriptions | -| :---------- | :--------------- | :---------- | :------------ | -| MKL | No | Proprietary | Accelerate math processing routines | -| MKLML | No | Proprietary | Small package of MKL, especially for Machine Learning | -| MKL-DNN | Yes | Apache 2.0 | Accelerate primitives processing routines especially for Deep Neural Networks | - -MKLML can be used together with MKL-DNN to achieve the best performance. - -
                                -
                                -Figure 2. PaddlePaddle with MKL Engines -
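- -## Actions - -The added files and directory structure are as follows: - -```txt -PaddlePaddle/Paddle -├── ... -├── cmake/ -│ ├── external/ -│ │ ├── ... -│ │ ├── mkldnn.cmake -│ │ └── mklml.cmake -└── paddle/ - ├── ... - ├── math/ - │ ├── ... - │ └── MKLDNNMatrix.* - └── gserver/ - ├── ... - ├── layers/ - │ ├── ... - │ └── MKLDNN*Layer.* - ├── activations/ - │ ├── ... - │ └── MKLDNNActivations.* - └── tests/ - ├── ... - ├── MKLDNNTester.* - └── test_MKLDNN.cpp -``` - -### CMake -`CMakeLists.txt` provides one master switch related to MKL, `WITH_MKL`, which decides whether MKLML and MKL-DNN are used at compile time. - -- `WITH_MKLML` controls whether the MKLML library is used. -When `WITH_MKL` is turned on, MKLML is automatically used as PaddlePaddle's CBLAS and LAPACK library, and Intel OpenMP is enabled to improve MKLML performance. -At compile time the corresponding header files and libraries are placed under `build/third_party/install/mklml/*`. -The MKLML libraries are currently all dynamic libraries, mainly `libiomp5.so` and `libmklml_intel.so`. -- `WITH_MKLDNN` controls whether MKL-DNN is used. -When `WITH_MKL` is turned on, whether MKL-DNN is built is decided automatically from the hardware configuration[[2](#references)]. -At compile time the corresponding header files and libraries are placed under `build/third_party/install/mkldnn/*`. -MKL-DNN currently only has the dynamic library `libmkldnn.so`. - -### Matrix -PaddlePaddle currently stores all data in `NCHW` format, but MKL-DNN supports more than this one layout. -So we define an `MKLDNNMatrix` to manage the different MKL-DNN data formats and the conversions between them. - -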
                                -
                                -Figure 3. MKLDNNMatrix -
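To make the idea of a format conversion concrete, here is a minimal sketch of the index mapping between a flat `NCHW` buffer and a channel-blocked buffer. It assumes the `nChw8c` layout as one example of a blocked format (logical dimensions `N, C/8, H, W, 8`); the function name `nchw_to_nchw8c` is hypothetical and this is not how `MKLDNNMatrix` is implemented, since the real conversion is delegated to MKL-DNN.

```python
def nchw_to_nchw8c(data, n, c, h, w, block=8):
    """Reorder a flat NCHW buffer into a flat channel-blocked buffer.

    Assumes `c` is a multiple of `block`; this sketch does not pad.
    """
    if c % block != 0:
        raise ValueError("channel count must be a multiple of the block size")
    if len(data) != n * c * h * w:
        raise ValueError("buffer size does not match the given dimensions")
    out = [0.0] * len(data)
    cb = c // block  # number of channel blocks
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # Offset of element (ni, ci, hi, wi) in plain NCHW.
                    src = ((ni * c + ci) * h + hi) * w + wi
                    # Offset in the blocked layout: the channel index is split
                    # into a block index and a position inside the block.
                    dst = ((((ni * cb + ci // block) * h + hi) * w + wi) * block
                           + ci % block)
                    out[dst] = data[src]
    return out
```

In the blocked result, the `block` channel values of one spatial position sit next to each other in memory, which is what makes the layout friendly to vector instructions.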
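- -### Layers -All MKL-DNN layers inherit from `MKLDNNLayer`, which in turn inherits from PaddlePaddle's base class `Layer`. -`MKLDNNLayer` provides the necessary interfaces and functions and implements the basic logic of `forward` and `backward`; -subclasses only need to use the defined interfaces and implement the specific functions. - -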
                                -
                                -Figure 4. MKLDNNLayer -
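- -Each MKLDNNLayer contains a series of MKLDNNMatrix objects for internal and external memory: - -- Internal memory: `inVal_`, `inGrad_`, `outVal_` and `outGrad_`, which represent the input value, input gradient, output value and output gradient respectively. -- External memory: all prefixed with ext, such as `extInVal_` and `extInGrad_`. They are mainly used -to convert memory when the data format does not match PaddlePaddle's default `NCHW` format. -Note that PaddlePaddle activations use `output_.value` and `output_.grad` directly, -so `extOutVal_` and `extOutGrad_` must share memory with `output_.value` and `output_.grad` respectively. -If no external memory is needed for conversion, the corresponding internal memory shares memory with them as well. -- Conversion functions (resetXXX): `resetInValue`, `resetInGrad`, `resetOutValue` and `resetOutGrad`, -which handle the conversion of the input value, input gradient, output value and output gradient. -These functions reset the internal and external memory according to the input arguments. The two can also be equal, which means no conversion is needed. - -Note: a subclass of `MKLDNNlayer` only needs to use the internal memory; all external conversion work is prepared in the reset functions. - -### Activations -In PaddlePaddle before the refactoring, an activation is a concept independent of `Layer`, and its input and output share the same memory, -so we added a corresponding `MKLDNNActivation`, implemented in a way similar to `MKLDNNLayer`. - -### Parameters -For layers with parameters, we ensure that the parameters used by `MKLDNNLayer` share memory with the buffer allocated by PaddlePaddle. -If the data layouts differ, we convert the format to the one MKL-DNN expects before training starts, -and save it back in PaddlePaddle's format when training ends; no conversion is needed during training. -This keeps the saved parameter format consistent with PaddlePaddle while avoiding unnecessary conversions. - -### Gradients -MKL-DNN operations overwrite their outputs, that is, results are not accumulated onto the existing data. -The benefit is that the memory does not have to be cleared repeatedly, which saves unnecessary work. -Note, however, that when the network has branches, the gradients coming from different layers must be accumulated during `backward`. -So `MKLDNNlayer` implements a merge method: the `Input Gradient` of each branch -is first stored temporarily in an `MKLDNNMatrix`, and the layer at the branch point sums them and puts the result into the current layer's `output_.grad`. -Overall, subclasses therefore do not need to care about branches. - -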
                                -
                                -Figure 5. Merge Gradients -
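The merge step can be pictured with a small sketch: each branch writes its own gradient buffer (overwrite semantics), and the layer at the branch point sums the buffers element-wise. This is only an illustration in plain Python under that assumption; `merge_gradients` is a hypothetical name and not the actual method of `MKLDNNLayer`.

```python
def merge_gradients(branch_grads):
    """Sum the input gradients of all branches element-wise.

    `branch_grads` holds one flat gradient buffer per branch.
    """
    if not branch_grads:
        raise ValueError("at least one branch gradient is required")
    length = len(branch_grads[0])
    if any(len(grad) != length for grad in branch_grads):
        raise ValueError("all branch gradients must have the same size")
    # zip(*...) groups the i-th element of every branch together.
    return [sum(values) for values in zip(*branch_grads)]
```

Keeping one temporary buffer per branch and summing once at the branch point preserves the overwrite behaviour inside each branch, so no buffer has to be zeroed before a branch writes to it.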
- -### Unit Tests -We will add `test_MKLDNN.cpp` and `MKLDNNTester.*` for testing MKL-DNN. -The tests consist of unit tests for each Layer (or Activation) and whole-network tests for simple networks. -Each test compares the result computed by PaddlePaddle on CPU with the MKL-DNN result and passes if the difference is below a small threshold. - -### Python API -Only the **v1 API** is considered for now. - -We plan to add a `use_mkldnn` option in `python/paddle/trainer/config_parser.py` so that users can choose to use the MKL-DNN layers. - -The implementation could look like this: - -```python -use_mkldnn = bool(int(g_command_config_args.get("use_mkldnn", 0))) -if use_mkldnn: - self.layer_type = mkldnn_* -``` - -All MKL-DNN `layer_type` values start with *mkldnn_*, which is guaranteed when `MKLDNN*Layer` registers the layer, to tell them apart. - -In addition, a `use_mkldnn` flag will be added in `paddle/utils.Flags` to choose whether to use the MKL-DNN related features. - -### Benchmarking -Scripts will be added [here](https://github.com/PaddlePaddle/Paddle/tree/develop/benchmark/paddle/image) to test and compare CNN performance before and after using MKL-DNN. -The benchmark comparison results will be in [IntelOptimizedPaddle.md](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/IntelOptimizedPaddle.md) - -### Others -1. When MKL-DNN is used, CPU buffers are aligned to 4096; see [memory](https://github.com/01org/mkl-dnn/blob/master/include/mkldnn.hpp#L673) in MKL-DNN for details. -2. Dig deeper into PaddlePaddle to look for other optimization opportunities. For example, OpenMP might be used to improve the SGD update performance. - -## Design Concerns - -We want to follow PaddlePaddle's code style\[[3](#references)\] as closely as possible while sacrificing as little MKL-DNN performance as possible\[[4](#references)\]. - -The points that need special attention are summarized below: - -1. Use **deviceId_**. To add as few variables or functions to the parent class Layer as possible, -we decided to use the existing `deviceId_` variable to distinguish layer attributes, and define `-2` as the device ID specific to `MKLDNNLayer`. -2. Override the **init** function of the parent class Layer and set `deviceId_` to `-2`, which indicates that this layer runs in the MKL-DNN environment. -3. Create `MKLDNNBase`, which defines the classes and functions other than those related to layer and memory. -It includes `MKLDNNStream` and `CPUEngine` used by MKL-DNN, and possibly `FPGAEngine` in the future. -4. If an MKL-DNN layer is followed by a cpu device, `output_.value` shares memory with `extOutVal_`, -and the data format is `NCHW`, so the next cpu device gets the correct data. -When ordinary CPU layers are present, the format of `extOutVal_` and `extOutGrad_` is always `NCHW` or `NC`. - -## References -1. [MKL small library](https://github.com/01org/mkl-dnn#linking-your-application) is a subset of [Intel MKL](https://software.intel.com/en-us/mkl). -It mainly contains the math primitives and operations related to deep learning, and is generally updated together with each MKL-DNN [release](https://github.com/01org/mkl-dnn/releases). -2. [MKL-DNN System Requirements](https://github.com/01org/mkl-dnn#system-requirements). -Currently PaddlePaddle only uses MKL-DNN on machines that support the AVX2 instruction set or above. -3. The [original proposal](https://github.com/PaddlePaddle/Paddle/pull/3096) would introduce **nextLayer** information. -But in PaddlePaddle, neither the layers before the refactoring nor the ops after it want to know about the next layer/op. -4. MKL-DNN's high-performance formats differ from PaddlePaddle's original `NCHW` (the cuDNN part of PaddlePaddle also uses `NCHW`, so it does not have this problem). -So a conversion method is needed, and the format should be converted only when necessary to get the best MKL-DNN performance. diff --git a/develop/doc_cn/_sources/design/mkl/mkldnn_fluid.md.txt b/develop/doc_cn/_sources/design/mkl/mkldnn_fluid.md.txt deleted file mode 100644 index bef126f3f0577b69f646dfe5d10539b372c6a8a5..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/mkl/mkldnn_fluid.md.txt +++ /dev/null @@ -1,149 +0,0 @@ -# Design Doc: Add MKLDNN Kernel in Fluid Operator - -## Principles - -First of all, we should follow some basic principles like: -1. [How to write a new operator](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md). We are trying to add a new kind of kernel into operators, so basically we should follow this doc. -2. [Supporting new Device/Library](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/support_new_device.md). Since MKLDNN is a new library to fluid, we should add `MKLDNNDeviceContext` and maybe `mkldnn_helper.h`, just like [cudnn_helper.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/cudnn_helper.h). -3. [Switch Kernel](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md). Another important point is that we should ensure the data synchronization between different kernel types, which is this [topic](https://github.com/PaddlePaddle/Paddle/issues/6549). So basically we should override `GetExpectedKernelType` and `trans` functions to support switching kernels. -4. [The Keys of Operator Kernel Type](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md).
Kernel Type is a pivotal concept which records the `Place`, `Library`, `DataType` and `Layout`. - -## Solution - -In general, there are four steps to run an MKL-DNN primitive. -- Create a primitive descriptor that describes this operator -- Create the primitive itself from the primitive descriptor and the engine -- Create all the memory buffers that the primitive needs -- Launch a stream to execute the created primitive -More details can be found [here](http://01org.github.io/mkl-dnn). - -It is better to avoid reinitializing the primitives and memory handles of the first three steps in every iteration. \ -So we plan to create a map to record all the `primitive` and `memory` objects, which should not take too much memory as discussed [here](https://github.com/PaddlePaddle/Paddle/issues/6822). - -We assume that the following three conditions are satisfied. -1. There is a unique key for each operator instance. It may be the actual name of the `Output Tensor`. -2. The `Input Tensor` inside the `Compute` function is the one after conversion. -3. We can get the phase (e.g. `is_test`) inside the `Compute` function; otherwise we need to expose this attribute to the user. - -### Compute -The algorithm of `Compute` is described as follows, taking conv as an example.
- -```c++ - - PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), "It must use CPUPlace."); - PADDLE_ENFORCE(platform::is_mkldnn_library(ctx.GetLibrary()), "It must use MKLDNN Library."); - - auto& dev_ctx = ctx.template device_context(); - - // find primitive by unique key from mkldnn context - // the op_key should be a unique name of this op instance - auto& p = dev_ctx.findPrimitive(op_key + "_fwd"); - - // assuming the input tensor inside this compute function is the one after conversion - // this point should be guaranteed by another mechanism - auto& i = dev_ctx.findMemory(op_key + "_input"); - - if (p == nullptr || i == nullptr || inputSizeChanged(p, i)) { - auto fwd_primitive_desc = createPrimitiveDesc(ctx); - auto* input = ctx.Input("Input"); - auto* filter = ctx.Input("Filter"); - auto* output = ctx.Output("Output"); - shared_ptr in(new mkldnn::memory(fwd_primitive_desc->src_primitive_desc(), input->data())); - shared_ptr wgt(new mkldnn::memory(fwd_primitive_desc->weights_primitive_desc(), filter->data())); - shared_ptr out(new mkldnn::memory(fwd_primitive_desc->dst_primitive_desc(), output->mutable_data(ctx.GetPlace()))); - shared_ptr fwd_primitive(new mkldnn::conv_fwd(*fwd_primitive_desc, *in, *wgt, *out)); - - dev_ctx.addMemory(op_key+"_input", in); - dev_ctx.addMemory(op_key+"_output", out); - dev_ctx.addMemory(op_key+"_filter", wgt); - dev_ctx.addPrimitive(op_key+"_fwd", fwd_primitive); - dev_ctx.addPrimitiveDesc(op_key+"_fwd_PD", fwd_primitive_desc); - } - - p = dev_ctx.findPrimitive(op_key + "_fwd"); - - PADDLE_ENFORCE(p, "Should have forward Primitive"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_input"), "Should have input memory"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_output"), "Should have output memory"); - PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_filter"), "Should have filter memory"); - PADDLE_ENFORCE(dev_ctx.findPrimitiveDesc(op_key+"_fwd_PD"), "Should have forward PrimitiveDesc"); - 
dev_ctx.submit(p); - dev_ctx.execute(); // the convert primitive should have already contained. - -``` - -The `createPrimitiveDesc` returns the primitive descripotor of this operator, would be like this: -```c++ - auto* input = ctx.Input("Input"); - auto* filter = ctx.Input("Filter"); - auto* output = ctx.Output("Output"); - std::vector strides = ctx.Attr>("strides"); - std::vector paddings = ctx.Attr>("paddings"); - std::vector dilations = ctx.Attr>("dilations"); - int groups = ctx.Attr("groups"); - algorithm algo = static_cast(ctx.Attr("convolution_algorithm_option")); - prop_kind pk = ctx.Attr("is_test") ? prop_kind::forward_inference : prop_kind::forward_training; - - auto fwd_desc = mkldnn::conv_fwd::desc(/* all the setting above*/); - shared_ptr fwd_primitive_desc(new mkldnn::conv_fwd::primitive_desc(fwd_desc, ctx.getEngine())); - - return fwd_primitive_desc; - } -``` - -### MKLDNNDeviceContext -`MKLDNNDeviceContext`, which is very straightforward, should contain some base information like: `stream`, `engine` and the map needed. - - -### mkldnn_helper -Some functions would be put in `paddle/platform/mkldnn_helper.h`. -- create MKLDNN memories -- create MKLDNN primitives -- error check function -- etc - - -### Kernel Switch -We should `reorder` the different Layout from other device or to other device. `GetExpectedKernelType` and `trans` functions can help us to implement it. - -`GetExpectedKernelType` should get the context, and this operator can return the best `KernelType`. 
-`trans` would be like this: - -```c++ -void trans(inputs, ctx) override { - if (NoNeedTrans()) { - return; - } - // find reorder primitive by op_key from context - auto& dev_ctx = ctx.template device_context(); - auto& p = dev_ctx.findPrimitive(op_key + "_reorder_input"); - auto& i = dev_ctx.findMemory(op_key + "_src_input"); - - if (p == nullptr || i == nullptr || changeSized(i, input)) { - auto prim = createPrimitiveDesc(ctx); - auto src = createMemory(memoryDesc(input->dims(), actual_layout), input->data); - auto newbuffer = paddle::memory::Alloc(ctx.GetPlace(), input->size_in_bytes()); - auto dst = createMemory(p->expected_desc(), newbuffer->data); - auto reorder_primitive(new mkldnn::reorder(src, dst)); - - dev_ctx.addMemory(op_key+"_src_input", src); - dev_ctx.addMemory(op_key+"_input", dst); - dev_ctx.addPrimitive(op_key+"_reorder_input", reorder_primitive); - } - - p = dev_ctx.findPrimitive(op_key + "_reorder_input"); - PADDLE_ENFORCE(p, "Should have Reorder Primitive"); - dev_ctx.submit(p); - if (! this->isMKLDNNKernel()) { - // execute immediately only if this is not mkldnn kernel function. - // otherwise, it can be executed with the operator primitive in Compute - dev_ctx.stream(); - } - // after submit, the input tensor in ExecutionContext should be changed as the converted one - // there should be another mechanism to ensure this -} -``` - -### Unit Test -All the functions should be tested corresponding. -TBD diff --git a/develop/doc_cn/_sources/design/model_format.md.txt b/develop/doc_cn/_sources/design/model_format.md.txt deleted file mode 100644 index e29129fddf775939c9f7a8b49d850d523e6e5a45..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/model_format.md.txt +++ /dev/null @@ -1,36 +0,0 @@ -# Design Doc: Model Format - -## Motivation - -A model is an output of the training process. One complete model consists of two parts, the **topology** and the **parameters**. 
In order to support industrial deployment, the model format must be self-complete and must not expose any training source code. - -As a result, In PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model. We must support large size parameters and efficient serialization/deserialization of parameters. - -## Implementation - -The topology is saved as a plain text in a detailed self-contain protobuf file. - -The parameters are saved as a binary file. As we all know, the protobuf message has a limit of [64M size](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We have done a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), which shows that protobuf is not fit for the task. - -As a result, we design a particular format for tensor serialization. By default, an arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), and has a description information proto of [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99). We save the DescProto as the byte string header. It contains all the necessary information, such as the `dims`, and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores values in a continuous memory buffer. For speed we dump the raw memory to disk and save it as the byte string content. So, the binary format of one tensor is, - -The table below shows a tensor's byte view in detail. Note that all the signed values are written in the little-endian format. 
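-
-| field name | type | description |
-| --- | --- | --- |
-| version | uint32_t | Version of saved file. Always 0 now. |
-| tensor desc length | uint32_t | TensorDesc(Protobuf message) length in bytes. |
-| tensor desc | void* | TensorDesc protobuf binary message |
-| tensor data | void* | Tensor's data in binary format. The length of `tensor_data` is decided by `TensorDesc.dims()` and `TensorDesc.data_type()` |
-| lod_level | uint64_t | Level of LoD |
-| length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
-| data of lod[0] | uint64_t* | [Optional] lod[0].data() |
-| ... | ... | ... |
-
-
-
-## Summary
-
-- We introduce a model format.
-- The model represented by its forward-pass computation procedure is saved in a **ProgramDesc** protobuf message.
-- A bunch of specified format binary tensors describe the **parameters**.
diff --git a/develop/doc_cn/_sources/design/multi_language_interface/00.why_plain_c.md.txt b/develop/doc_cn/_sources/design/multi_language_interface/00.why_plain_c.md.txt
deleted file mode 100644
index a1443093342c5a3ed698fb6b52a751dfc7cb5319..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/design/multi_language_interface/00.why_plain_c.md.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-# Implementing Paddle's Multi-Language Interface
-## Background
-
-Paddle needs a multi-language interface, and this interface must:
-
-* Have standard, well-written documentation
-  * For example, Python can use [Sphinx](http://www.sphinx-doc.org/en/stable/) to generate API documentation, and golang can use [GoDoc](https://godoc.org/golang.org/x/tools/cmd/godoc). Both require the interface to be fully commented according to each language's established conventions.
-* Adapt each language's interface to that language's characteristics
-  * For example, Java and Python handle errors by throwing an Exception, whereas golang should report errors through return values.
-
-## Basic Requirements
-
-The implementation of Paddle's multi-language interface covers the following aspects:
-
-* We distribute Paddle as a dynamic library. This library embeds no interpreter for any other language and uses no other dynamic library.
-* The dynamic library exports functions through C99 header files and neither uses nor exports C++ symbols.
-* Paddle's internal structs and classes are not exported; only `void*` pointers are used as type handles.
-* We do not use a code generator such as SWIG; the language bindings are written by hand.
-
-
-## Rationale
-
-### Distribute Paddle as a dynamic library
-
-* Linking Paddle is fairly complicated
-  * If users want to link Paddle's static library (libpaddle.a) into their own program, they have to pass `--whole-archive` (for GCC) or `--force_load` (for Clang) to make sure every symbol in libpaddle.a is written into their program's binary. This is because Paddle's source code uses the [object factory design pattern](http://stackoverflow.com/a/1310326/724872).
-* For compiled languages such as C/C++, using a static library is about as hard as using a dynamic one. Interpreted languages such as [Python](http://stackoverflow.com/questions/19560594/how-to-import-static-library-in-python) or [Java](http://stackoverflow.com/questions/24493337/linking-static-library-with-jni), however, can only call Paddle's dynamic library; otherwise the Paddle static library has to be linked into the interpreter.
-  * The binary that actually runs for an interpreted language is the interpreter itself, so calling a static library means linking that library with the interpreter. For Java, this means adding the static library to the JVM, which is not common practice for a typical Java developer.
-
-### Do not embed any other language's interpreter in the dynamic library
-
-* Paddle's current process model has C++ drive an embedded Python interpreter to parse the model configuration and read data.
-* Our final dynamic library embeds neither Python nor any other language's interpreter. Model configuration parsing and data reading are both left to the other language.
-
-One problem at this stage is that if the Python interpreter embedded in Paddle and the Python used externally have different versions, Paddle exits immediately with an error.
-
-### The Paddle dynamic library references no other dynamic libraries
-
-* That is, the dynamic library does not depend on any other file and can run on any machine.
-
-### The dynamic library exports functions through C99 header files and neither uses nor exports C++ symbols
-
-* Because C++ compilers have no standard for [name mangling](https://en.wikipedia.org/wiki/Name_mangling#C.2B.2B), different compiler versions may generate different symbols for the same C++ code. A multi-language interface reads the generated binary (the dynamic library) directly, so it needs stable exported symbols.
-* C does have a standard for exported symbols, and on common platforms it is the ABI calling standard.
-* Most languages support calling C APIs.
-* We use C99 rather than C89 because C99 supports [Fixed-width integer types](https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types) and the [Boolean type](https://en.wikipedia.org/wiki/C_data_types#Boolean_type).
-* We use C99 rather than C11 because [C11](https://en.wikipedia.org/wiki/C11_(C_standard_revision)) has no feature that Paddle particularly needs, and C99 is more widely used than C11.
-
-### Do not export Paddle's internal structs and classes; use only `void*` pointers as type handles
-
-* Paddle's internal classes are written in C++, so exporting them directly to a C interface is difficult.
-* The C-API uses `void*` to represent Paddle's internal classes, and each API function checks the type itself.
-
-In the C header `paddle_matrix.h`:
-
-```C
-typedef void* paddle_matrix;
-typedef int paddle_error;
-
-extern "C"
-paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
-                                     uint64_t* width,
-                                     uint64_t* height);
-```
-The C interface is then implemented in C++, in the file `paddle_matrix.cpp`:
-
-```cpp
-#include "paddle/math/matrix.h"
-extern "C"
-paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
-                                     uint64_t *width,
-                                     uint64_t *height) {
-  auto m = (paddle::capi::CMatrix*)(matrix);
-  *width = m->width();
-  *height = m->height();
-  return 0;  // 0 is assumed here to mean success
-}
-```
-
-where the file `paddle/capi/CMatrix.hpp` contains:
-
-```cpp
-namespace paddle {
-namespace math {
-
-class CMatrix {
-  std::shared_ptr<paddle::Matrix> mat;
-};
-
-}  // namespace math
-}  // namespace paddle
-```
-
-### Do not use a code generator such as SWIG; write the language bindings by hand
-
-* [SWIG](http://www.swig.org/) is a code generator for multi-language interfaces. Its goal is that you write code in C/C++ and SWIG reads the C/C++ header files directly to generate binding code for various languages.
-  * For a multi-language interface, SWIG requires an interface file. This file has its own syntax and a steep learning curve, and each additional target language needs additional definitions for that language. Sometimes the interface file becomes very [tricky](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/api/Paddle.swig#L36). This raises the learning cost for community contributors.
-  * The interface exposed by SWIG keeps the C++ interface style, which makes it hard to keep the code style consistent with each target language (function naming, error handling).
-    * The function and class names that SWIG exposes in a target language are identical to the C++ ones, and the C++ naming style does not suit other languages. With SWIG we would have to rename a large number of `SomeCppClass` entries to `some_python_class` or `SomeGoTypes` in the interface file.
-    * Error handling also differs between languages. In Java or Python the most common mechanism is an Exception, while in Golang errors are return values. SWIG can only expose the C++ interface as is and cannot adapt to each language's error handling style.
-  * For most languages, using a C `.h` file directly is not hard, for example with Python's [cffi](https://cffi.readthedocs.io/en/latest/overview.html#simple-example-abi-level-in-line) or [Cython](http://cython.org/), or golang's [cgo](https://golang.org/cmd/cgo/).
-  * The languages and interpreters that SWIG supports are limited. For Python, for example, SWIG supports only the CPython interpreter and not PyPy.
-
-
-## Summary of Reasons
-
-| Decision | Alternative rejected | Reason |
-|---| --- | --- |
-| Use a dynamic library | Static library | Interpreted languages can only call dynamic libraries, and linking Paddle's static library is complicated |
-| Embed no other language's interpreter | Embedding the Python interpreter | Paddle C++ currently embeds a Python interpreter, which causes bugs when different Python versions end up in one process |
-| Reference no other dynamic libraries | | A single Paddle dynamic library can run on any Linux system |
-| Use C99 for the interface | C++ for the interface | C has a standard ABI, C99 is the most widely used C standard, and C99 supports the bool type and fixed-width integer types (uint64_t and so on) |
-| Use void* as the class handle | Spelling out what each class contains | Simple to implement, and decouples the interface from implementation details |
-| Hand-written language bindings | SWIG | SWIG requires binding developers to master SWIG configuration, which makes community participation hard. Code generated by SWIG cannot guarantee a consistent style across languages |
-
-
-## Implementation
-
-See [Inference implementation](01.inference_implementation.md)
diff --git a/develop/doc_cn/_sources/design/multi_language_interface/01.inference_implementation.md.txt b/develop/doc_cn/_sources/design/multi_language_interface/01.inference_implementation.md.txt
deleted file mode 100644
index 9820284523246a062581f322616d196f575c9d29..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/design/multi_language_interface/01.inference_implementation.md.txt
+++ /dev/null
@@ -1,131 +0,0 @@
-# C-API Model Inference Implementation
-
-This document describes the implementation details of the Paddle C-API. The Paddle C-API is the foundation of the multi-language API. Paddle has many APIs to expose, so we implement the model inference API first and use it as a worked example for discussion. For why a C-API is needed, see [Why Plain C](./00.why_plain_c.md).
-
-## Table of Contents
-   * [C-API Model Inference Implementation](#c-api-model-inference-implementation)
-      * [Principles for Exposing Interfaces](#principles-for-exposing-interfaces)
-      * [Directory Structure](#directory-structure)
-      * [Implementation](#implementation)
-         * [capi.h](#capih)
-         * [Header files for a specific type](#header-files-for-a-specific-type)
-         * [capi_private.h](#capi_privateh)
-         * [Implementation files for a specific type](#implementation-files-for-a-specific-type)
-         * [libpaddle_capi_shared.{so, dylib}](#libpaddle_capi_sharedso-dylib)
-         * [libpaddle_capi_whole.a](#libpaddle_capi_wholea)
-         * [examples](#examples)
-      * [Build Options](#build-options)
-
-
-## Principles for Exposing Interfaces
-
-1. All interfaces are C interfaces, i.e. they use `extern "C"`.
-2. Except for functions that construct a type (`paddle_matrix_create` and the like), every function returns `paddle_error`. A call must not throw an exception or cause a runtime error.
-3. All type names have the form `paddle_<type>`, and all functions related to a type are named `paddle_<type>_<function>`.
-4. If a Paddle Core concept (GradientMachine/Matrix) needs to be exposed to other languages, then
-    * To keep the exposed interface as simple as possible, expose only the concept's interface and not its implementations. That is, expose `GradientMachine` or `Matrix`, but not `RecurrentGradientMachine` or `CpuSparseMatrix`.
-    * Expose only the functions that are necessary for the concept. "Necessary" means the minimal set of functions needed to complete a task.
-5. Do not add much wrapping in the `capi` interface layer.
-    * If a Paddle concept must be exposed but is too fiddly, do not wrap it in the `capi` layer. Modify Paddle Core directly so that the concept is no longer fiddly in the core.
-
-
-## Directory Structure
-
-```text
-Paddle
-  `-- paddle
-      `-- capi
-           `-- examples  # The example project for C-API.
-           `-- tests  # unittests for C-API
-           `-- capi.h  # C-API header file.
-           `-- capi_private.h  # The shared header file between implementation sources.
-           `-- matrix.{h, cpp}
-           `-- gradient_machine.{h, cpp}
-           `-- ...
-```
-
-
-The directory structure of Paddle's C-API is shown above. All header files in this directory except `capi_private.h` are installed under include/paddle. The binaries produced by the C-API are installed under `lib`. The installed layout is therefore
-
-```text
-`-- include
-      `-- paddle
-             `-- capi.h
-             `-- matrix.h
-             `-- gradient_machine.h
-             `-- ...
-`-- lib
-     `-- libpaddle_capi_shared.{so, dylib}  # On macOS, the dynamic library's file name extension is `dylib`
-     `-- libpaddle_capi_whole.a  # static library for all symbols of Paddle.
-```
-
-## Implementation
-
-The following sections describe how each kind of file is implemented.
-
-### capi.h
-
-`capi.h` is the only header file that users need to include when using the C-API. `capi.h` includes the per-type header files `matrix.h` and `gradient_machine.h`. Other type headers are included by relative path, i.e. `#include "matrix.h"`.
-
-### Header files for a specific type
-
-These are header files such as `matrix.h` and `gradient_machine.h`. Each one contains the type definition for a type and all of the functions exposed for it.
-
-Such a header makes no assumption about the order in which other files are included. Even if a user includes a type header directly, no error should occur (although this is not encouraged). If a type needs another type, for example `gradient_machine` needs `matrix`, it includes the other type's header directly, i.e. `#include "matrix.h"`.
-
-### capi_private.h
-
-`capi_private.h` is the header shared by the implementation files. It mainly contains the structures of the types that are actually exposed. When users use the C-API, every Paddle type degenerates to `void *`, i.e. `typedef void* paddle_matrix`. However, every type exposed by the C-API is a struct implemented in `capi_private.h`.
-
-```cpp
-struct CMatrix {
-   int type = MatrixType;
-   std::shared_ptr<paddle::Matrix> mat;
-};
-```
-
-Typically, this struct contains two members.
-
-* `type` is a type tag. Its value differs for each type. This way, even though every type accepted by the C-API is `void *`, we can still determine the type of each argument.
-
-   ```cpp
-   void some_c_api_function(void* some_instance) {
-      int* type = (int *) some_instance;
-      switch (*type) {
-        case MatrixType: {
-          CMatrix* mat = (CMatrix *) some_instance;
-          ...
-        }
-        ...
-      }
-   }
-   ```
-* The other member is a smart pointer (shared_ptr) to the interface of this type in Paddle Core.
-  * A smart pointer is used so that users can safely release a C-API instance without caring whether Paddle Core is still using it.
-  * For example, a user obtains a neural network parameter instance through the C-API. When done with the parameter, the user simply deletes it. Even if the model in Paddle Core is still using the parameter, the underlying parameter is not deleted along with it.
-
-### Implementation files for a specific type
-
-These are files such as `matrix.cpp` and `gradient_machine.cpp`. They implement the C-API interface in C++ 11 and export it with `extern "C"`. The implementation performs the necessary safety checks on input arguments and forwards the C-API arguments to `Paddle Core`.
-
-### libpaddle\_capi_shared.{so, dylib}
-
-`libpaddle_capi_shared` is the dynamic library exported by the C-API. Its link flags are similar to those of Paddle's other binaries (for example `paddle_trainer`). Users can link this dynamic library directly to use the Paddle C-API, with `-lpaddle_capi_shared`.
-
-### libpaddle\_capi_whole.a
-
-`libpaddle_capi_whole` is the static library exported by the C-API. It contains all of Paddle's symbols. It is produced by packing together the object files of all static libraries such as `libpaddle_gserver.a`, `libpaddle_math.a`, and `libpaddle_capi.a`. It is used with `--whole-archive -lpaddle_capi_whole --no-whole-archive`.
-
-
-### examples
-
-The examples contain model prediction sample code written in `C99`. For details, see [example/README.md](../../../paddle/capi/examples/README.md).
-
-## Build Options
-
-The C-API build option is off by default. To turn it on, set the following when running cmake:
-
-```bash
-cmake ${YOUR_SOURCE_ROOT} -DWITH_C_API=ON -DWITH_PYTHON=OFF -DWITH_SWIG_PY=OFF
-```
-
-When building the C-API, we recommend that Paddle neither embeds the Python interpreter nor generates the `SWIG` interface. For the reasons, see [Why Plain C](./00.why_plain_c.md).
diff --git a/develop/doc_cn/_sources/design/operator_kernel_type.md.txt b/develop/doc_cn/_sources/design/operator_kernel_type.md.txt
deleted file mode 100644
index f86e6b7a564ed23f2bddbec25da1c110014f941d..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/design/operator_kernel_type.md.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-# Design Doc: The Keys of Operator Kernel Type
-## Problem
-An operator can have different kernel implementations, and each operator will have a map to store the related kernels. Fluid uses `OpKernelType` as a key to identify a unique kernel. Before an operator runs, a certain type of kernel must be chosen via a key of `OpKernelType`. Currently, `OpKernelType` is defined as follows:
-
-```cpp
-struct OpKernelType {
-  platform::Place place_;
-  proto::DataType data_type_;
-};
-```
-For more details, please refer to the [code](https://github.com/PaddlePaddle/Paddle/blob/2d5ec16bc8a09fb8e0f62c89b116b0cd1d333907/paddle/framework/operator.h#L348-L374) on GitHub.
-
-It contains two keys, `Place` and `DataType`. These two keys are hashed to a unique key that represents a certain type of kernel. However, these two keys do not provide enough information. We need a more complete representation of `OpKernelType`.
-
-We often implement a kernel of an operator with some computing library on a certain device (place). Please note that computing libraries and devices do not have a one-to-one correspondence. A device can have many computing libraries and a computing library can also support different devices.
-
-For example, the Eigen library supports Nvidia GPU/AMD GPU/CPU and the MKLDNN library supports Intel CPU/Intel FPGA. Both `Place` and `Library` should be a key of `OpKernelType`.
-
-Different DataTypes, such as fp64/fp32/int8, will obviously have different kernels.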
But different data layouts of a Tensor will also lead to different implementations. Please refer to the batch norm operator [kernels](https://github.com/PaddlePaddle/Paddle/blob/a948fac4d0ad7e0412d373b8aabeb711c2899563/paddle/operators/batch_norm_op.cc#L180-L209) as an example. Data layout should also be taken into consideration.
-
-## Solution
-
-There are four keys to determine a kernel type of an operator: `Place`/`Library`/`DataType`/`Layout`.
-
-```cpp
-struct OpKernelType {
-  platform::Place place_;
-  platform::Library library_;
-  proto::DataType data_type_;
-  framework::Layout layout_;
-};
-```
-
-The details are as follows:
-
-### Place
-
-`Place` is defined as:
-
-```cpp
-typedef boost::variant Place;
-```
-
-`Place` represents the device memory where data is located.
-
-
-### Library
-
-One operator kernel is usually implemented based on one library. `Library` is defined as an enum variable:
-
-```cpp
-enum Library { Plain, MKLDNN, CUDNN };
-```
-
-We use the `Plain` enumerator to represent the default library. Since most operators in Fluid are implemented based on the `Eigen` library, we take the `Eigen` library as the `Plain` enumerator.
-A library usually has a corresponding `DeviceContext` which contains some handles needed for computation. Fluid now has two default DeviceContexts for CPU and CUDA, namely, `CPUDeviceContext` and `CUDADeviceContext`. `CPUDeviceContext` contains an Eigen library handle and `CUDADeviceContext` contains an Eigen library handle and a cuBLAS handle.
-
-If we want to support a new library, a new enumerator needs to be added to `Library` and a corresponding new `LibraryDeviceContext` needs to be created.
-
-
-### DataType
-
-
-`DataType` is defined in [framework.proto](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). Currently, int32/int64/fp32/fp64 are supported.
-
-### Layout
-
-Actually, a Tensor is a view of a block of memory.
Besides a pointer to the memory, we also have to get some other descriptions of this block of memory, such as shape(ddim), stride, and layout. - -Different layout leads to different implementation of the operator kernel. There are mainly 4 principles we have to follow to support layout in our Fluid framework. - -- We take layout as a data member of Tensor. Layout is actually a enum variable. If Fluid is built with MKLDNN, then the memory format in MKLDNN will also be added into this enum variable. - -- Users have to set layout for input data. And some operators like fill_constant/random, also have to set layout for generating data. Of course, we can have some default layout, like NCHW. - -- The inference of Layout is at run-time, not at compile-time. - -- Every operator has to implement different kernels for different layouts. Let's take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed [here](http://01org.github.io/mkl-dnn/structmkldnn_1_1memory.html). And we will have a special macro to register kernels for MKLDNN operators. - -`Layout` is also defined as a enum variable: - -```cpp -enum Layout { - kNCHW, - kNHWC, -#ifdef PADDLE_WITH_MKLDNN - knChw8c - ... -#endif -}; -``` diff --git a/develop/doc_cn/_sources/design/ops/rnn.md.txt b/develop/doc_cn/_sources/design/ops/rnn.md.txt deleted file mode 100644 index 2f4854793fa1f0b02e4dc17b51a48a972be61c06..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/ops/rnn.md.txt +++ /dev/null @@ -1,153 +0,0 @@ -# RNNOp design - -This document describes the RNN (Recurrent Neural Network) operator and how it is implemented in PaddlePaddle. The RNN op requires that all instances in a mini-batch have the same length. We will have a more flexible dynamic RNN operator in the future. - -## RNN Algorithm Implementation - -

                                - -The above diagram shows an RNN unrolled into a full network. - -There are several important concepts here: - -- *step-net*: the sub-graph that runs at each step. -- *memory*, $h_t$, the state of the current step. -- *ex-memory*, $h_{t-1}$, the state of the previous step. -- *initial memory value*, the memory of the first (initial) step. - -### Step-scope - -There could be local variables defined in each step-net. PaddlePaddle runtime realizes these variables in *step-scopes* which are created for each step. - -

-Figure 2 illustrates the RNN's data flow.
                                - -Please be aware that every step runs the same step-net. Each step does the following: - -1. Creates the step-scope. -2. Initializes the local variables including step-outputs, in the step-scope. -3. Runs the step-net, which uses the above mentioned variables. - -The RNN operator will compose its output from step outputs in each of the step scopes. - -### Memory and Ex-memory - -Let's give more details about memory and ex-memory using a simple example: - -$$ -h_t = U h_{t-1} + W x_t -$$, - -where $h_t$ and $h_{t-1}$ are the memory and ex-memory (previous memory) of step $t$ respectively. - -In the implementation, we can make an ex-memory variable either "refer to" the memory variable of the previous step, -or copy the memory value of the previous step to the current ex-memory variable. - -### Usage in Python - -For more information on Block, please refer to the [design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md). - -We can define an RNN's step-net using a Block: - -```python -import paddle as pd - -X = some_op() # x is some operator's output and is a LoDTensor -a = some_op() - -# declare parameters -W = pd.Variable(shape=[20, 30]) -U = pd.Variable(shape=[20, 30]) - -rnn = pd.create_rnn_op(output_num=1) -with rnn.stepnet(): - x = rnn.add_input(X) - # declare a memory (rnn's step) - h = rnn.add_memory(init=a) - # h.pre_state(), the previous memory of rnn - new_state = pd.add_two( pd.matmul(W, x) + pd.matmul(U, h.pre_state())) - # update current memory - h.update(new_state) - # indicate that h variables in all step scopes should be merged - rnn.add_outputs(h) - -out = rnn() -``` - -Python API functions in above example: - -- `rnn.add_input`: indicates that the parameter is a variable that will be segmented into step-inputs. -- `rnn.add_memory`: creates a variable used as the memory. -- `rnn.add_outputs`: marks the variables that will be concatenated across steps into the RNN output. 
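The recurrence and the roles of `rnn.add_input`, `rnn.add_memory` and `rnn.add_outputs` described above can be sketched in plain Python, without PaddlePaddle. This is only an illustration under stated assumptions: `matvec`, `run_rnn` and the tiny 2x2 weights are hypothetical names and values chosen for the example, and the `for` loop stands in for running the same step-net once per step.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def run_rnn(W, U, xs, h0):
    """Unroll h_t = U h_{t-1} + W x_t over the step inputs xs.

    xs plays the role of the segmented input (rnn.add_input),
    h0 is the initial memory value (rnn.add_memory(init=...)),
    and the returned list collects the memory of every step,
    which is what rnn.add_outputs(h) asks for.
    """
    ex_memory = h0          # state of the previous step
    outputs = []
    for x in xs:            # every step runs the same step-net
        wx = matvec(W, x)
        uh = matvec(U, ex_memory)
        memory = [a + b for a, b in zip(uh, wx)]
        outputs.append(memory)
        ex_memory = memory  # the next step sees this as its ex-memory
    return outputs


# Hypothetical toy weights and a sequence of three 2-dim step inputs.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
xs = [[1.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
all_steps = run_rnn(W, U, xs, h0=[0.0, 0.0])
```

Here the ex-memory is handled by the "refer to the previous step's memory" option mentioned above; copying the value instead would give the same numbers.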
- -### Nested RNN and LoDTensor - -An RNN whose step-net includes other RNN operators is known as an *nested RNN*. - -For example, we could have a 2-level RNN, where the top level corresponds to paragraphs, and the lower level corresponds to sentences. Each step of the higher level RNN also receives an input from the corresponding step of the lower level, and additionally the output from the previous time step at the same level. - -The following figure illustrates feeding in text into the lower level, one sentence at a step, and the feeding in step outputs to the top level. The final top level output is about the whole text. - -

                                - -```python -import paddle as pd - -W = pd.Variable(shape=[20, 30]) -U = pd.Variable(shape=[20, 30]) - -W0 = pd.Variable(shape=[20, 30]) -U0 = pd.Variable(shape=[20, 30]) - -# a is output of some op -a = some_op() - -# chapter_data is a set of 128-dim word vectors -# the first level of LoD is sentence -# the second level of LoD is a chapter -chapter_data = pd.Variable(shape=[None, 128], type=pd.lod_tensor, level=2) - -def lower_level_rnn(paragraph): - ''' - x: the input - ''' - rnn = pd.create_rnn_op(output_num=1) - with rnn.stepnet(): - sentence = rnn.add_input(paragraph, level=0) - h = rnn.add_memory(shape=[20, 30]) - h.update( - pd.matmul(W, sentence) + pd.matmul(U, h.pre_state())) - # get the last state as sentence's info - rnn.add_outputs(h) - return rnn - -top_level_rnn = pd.create_rnn_op(output_num=1) -with top_level_rnn.stepnet(): - paragraph_data = rnn.add_input(chapter_data, level=1) - low_rnn = lower_level_rnn(paragraph_data) - paragraph_out = low_rnn() - - h = rnn.add_memory(init=a) - h.update( - pd.matmul(W0, paragraph_data) + pd.matmul(U0, h.pre_state())) - top_level_rnn.add_outputs(h) - -# output the last step -chapter_out = top_level_rnn(output_all_steps=False) -``` - -In the above example, the construction of the `top_level_rnn` calls `lower_level_rnn`. The input is an LoD Tensor. The top level RNN segments input text data into paragraphs, and the lower level RNN segments each paragraph into sentences. - -By default, the `RNNOp` will concatenate the outputs from all the time steps. -If the `output_all_steps` is set to False, it will only output the final time step. - - -
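How the two levels and the `output_all_steps` switch interact can be sketched with nested Python lists instead of LoD tensors. This is a simplified illustration, not the `RNNOp` implementation: `simple_rnn`, `nested_rnn` and the scalar update `h = 0.5 * h + x` are assumptions made for the example, and the top level here consumes the lower level's output for each paragraph.

```python
def simple_rnn(xs, h0=0.0, output_all_steps=True):
    """Run the scalar recurrence h = 0.5 * h + x over a non-empty list xs.

    Returns the state of every step, or only the final state when
    output_all_steps is False.
    """
    h = h0
    states = []
    for x in xs:
        h = 0.5 * h + x
        states.append(h)
    return states if output_all_steps else states[-1]


def nested_rnn(chapter, output_all_steps=True):
    """chapter is a list of paragraphs; each paragraph is a list of numbers
    standing in for its sentences."""
    # Lower level: one RNN run per paragraph, stepping over its sentences
    # and keeping only the last state as the paragraph summary.
    paragraph_outs = [
        simple_rnn(paragraph, output_all_steps=False) for paragraph in chapter
    ]
    # Top level: one step per paragraph.
    return simple_rnn(paragraph_outs, output_all_steps=output_all_steps)


# Hypothetical chapter with three paragraphs of 2, 1 and 3 sentences.
chapter = [[1.0, 2.0], [4.0], [0.0, 2.0, 4.0]]
every_step = nested_rnn(chapter)                          # one value per paragraph
last_step = nested_rnn(chapter, output_all_steps=False)   # final value only
```

The nested lists play the part of the two LoD levels: the outer list is segmented by the top-level RNN and each inner list by the lower-level one.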

                                diff --git a/develop/doc_cn/_sources/design/ops/sequence_decoder.md.txt b/develop/doc_cn/_sources/design/ops/sequence_decoder.md.txt deleted file mode 100644 index c4a9bbeeefca0e05c335dd60233691e8bac33015..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/ops/sequence_decoder.md.txt +++ /dev/null @@ -1,229 +0,0 @@ -# Design: Sequence Decoder Generating LoDTensors -In tasks such as machine translation and visual captioning, -a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary to generate sequences, one word at a time. - -This documentation describes how to implement the sequence decoder as an operator. - -## Beam Search based Decoder -The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is necessary when generating sequences. It is a heuristic search algorithm that explores the paths by expanding the most promising node in a limited set. - -In the old version of PaddlePaddle, the C++ class `RecurrentGradientMachine` implements the general sequence decoder based on beam search, due to the complexity involved, the implementation relies on a lot of special data structures that are quite trivial and hard to be customized by users. - -There are a lot of heuristic tricks in the sequence generation tasks, so the flexibility of sequence decoder is very important to users. - -During the refactoring of PaddlePaddle, some new concepts are proposed such as: [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md) that can better support the sequence usage, and they can also help make the implementation of beam search based sequence decoder **more transparent and modular** . 
-
-For example, the RNN states, candidate IDs and probabilities of beam search can all be represented as `LoDTensors`;
-the selected candidate's IDs in each time step can be stored in a `TensorArray`, and `Packed` to the sentences translated.
-
-## Changing LoD's absolute offset to relative offsets
-The current `LoDTensor` is designed to store levels of variable-length sequences. It stores several arrays of integers where each represents a level.
-
-The integers in each level represent the begin and end (not inclusive) offset of a sequence **in the underlying tensor**,
-let's call this format the **absolute-offset LoD** for clarity.
-
-The absolute-offset LoD can retrieve any sequence very quickly but fails to represent empty sequences, for example, a two-level LoD is as follows
-```python
-[[0, 3, 9]
- [0, 2, 3, 3, 3, 9]]
-```
-The first level tells that there are two sequences:
-- the first's offset is `[0, 3)`
-- the second's offset is `[3, 9)`
-
-while on the second level, there are several empty sequences that both begin and end at `3`.
-It is impossible to tell how many empty second-level sequences exist in each first-level sequence.
-
-There are many scenarios that rely on empty sequence representation, for example in machine translation or visual captioning, one instance has no translation or the candidate set for a prefix is empty.
-
-So let's introduce another format of LoD,
-it stores **the offsets of the lower level sequences** and is called **relative-offset** LoD.
-
-For example, to represent the same sequences of the above data
-
-```python
-[[0, 3, 5]
- [0, 2, 3, 3, 3, 9]]
-```
-
-the first level represents that there are two sequences,
-their offsets in the second-level LoD are `[0, 3)` and `[3, 5)`.
-
-The second level is the same as in the absolute-offset example because the lower level is a tensor.
-It is now easy to find out how many empty sequences each first-level sequence contains.
-
-The following examples are based on relative-offset LoD.
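The relationship between the two formats can be made concrete with a small sketch. The helper names `relative_to_absolute` and `count_empty_children` are hypothetical, a LoD is modelled as a plain list of integer lists rather than a real `LoDTensor`, and the first level uses the offsets `[0, 3)` and `[3, 5)` stated above.

```python
def relative_to_absolute(lod):
    """Convert a relative-offset LoD to an absolute-offset LoD.

    In the relative format each level indexes into the offset array of the
    level below it; only the last level holds offsets into the tensor.
    """
    absolute = [list(lod[-1])]              # the last level is already absolute
    for level in reversed(lod[:-1]):
        below = absolute[0]                 # absolute offsets of the level below
        absolute.insert(0, [below[i] for i in level])
    return absolute


def count_empty_children(lod, level=0):
    """For each sequence at `level`, count its empty sub-sequences."""
    parent, child = lod[level], lod[level + 1]
    counts = []
    for begin, end in zip(parent, parent[1:]):
        counts.append(
            sum(1 for i in range(begin, end) if child[i] == child[i + 1]))
    return counts


relative_lod = [[0, 3, 5],
                [0, 2, 3, 3, 3, 9]]
absolute_lod = relative_to_absolute(relative_lod)
empties = count_empty_children(relative_lod)
```

Going from relative to absolute offsets is a simple lookup, while the reverse direction would have to guess which first-level sequence each empty second-level sequence belongs to; that is the information the relative format keeps.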
-
-## Usage in a simple machine translation model
-Let's start from a simple machine translation model that is simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation) to draw a blueprint of what a sequence decoder can do and how to use it.
-
-The model has an encoder that learns the semantic vector from a sequence, and a decoder which uses the sequence encoder to generate new sentences.
-
-**Encoder**
-```python
-import paddle as pd
-
-dict_size = 8000
-source_dict_size = dict_size
-target_dict_size = dict_size
-word_vector_dim = 128
-encoder_dim = 128
-decoder_dim = 128
-beam_size = 5
-max_length = 120
-
-# encoder
-src_word_id = pd.data(
-    name='source_language_word',
-    type=pd.data.integer_value_sequence(source_dict_size))
-src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
-
-src_word_vec = pd.lookup(src_embedding, src_word_id)
-
-encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
-
-encoder_ctx = pd.last_seq(encoder_out_seq)
-# encoder_ctx_proj is the learned semantic vector
-encoder_ctx_proj = pd.fc(
-    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
-```
-
-**Decoder**
-
-```python
-def generate():
-    decoder = pd.while_loop()
-    with decoder.step():
-        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
-        generated_ids = decoder.memory()  # TODO init to batch_size s
-        generated_scores = decoder.memory()  # TODO init to batch_size 1s or 0s
-
-        target_word = pd.lookup(trg_embedding, generated_ids)
-        # expand encoder_ctx's batch to fit target_word's lod
-        # for example
-        # decoder_mem.lod is
-        # [[0 1 3],
-        #  [0 1 3 6]]
-        # its tensor content is [a1 a2 a3 a4 a5]
-        # which means there are 2 sentences to translate
-        #   - the first sentence has 1 translation prefixes, the offsets are [0, 1)
-        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
-        # the target_word.lod is
-        # [[0, 1, 6]
-        #  [0, 2, 4, 7, 9 12]]
-        # which means 2 sentences to translate, each has 1 and 5 prefixes
-        # the first prefix has 2 candidates
-        # the following has 2, 3, 2, 3 candidates
-        # the encoder_ctx_expanded's content will be
-        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
-        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
-        decoder_input = pd.fc(
-            act=pd.activation.Linear(),
-            input=[target_word, encoder_ctx_expanded],
-            size=3 * decoder_dim)
-        gru_out, cur_mem = pd.gru_step(
-            decoder_input, mem=decoder_mem, size=decoder_dim)
-        scores = pd.fc(
-            gru_out,
-            size=target_dict_size,
-            bias=None,
-            act=pd.activation.Softmax())
-        # K is a config value
-        topk_scores, topk_ids = pd.top_k(scores, K)
-        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
-
-        selected_ids, selected_generation_scores = decoder.beam_search(
-            topk_ids, topk_generated_scores)
-
-        # update the states
-        decoder_mem.update(cur_mem)  # tells how to update state
-        generated_ids.update(selected_ids)
-        generated_scores.update(selected_generation_scores)
-
-        decoder.output(selected_ids)
-        decoder.output(selected_generation_scores)
-
-translation_ids, translation_scores = decoder()
-```
-The `decoder.beam_search` is an operator that, given the candidates and the scores of translations including the candidates,
-returns the result of the beam search algorithm.
-
-In this way, users can customize anything on the input or output of beam search, for example:
-
-1. Set the corresponding elements in `topk_generated_scores` to zero or some small value, so that `beam_search` discards those candidates.
-2. Remove some specific candidate in `selected_ids`.
-3. Take the final `translation_ids` and remove a translation sequence from it.
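What a single `beam_search` step does with the top-K candidates, and how the first customization above works, can be sketched in plain Python. This is a hedged illustration rather than the proposed operator: `beam_search_step`, its argument layout (one list of `(candidate_id, accumulated_score)` pairs per prefix, all belonging to one source sentence) and the tie-breaking rule are assumptions for the example.

```python
def beam_search_step(candidates_per_prefix, beam_size):
    """Keep the best `beam_size` candidates among all prefixes of one sentence.

    candidates_per_prefix: list (one entry per prefix) of lists of
        (candidate_id, accumulated_score) pairs.
    Returns a list of the same length holding the surviving candidates of
    each prefix; a prefix whose candidates all lose gets an empty list.
    """
    flat = [
        (score, prefix_idx, cand_id)
        for prefix_idx, cands in enumerate(candidates_per_prefix)
        for cand_id, score in cands
    ]
    # Highest score first; ties are broken by prefix index, then candidate id.
    flat.sort(key=lambda t: (-t[0], t[1], t[2]))
    selected = [[] for _ in candidates_per_prefix]
    for score, prefix_idx, cand_id in flat[:beam_size]:
        selected[prefix_idx].append((cand_id, score))
    return selected


# Three prefixes with their top-K candidates (hypothetical ids and scores).
cands = [[(7, 0.9), (3, 0.4)], [(5, 0.1), (2, 0.05)], [(8, 0.6)]]
kept = beam_search_step(cands, beam_size=2)

# Customization: give candidate 7 a very small score so that it is discarded.
banned = [[(7, float("-inf")), (3, 0.4)], [(5, 0.1), (2, 0.05)], [(8, 0.6)]]
kept_without_7 = beam_search_step(banned, beam_size=2)
```

The empty list for the second prefix is the empty candidate set that the relative-offset LoD is meant to represent.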
- -The implementation of sequence decoder can reuse the C++ class: [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30), -so the python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop). - -Both of them are two-level `LoDTensors`: - -- The first level represents `batch_size` of (source) sentences. -- The second level represents the candidate ID sets for translation prefix. - -For example, 3 source sentences to translate, and has 2, 3, 1 candidates. - -Unlike an RNN, in sequence decoder, the previous state and the current state have different LoD and shape, and an `lod_expand` operator is used to expand the LoD of the previous state to fit the current state. - -For example, the previous state: - -* LoD is `[0, 1, 3][0, 2, 5, 6]` -* content of tensor is `a1 a2 b1 b2 b3 c1` - -the current state is stored in `encoder_ctx_expanded`: - -* LoD is `[0, 2, 7][0 3 5 8 9 11 11]` -* the content is - - a1 a1 a1 (a1 has 3 candidates, so the state should be copied 3 times for each candidates) - - a2 a2 - - b1 b1 b1 - - b2 - - b3 b3 - - None (c1 has 0 candidates, so c1 is dropped) - -The benefit from the relative offset LoD is that the empty candidate set can be represented naturally. - -The status in each time step can be stored in `TensorArray`, and `Pack`ed to a final LoDTensor. The corresponding syntax is: - -```python -decoder.output(selected_ids) -decoder.output(selected_generation_scores) -``` - -The `selected_ids` are the candidate ids for the prefixes, and will be `Packed` by `TensorArray` to a two-level `LoDTensor`, where the first level represents the source sequences and the second level represents generated sequences. - -Packing the `selected_scores` will get a `LoDTensor` that stores scores of each translation candidate. 
- -Packing the `selected_generation_scores` will get a `LoDTensor`, and each tail is the probability of the translation. - -## LoD and shape changes during decoding -

                                - -According to the image above, the only phase that changes the LoD is beam search. - -## Beam search design -The beam search algorithm will be implemented as one method of the sequence decoder and has 3 inputs: - -1. `topk_ids`, the top K candidate ids for each prefix. -2. `topk_scores`, the corresponding scores for `topk_ids` -3. `generated_scores`, the score of the prefixes. - -All of these are LoDTensors, so that the sequence affiliation is clear. Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix. - -It will return three variables: - -1. `selected_ids`, the final candidate beam search function selected for the next step. -2. `selected_scores`, the scores for the candidates. -3. `generated_scores`, the updated scores for each prefix (with the new candidates appended). - -## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray` -The `selected_ids`, `selected_scores` and `generated_scores` are LoDTensors that exist at each time step, -so it is natural to store them in arrays. - -Currently, PaddlePaddle has a module called `TensorArray` which can store an array of tensors. It is better to store the results of beam search in a `TensorArray`. - -The `Pack` and `UnPack` in `TensorArray` are used to pack tensors in the array to an `LoDTensor` or split the `LoDTensor` to an array of tensors. -It needs some extensions to support the packing or unpacking an array of `LoDTensors`. diff --git a/develop/doc_cn/_sources/design/optimizer.md.txt b/develop/doc_cn/_sources/design/optimizer.md.txt deleted file mode 100644 index 691081c268b848811bf5ee6d6a41edfe0f47eec0..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/optimizer.md.txt +++ /dev/null @@ -1,91 +0,0 @@ -## Optimizer Design - -### The Problem - -A PaddlePaddle program, or a block, is a sequence of operators operating variables. A training program needs to do three kinds of works: - -1. 
the forward pass, which computes intermediate results and the cost(s), -1. the backward pass, which derives gradients from intermediate results and costs, and -1. the optimization pass, which update model parameters to optimize the cost(s). - -These works rely on three kinds of operators: - -1. forward operators, -1. gradient operators, and -1. optimization operators. - -It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could only describe the forward pass and let PaddlePaddle create the backward and optimization operators automatically. - -In this design, we propose a high-level API that automatically derives the optimisation pass and operators from the forward pass. - - -### High-level Python API to describe the training process - -1. User write code to describe the network: - - ```python - images = layer.data("images") - labels = layer.data("labels") - w1 = pd.var("w1") - b1 = pd.var("b1") - hidden = layer.fc(images, w=w1, b=b1) - cost = layer.mse(hidden, labels) - ``` - - The above code snippet will create forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md). - - -2. Users create a certain kind of Optimizer with some argument. - - ```python - optimizer = AdagradOptimizer(learing_rate=0.001) - ``` - -3. Users use the optimizer to `minimize` a certain `cost` through updating parameters in parameter_list. - - ```python - opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1]) - ``` - The above code snippet will create gradient and optimization operators in Block. The return value of `minimize()` is list of optimization operators that will be run by session. - -4. Users use Session/Executor to run this opt_op_list as target to do training. - - ```python - sess.run(target= opt_op_list, ...) - ``` - -#### Optimizer Python interface: - -```python -class Optimizer(object): - """Optimizer Base class. 
- - """ - - def __init__(self): - pass - - def create_optimization_pass(self, parameters_and_grads): - """Add optimization operators to update gradients to variables. - - Args: - parameters_and_grads: a list of (variable, gradient) pair to update. - - Returns: - optmization_op_list: a list of optimization operator that will update parameter using gradient. - """ - return None - - def minimize(self, loss, parameter_list): - """Add operations to minimize `loss` by updating `parameter_list`. - - This method combines interface `append_backward()` and - `create_optimization_pass()` into one. - """ - params_grads = self.create_backward_pass(loss, parameter_list) - update_ops = self.create_optimization_pass(params_grads) - return update_ops - -``` - -Users can inherit the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer. diff --git a/develop/doc_cn/_sources/design/paddle_nccl.md.txt b/develop/doc_cn/_sources/design/paddle_nccl.md.txt deleted file mode 100644 index c7dac70998a6cfec3a6d2fc72b698ff9722e6805..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/paddle_nccl.md.txt +++ /dev/null @@ -1,65 +0,0 @@ -# Design Doc: NCCL support in Paddle Fluid - -## Abstract - -This Design Doc refers to the NCCL feature in paddle. We propose an approach to support NCCL library both on a single machine and multiple machines. We wrapper the NCCL primitives `Broadcast`, `Allreduce`, `Reduce` as operators to utilize Multi-GPU powers in one script. - - -## Motivation - -[NCCL](https://developer.nvidia.com/nccl) is a NVIDIA library support Multi-GPU communicating and optimized for NVIDIA GPUs, it provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, that can achieve high bandwidth over PCIe and NVLink high-speed interconnect. With NCCL library, we can easily accelerate the training in parallel. - -- Pros -1. easily plug-in with [NCCL2](https://developer.nvidia.com/nccl) library. -1. 
high performance in NVIDIA GPUs. -1. MPI like primitives, which have low learning cost for users. - -- Cons -1. Only design for NVIDIA GPUs, not a general multi-device solution. -1. Although NCCL1 is opensourced under BSD license, but NCCL2 is not opensourced anymore. - -At the beginning of training, the framework needs to distribute the same parameters to every GPU, and merge the gradients at any time user interests. - -As a result, during training, we need the operations of peer to peer copy between different GPUs, aggregating gradients/parameters from GPUs, and broadcasting parameters to GPUs. Every GPU only need to run the operator with correct place information. - -Besides, it needs interfaces to synchronize model update with each different GPU Cards. - -## Implementation - -As mentioned above, we wrap the NCCL routines as several kinds of operators. Need to note that NCCL need to create Communicator between gpu at the beginning, so there is a NCCLInit operator created. - -### Transpiler - -To be compatible with [parameter server design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md), the transpiler compiles the user defined operation graph into sub-graphs to be executed on different devices. - -1. The user-defined model will be a single device program - -2. Broadcast/Reduce operators between GPUs will be inserted into the program, even for the multi-node, may insert the `Send`, `Recv` operator. - - *Broadcast, AllReduce in a single machine. And Broadcast, AllReduce, [Send, Recv](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md#graph-converter) in multiple machines* - - - -After compiling, the graph as shows - - - -Operators are added to the sub-graphs. Every GPU assigned a role of `rank0`, `rank1` etc. - -- **Broadcast**. Broadcast operator distribute initialized parameter to all the GPUs from the GPU who owns it. e.g. from`rank0` GPU. -- **AllReduce**. 
The AllReduce operator synchronizes parameters/gradients between GPUs. AllReduce is implemented with a ring-based communication method, which avoids the bottleneck of a single GPU. - -Note that the AllReduce operator forces the GPUs to synchronize at that point. Whether the whole training process runs in asynchronous or synchronous mode depends on where the AllReduce point sits in the graph. - -As shown in the picture, each GPU computes the gradient of `W`, and the `AllReduce` operator that follows accumulates `dW` over the full batch of data; each GPU then runs the optimization step individually and applies the gradient to its own `W`. - -- **AllReduce** - Note that our AllReduce operator is a ring-based AllReduce implementation. If we used the NCCL2 AllReduce primitive, every GPU would optimize over the full batch of data, wasting the compute resources of (n-1) GPUs. In addition, the NCCL2 built-in AllReduce only uses the communication resources during synchronization, so the gradient update becomes a subsequent phase. In fact, we can amortize the gradient update time into the communication phase. The process is: -1. Every parameter has its root card. That card is responsible for aggregating the gradients from the GPUs. -2. The parameters of the whole model are hashed to different root cards to balance the load between GPUs. -3. Logically neighboring cards start sending the parameter to the next one. After one round, the parameter's root card has aggregated the full gradients. -4. Then the root card optimizes the parameter. -5. The root card sends its optimized result to its neighbor, and each neighbor forwards the parameter to the next one. -6. The synchronization round finishes. - -The total time cost is 2 * (n-1) * per-parameter-send-time, so we reach the goal of amortizing the update time into the communication phase.
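The six steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the function name `ring_reduce_and_update`, the plain SGD update, and the use of Python floats in place of GPU buffers are all hypothetical and are not part of NCCL or Paddle.

```python
def ring_reduce_and_update(param, grads, root, lr=0.1):
    """Simulate one synchronization round for a single parameter.

    param: current parameter value (identical on every card).
    grads: grads[i] is the local gradient computed on card i.
    root:  index of the card that owns (optimizes) this parameter.
    Returns (per-card parameter values, number of sends performed).
    """
    n = len(grads)
    sends = 0

    # Reduce phase: the partial sum travels around the ring and ends at root.
    card = (root + 1) % n
    partial = grads[card]
    while card != root:
        card = (card + 1) % n
        partial += grads[card]
        sends += 1

    # Only the root card runs the optimizer (plain SGD here, an assumption).
    new_param = param - lr * partial

    # Broadcast phase: the optimized value is forwarded card by card.
    values = [None] * n
    values[root] = new_param
    card = root
    for _ in range(n - 1):
        card = (card + 1) % n
        values[card] = values[(card - 1) % n]
        sends += 1
    return values, sends


values, sends = ring_reduce_and_update(1.0, [0.5, 1.0, 1.5, 2.0], root=2)
print(values, sends)
```

For `n` cards the function counts `2 * (n - 1)` sends, which matches the cost estimate above: `n - 1` sends to aggregate the gradient at the root and `n - 1` sends to pass the optimized value back around the ring.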
diff --git a/develop/doc_cn/_sources/design/parallel_do.md.txt b/develop/doc_cn/_sources/design/parallel_do.md.txt deleted file mode 100644 index 42bd136f825986d94fafaeaa5f58edb02848a74c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/parallel_do.md.txt +++ /dev/null @@ -1,163 +0,0 @@ -# Design Doc: Parallel_Do in PaddlePaddle - -In PaddlePaddle, we use parallel_do primitive to represent multithread data parallel processing. - -## Design overview - -The definition of a parallel_do op looks like the following - -```c++ -AddInput(kInputs, "Inputs needed to be split onto different devices").AsDuplicable(); -AddInput(kParameters, "Parameters are duplicated over different devices") - .AsDuplicable(); -AddInput(kPlaces, "Devices used for parallel processing"); -AddOutput(kOutputs, "Outputs needed to be merged from different devices").AsDuplicable(); -AddOutput(kParallelScopes, - "Scopes for all local variables in forward pass. One scope for each device"); -AddAttr(kParallelBlock, - "List of operaters to be executed in parallel"); -``` - -A vanilla implementation of parallel_do can be shown as the following (`|` means single thread and -`||||` means multiple threads) - -``` -In the forward pass - | Split input onto different devices - | Copy parameter onto different devices - |||| Compute forward pass in parallel - | Merge output from different devices - -In the backward pass - | Split output@grad onto different devices - |||| Compute backward pass in parallel - | accumulate param@grad from different devices to the first device - | Merge input@grad from different devices -  | Copy param@grad to the place of parallel_do_op -``` - -This implementation allows to write mixed device program like this - -```python -W1 = fluid.tensor(size=[100,20], parameter=true) -W2 = fluid.tensor(size=[20,15], parameter=true) - -data = layers.data() - -gpu_places = layers.get_place(use_gpu=True) -# parallel processing on multiple GPUs -pd = 
ParallelDo(gpu_places) -with pd.do(input=data): - prediction = softmax(fc(fc(data, W1), W2)) - write_output(prediction) -prediction = pd() -loss = cross_entropy(prediction, label) -``` - -And the programDesc are like the following - -``` -# start_program will be run by executor(CPUPlace), all w1, w2 will be allocated on CPU -start_program -{ - vars: w1, w2 - ops: init(w1), init(w2) -} - -main_program -{ -block0 { - vars: data, places, w1, w2, w1_grad, w2_grad, - ops: data, get_place, parallel_do(block1), - parallel_do_grad(block2), - sgd(w2, w2_grad), - sgd(w1, w1_grad) -} -block1 { # the forward pass - parent_block: 0 - vars: data, h1, h2, loss - ops: fc, fc, softmax -} -block2 { # the backward pass - parent_block: 1 - vars: data_grad, h1_grad, h2_grad, loss_gard, local_w1_grad, local_w2_grad - ops: softmax_grad, - fc_grad - fc_grad -} -} -``` - -## Performance Imporvement - -There are serial places we can make this parallel_do faster. - -### forward: split input onto different devices - -If the input of the parallel_do is independent from any prior opeartors, we can avoid this step by -prefetching the input onto different devices in a seperate background thread. And the python code -looks like this. -```python -pd = ParallelDo(gpu_places) -with pd.do(): -    feature = get_data_from_prefetch_queue(gpu_places) - prediction = my_net(feature) - write_output(activation) -``` - -### forward: Copy parameter to onto different devices - -We can avoid this step by making each device have a copy of the parameter. This requires: - -1. `fluid.default_start_up_program()` to be run on all devices -1. In the backward, allreduce param@grad at different devices, this requires - 1. `backward.py` add `allreduce` operators at parallel_do_grad - 1. `allreduce` operators need to be called in async mode to achieve maximum throughput -1. apply gradients related op(i.e. 
cliping, normalization, decay, sgd) on different devices in parallel - -By doing so, we also avoided "backward: accumulate param@grad from different devices to the first device". -And the ProgramDesc looks like the following - -``` -# w1, w2 will be allocated on all GPUs -start_program -{ -block0 { - parallel_do(block1) -} -block1 { - parent_block: 0 - vars: w1, w2 - ops: init(w1), init(w2) -} -} - -main_program -{ -block0 { - vars: data, places, w1, w2 - ops: data, get_place, parallel_do(block1), - parallel_do_grad(block2), # append_backward - parallel_do(block3) # append_optimization - -} -block1 { - parent_block: 0 - vars: data, h1, h2, loss - ops: fc, fc, softmax -} -block2 { - parent_block: 1 - vars: data_grad, h1_grad, h2_grad, loss_gard, w1_grad, w2_grad - ops: softmax_grad, - fc_grad, allreduce(places, scopes, w1_grad), - fc_grad, allreduce(places, scopes, w2_grad) -} -block3 { - parent_block: 0 - vars: lr - ops: sgd(w2, w2_grad), - sgd(w1, w1_grad) -} -} -``` diff --git a/develop/doc_cn/_sources/design/parameter_average.md.txt b/develop/doc_cn/_sources/design/parameter_average.md.txt deleted file mode 100644 index 2c4edee9fe31d502ea62b9fe5c8757c0a4c5e79f..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/parameter_average.md.txt +++ /dev/null @@ -1,72 +0,0 @@ -# Averaging Parameter in PaddlePaddle - -## Why Averaging -In a large scale machine learning setup where the size of the training data is huge, it could take us a large number of iterations over the training data before we can achieve the optimal values of parameters of our model. Looking at the problem setup, it is desirable if we can obtain the optimal values of parameters by going through the data in as few passes as we can. 
- -Polyak and Juditsky (1992) showed that the test performance of a simple average of the parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset. - -Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of the parameters obtained by SGD is used as the estimator for
the optimal parameter values (θ*). The averaging is done as follows: - -
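A sketch of the running average, assuming the standard Polyak-Juditsky form, where $\theta_i$ denotes the parameters after the i-th SGD step (the 1-based indexing is an assumption):

```latex
\bar{\theta}_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i
\qquad\Longleftrightarrow\qquad
\bar{\theta}_t = \bar{\theta}_{t-1} + \frac{1}{t}\left(\theta_t - \bar{\theta}_{t-1}\right)
```

The incremental form on the right is what makes the average cheap to maintain: only the previous average and the step count need to be kept.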
                                - -We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above. - -### How to perform Parameter Averaging in PaddlePaddle - -Parameter Averaging in PaddlePaddle works in the following way during training : -1. It will take in an instance of a normal optimizer as an input, e.g. RMSPropOptimizer -2. The optimizer itself is responsible for updating the parameters. -3. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself: - 1. In concept, the values of this copy are the average of the values of the parameters in the most recent N batches. - 2. However, saving all the N instances of the parameters in memory is not feasible. - 3. Therefore, an approximation algorithm is used. - -Hence, overall we have have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved. - -During the testing/ saving the model phase, we perform the following steps: -1. Perform the delayed operations. -2. Save current values of the parameters to a temporary variable. -3. Replace the values of the parameters with the averaged values. -4. Perform testing and/or save the parameters. -5. Restore the values of the parameters once done. - -### How to implement Averaging of Parameter in PaddlePaddle - -We can add the ParameterAverageOptimizer op to the graph through Python API. Using this approach, we manually add this op to the graph and direct the output of the optimizer op to this op during training. - - **Advantages**: - - Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op. - - Makes it easy for the users to customize and extend the framework. 
- - **Disadvantages**: - - Implementation requires re-writing the averaging methodology in Python. - -### Low-Level implementation - -In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input: -- the optimizer -- the window_size to keep the updates - -The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU. - -The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API. - -### Python API implementation for ParameterAverageOptimizer - -Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following: -- Any optimizer (RMSProp , AdaGrad etc.) -- A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. Move the averaged value to a buffer when window is full to avoid loss of precision. - -Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. 
As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions. -We will have a wrapper written in Python that will support the functionality and implement the actual core computation in C++ core as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.cc) - -#### Creation of the ParameterAverageOptimizer operator -There are two ways for creating the ParameterAverageOptimizer op: -1. We create the op immediately while building the computation graph. -2. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added. - -The proposal is to add the op immediately while building the computation graph. - -#### High-level API - -In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions. diff --git a/develop/doc_cn/_sources/design/parameters_in_cpp.md.txt b/develop/doc_cn/_sources/design/parameters_in_cpp.md.txt deleted file mode 100644 index a7ac3f17c44ca94a669a8f1e283b291bceb42317..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/parameters_in_cpp.md.txt +++ /dev/null @@ -1,41 +0,0 @@ -# Design Doc: The C++ Class `Parameters` - -`Parameters` is a concept we designed in PaddlePaddle V2 API. `Parameters` is a container of parameters, which makes PaddlePaddle capable of sharing parameter between topologies. We described usages of `Parameter` in [api.md](./api.md). - -We used Python to implement Parameters when designing V2 API before. 
The current implementation has several defects: -* We just use `memcpy` to share Parameters between topologies, which is very inefficient. -* We do not support sharing Parameters while training. We only trigger `memcpy` when training starts. - -It is necessary to implement Parameters on the C++ side. However, this could require refactoring PaddlePaddle, because PaddlePaddle was previously designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current PaddlePaddle implementation, there are three concepts associated with `Parameters`: - -1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`. -It is evident that we should use `paddle::Parameter` when developing `Parameters`. -However, the `Parameter` class contains many functions and does not have a clear interface. -It contains `create/store Parameter`, `serialize/deserialize`, `optimize` (e.g., SGD), `randomize/zero`. -When developing `Parameters`, we only use the `create/store Parameter` functionality. -We should split the functionalities of Parameter into several classes to clean up the PaddlePaddle C++ implementation. - -2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`. -We should pass `Parameters` to `paddle::GradientMachine` during `forward/backward` to avoid `memcpy` between topologies. -Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would run on multiple GPUs and multiple CPUs. -`Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device. - -3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle. -So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD). - - -The step-by-step approach for implementing Parameters in the PaddlePaddle C++ core is listed below.
Each step should be a PR and can be merged into PaddlePaddle one by one. - -1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters. - -2. Implement a `Parameters` class. It just stores the `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member. - -3. Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` between topologies. -Because we need to share `Parameters` between topologies, it is the responsibility of `Parameters` to exchange Parameters between GPUs. -`GradientMachine` should not handle how to exchange Parameters, because a `GradientMachine` is only used to train one topology and we need to support training many topologies in Paddle, i.e., many GradientMachines could use one `Parameters`. - * We should use a global function to exchange Parameters between GPUs, not a member function in `Parameters`. The `MultiGradientMachine` invokes this function and passes `Parameters` as its input. - * The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler. - -4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle can share `Parameters` between topologies. - -5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we could change `ParameterUpdater` to use `Parameters` directly, which makes the implementation of `ParameterUpdater` clearer.
diff --git a/develop/doc_cn/_sources/design/profiler.md.txt b/develop/doc_cn/_sources/design/profiler.md.txt deleted file mode 100644 index b20b5efdc1f1f10ce7cec835adcc6fb374ed4e20..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/profiler.md.txt +++ /dev/null @@ -1,97 +0,0 @@ -## Introduction - -There are many performance analysis tools for [different programming languages and different software frameworks](https://en.wikipedia.org/wiki/List_of_performance_analysis_tools). For most popular deep learning frameworks, they use several programming languages and adapt to heterogeneous platforms. Similar to most of the deep learning frameworks, PaddlePaddle also uses C++, CUDA and Python as the basic programming languages to adapt to run on CPU and GPU devices. The [`nvprof` tools](http://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview) is usually used to analyse the CUDA program. We have [a document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/optimization/cpu_profiling.md) to profile CPU and Python program by [yep](https://pypi.python.org/pypi/yep) and [Google's perftools](https://github.com/google/pprof) to profile only the CPU and Python program. But for [PaddlePaddle fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), the operator is the basic computing unit. The developers usually want to collect the time of each operator and locate bottlenecks. The `nvprof` usually collect the timeline of CUDA-related activities on both CPU and GPU, including kernel execution, memory transfers, memory set and CUDA API calls and events or metrics for CUDA kernels. And the `yep` and `Google's perftools` can't collect the timeline for CUDA program. All these tools can't collect time in the operator level. So we design this profiling tool. - -## Architecture - -The work flow for most task is as follows. Each operator will run many times in the all iterations. 
So the profiler must collect the total time of each operator across the iterations. Moreover, developers may sometimes want to collect a more detailed time span inside an operator or record a time span elsewhere, which requires the profiler to support recording nested time spans. To speed up training, all deep learning frameworks support parallel computing, including multiple threads on CPU and multiple GPUs, so the profiler must be able to collect the timeline for each thread. In addition, the profiler itself consumes resources, so developers must be able to enable or disable it easily. Finally, the profiler should present a human-readable report. - -```python -for i in range(M): # M is the iteration number - for op in operator_lists: # The `operator_lists` contains all the operators in the network. - op.run() -``` - -In summary, the profiler should have the following features: - -- records time spans in a loop. -- supports nested time spans. -- supports multiple threads/multiple GPUs. -- can be enabled and disabled by users. - -But how do we record the time for a mixed C++ and CUDA program? There are many C++ APIs to get the current calendar time in the host program. But on the GPU, CUDA kernels may execute concurrently if they are in different [streams](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#streams), and CUDA kernels run asynchronously with respect to the host program unless there is a synchronization after them. CUDA provides [events](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#events) to monitor the device and perform accurate timing. Inspired by PyTorch and CUDA events, we also design and apply events to record the timeline, and then summarize and present statistics based on these events. - -The overall flow is shown in the following figure. - -
                                - -### Event - -In above work flow, a pair of events are needed before and after the piece of code to collect time. So the event has a flag to mark whether it is a starting event or an ending event. Except this two kinds of event, sometime, a only marker with a text message is needed, for example, a marker to specify the profiling start or end. There are three kinds of event: - -```c++ -enum EventKind { - kMark, - kPushRange, - kPopRange}; -``` -- kMark: only a marker without time range. -- kPushRange: mark the starting event for time range. -- kPopRange: mark the ending event for time range. - -For the CPU code, the events only need to record the current time. For the CUDA code, the [event management functions of CUDA](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT) are used. For many pieces of code, an event lists are used to record each piece. - -```c++ -class Event { - public: - // The DeviceContext is used to get current CUDA stream. - Event(EventKind kind, std::string name, uint32_t thread_id, - const platform::DeviceContext* dev_ctx = nullptr); - double CpuElapsedUs(const Event& e) const; - double CudaElapsedUs(const Event& e) const; - - private: - EventKind kind_; - std::string name_; - uint32_t thread_id_; - int64_t cpu_ns_; -#ifdef PADDLE_WITH_CUDA - cudaEvent_t event_ = nullptr; - int device_ = -1; -#endif -}; - -struct EventList { - std::forward_list> event_blocks; -}; -``` - -As mentioned above, there is no need to record the timeline when disabling the profiler. So there is a global state to enable or disable the profiler. - -```c++ -enum ProfilerState { - kDisabled, - kCPU, - kCUDA -}; -ProfilerState g_state; -``` -- kDisabled: the disabled state. -- kCPU: CPU profiling state. -- kCUDA: GPU profiling state. - -A pair of starting and ending events are pushed to event lists in constructor and destructor of `RecordEvent`. 
So the timeline is recorded for the code that runs during the lifetime of a `RecordEvent` object. - -```c++ -struct RecordEvent { - explicit RecordEvent(const std::string name, - platform::DeviceContext* dev_ctx = nullptr) { - if (g_state == ProfilerState::kDisabled) return; - // push the starting event to the event lists. - } - ~RecordEvent() { - if (g_state == ProfilerState::kDisabled) return; - // push the ending event to the event lists. - } -}; -``` diff --git a/develop/doc_cn/_sources/design/program.md.txt b/develop/doc_cn/_sources/design/program.md.txt deleted file mode 100644 index bd2456787c4e336d357a65255a8274a7c9e465cc..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/program.md.txt +++ /dev/null @@ -1,139 +0,0 @@ -# Design Doc: PaddlePaddle Programs - -## Compile and Execution - -A PaddlePaddle program consists of two parts -- the first generates a `ProgramDesc` protobuf message that describes the program, and the second runs this message using a C++ class `Executor`. - -A simple example PaddlePaddle program can be found in [graph.md](./graph.md): - -```python -x = layer.data("images") -l = layer.data("label") -y = layer.fc(x) -cost = layer.mse(y, l) -optimize(cost) -train(cost, reader=mnist.train()) -``` - -The first five lines of the above PaddlePaddle program generate, or compile, the `ProgramDesc` message. The last line runs it. - -## Programs and Blocks - -The basic structure of a PaddlePaddle program is a set of nested blocks, as in a C++ or Java program. - -- program: some nested blocks -- [block](./block.md): - - some local variable definitions, and - - a sequence of operators - -The concept of a block comes from ordinary programs.
For example, the following C++ program has three blocks: - -```c++ -int main() { // block 0 - int i = 0; - if (i < 10) { // block 1 - for (int j = 0; j < 10; j++) { // block 2 - } - } - return 0; -} -``` - -The following PaddlePaddle program has three blocks: - -```python -import paddle as pd // block 0 - -x = minibatch([10, 20, 30]) # shape=[None, 1] -y = var(1) # shape=[1], value=1 -z = minibatch([10, 20, 30]) # shape=[None, 1] -cond = larger_than(x, 15) # [false, true, true] - -ie = pd.ifelse() -with ie.true_block(): // block 1 - d = pd.layer.add_scalar(x, y) - ie.output(d, pd.layer.softmax(d)) -with ie.false_block(): // block 2 - d = pd.layer.fc(z) - ie.output(d, d+1) -o1, o2 = ie(cond) -``` - -## `BlockDesc` and `ProgramDesc` - -All protobuf messages are defined in `framework.proto`. - -`BlockDesc` is straight-forward -- it includes local variable definitions, `vars`, and a sequence of operators, `ops`. - -```protobuf -message BlockDesc { - required int32 parent = 1; - repeated VarDesc vars = 2; - repeated OpDesc ops = 3; -} -``` - -The parent ID indicates the parent block so that operators in a block can refer to variables defined locally and also those defined in their ancestor blocks. - -All hierarchical blocks in a program are flattened and stored in an array. The block ID is the index of the block in this array. - -```protobuf -message ProgramDesc { - repeated BlockDesc blocks = 1; -} -``` - - -### Global Block - -The global block is the first one in the above array. - -## Operators that Use Blocks - -In the above example, the operator `IfElseOp` has two blocks -- the true branch and the false branch. - -The definition of `OpDesc` shows that an operator could have some attributes: - -```protobuf -message OpDesc { - AttrDesc attrs = 1; - ... -} -``` - -and an attribute could be of type block, which is, in fact, a block ID as described above: - -``` -message AttrDesc { - required string name = 1; - - enum AttrType { - INT = 1, - STRING = 2, - ... 
- BLOCK = ... - } - required AttrType type = 2; - - optional int32 block = 10; // when type == BLOCK - ... -} -``` - -## InferShape - -With this design, the InferShape function should take the following parameters: - -```c++ -void InferShape(int current_block, - int current_operator, - ProgramDesc* program // might change VarDesc values. - ) { - ... -} -``` - -where - -- `current_block` indexes into `ProgramDesc::blocks`, -- `current_operator` indexes into `BlockDesc::ops`. diff --git a/develop/doc_cn/_sources/design/prune.md.txt b/develop/doc_cn/_sources/design/prune.md.txt deleted file mode 100644 index 4a5cf10c79a554779137f0cce5494fdd96ef6b7a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/prune.md.txt +++ /dev/null @@ -1,63 +0,0 @@ -# Prune - -## Motivation - -We want to support running inference, training and checkpointing in one `ProgramDesc`. We implement a -`void Prune(const ProgramDesc* input, ProgramDesc* output)` function, which takes a `ProgramDesc` -and generates a pruned `ProgramDesc`. - -## Challenge - -Pruning needs to support both variables and operators being evaluation targets. Consider the following -different situations. - -```python -# Case 1: run the forward pass. -cost_np = session.run(target=cost) -# Case 2: run the backward pass. -opts_np, _ = session.run(target=[cost, opt]) -# Case 3: run checkpointing -_ = session.run(target=checkpoint) -``` - -## Solution - -To support evaluation of operators, we add an `is_target` field in the `OpDesc`. - -```c++ -message OpDesc { - required string type = 3; - repeated Var inputs = 1; - repeated Var outputs = 2; - repeated Attr attrs = 4; - optional bool is_target = 5 [ default = false ]; -}; -``` - -To support evaluation of variables, we add [fetch_op](https://github.com/PaddlePaddle/Paddle/pull/4599). -For each variable in the `target`, we insert a `fetch_op` into the `ProgramDesc` with `variable` being -`fetch_op`'s input. We then also mark the `fetch_op` as a target.
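The idea of turning variable targets into `fetch` operators, and then keeping only the operators that a target depends on, can be sketched in a few lines of Python. This is a minimal illustration and not the actual `Prune` implementation: each operator is a plain dict standing in for an `OpDesc` message, and the helper names `add_fetch_ops` and `prune` are hypothetical. The reverse sweep assumes an operator is kept when it is a target or when one of its outputs is consumed by an operator that was already kept.

```python
def add_fetch_ops(ops, target_vars):
    """Return a copy of `ops` with one fetch op appended per target variable.

    Each op is a plain dict standing in for an OpDesc message (an assumption
    made for this sketch; the real messages are protobuf objects).
    """
    result = [dict(op) for op in ops]
    for var in target_vars:
        result.append({
            "type": "fetch",
            "inputs": [var],
            "outputs": [],
            "is_target": True,  # the fetch op itself becomes the target
        })
    return result


def prune(ops):
    """Keep only ops that are targets or that feed an op already kept."""
    dependent_vars = set()
    kept = []
    # Walk backwards so that consumers are visited before their producers.
    for op in reversed(ops):
        if op["is_target"] or any(v in dependent_vars for v in op["outputs"]):
            kept.append(op)
            dependent_vars.update(op["inputs"])
    kept.reverse()
    return kept


program = [
    {"type": "mul", "inputs": ["x", "w"], "outputs": ["y"], "is_target": False},
    {"type": "mse", "inputs": ["y", "label"], "outputs": ["cost"], "is_target": False},
    {"type": "accuracy", "inputs": ["y", "label"], "outputs": ["acc"], "is_target": False},
]
pruned = prune(add_fetch_ops(program, ["cost"]))
pruned_types = [op["type"] for op in pruned]
```

With `cost` as the only target, the `accuracy` operator is dropped because nothing that was kept reads `acc`.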
- -### Algorithm - -If an operator needs to be run, it must fall into one of the following cases: - -1. It is the target. -2. Some other op depends on it, meaning its output is some other op's input. - -The first case can be checked by `op_desc.is_target()`. The second case can be implemented as - -```c++ -bool HasDependentVar(const OpDesc& op_desc, const std::set<std::string>& dependent_vars) { - for (auto& var : op_desc.outputs()) { - for (auto& argu : var.arguments()) { - if (dependent_vars.count(argu) != 0) { - return true; - } - } - } - return false; -} -``` - -Then the whole algorithm can be implemented as the following [code](https://github.com/tonyyang-svail/Paddle/blob/prune_impl/paddle/framework/prune.cc). diff --git a/develop/doc_cn/_sources/design/python_api.md.txt b/develop/doc_cn/_sources/design/python_api.md.txt deleted file mode 100644 index 73f6d7b90c7dca0d48109cf3d28d5f7cd56b5c0b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/python_api.md.txt +++ /dev/null @@ -1,304 +0,0 @@ -# Design Doc: Python API - -Due to the refactorization of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program. - -| Python classes | Protobuf messages | -| --- | --- | -| Program | ProgramDesc | -| Block | BlockDesc | -| Operator | OpDesc | -| Variable | VarDesc | - -Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages. - -## Core Concepts - -### Program - -A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. The `BlockDesc`s in a `ProgramDesc` can have a tree-like hierarchical structure. However, the `ProgramDesc` only stores a flattened array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array.
For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks. - -Whenever we create a block, we need to set its parent block to the current block, hence the Python class `Program` needs to maintain a data member `current_block`. - -```python -class Program(objects): - def __init__(self): - self.desc = core.NewProgram() # a C++ ProgramDesc pointer. - self.blocks = vector() - self.blocks.append(Block(self, -1)) # the global block - self.current_block = 0 # initialized to the global block - - def global_block(): - return self.blocks[0] - - def current_block(): - return self.get_block(self.current_block) - - def rollback(): - self.current_block = self.current_block().parent_idx - - def create_block(): - new_block_idx = len(self.block) - self.blocks.append(Block(self, self.current_block)) - self.current_block = new_block_idx - return current_block() -``` - -`Program` is an accessor to the protobuf message `ProgramDesc`, which is created in C++ space, because the InferShape function is in C++, which manipulates `VarDesc` messages, which are in turn members of `BlockDesc`, which is a member of `ProgramDesc`. - -`Program` creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block. - -### Block - -A [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md) includes - -1. a map from variable names to an instance of the Python `Variable` class, and -1. a list of `Operator` instances. - -```python -class Block(objects): - def __init__(self, program, parent_idx): - self.desc = core.NewBlock(program.desc) - self.program = program - self.vars = map() - self.ops = vector() - self.parent_idx = parent_idx - - def create_var(self, ...): - return Variable(self, ...) - - def _create_global_var(self, ...): - program.global_block().create_var(...) - - def create_parameter(self, name, ...): - # Parameter is a subclass of variable. 
See Parameter section for details. - self.vars[name] = Parameter(self._create_global_var(...), ...) - return self.vars[name] - - def append_operator(self, ...): - self.ops.append(Operator(self, ...)) - - def prepend_operator(self, ...): # Parameter's ctor prepands initialize operators. - self.ops.prepend(Operator(self, ...)) -``` - -`create_parameter` is necessary because parameters are global variables, defined in the global block, but can be created in some sub-blocks. For example, an FC layer in the step block of an RNN operator. - -`prepend_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block. - -### Operator - -The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer the output shapes from the input shapes. - -```python -class Operator(object): - def __init__(self, - block, # Block - type, # string - inputs, # dict - outputs,# dict - attrs # dict - ): - self.desc = core.NewOpDesc(block.desc, type, inputs, outputs, attrs) - core.infer_shape(self.desc, inputs, outputs) - - def type(self): - return self.desc.type() -``` - -`Operator` creates the `OpDesc` message in C++ space, so that it can call the `InferShape` function, which is in C++. - -### Variable - -Operators take Variables as its inputs and outputs. - -```python -class Variable(object): - def __init__(self, - block=None, # Block - name=None, # string - shape, # tuple - dtype="float32", # string - lod_level=None # int - ): - if name is None: - name = unique_name_generator() - self.name = name - self.block = block - self.desc = core.NewVarDesc(block.desc, name, shape, lod_level) - self.writer = None -``` - -Please be aware of `self.writer`, that tracks operator who creates the variable. 
It possible that there are more than one operators who write a variable, but in Python space, each write to a variable is represented by a Variable class. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**. - -### Parameter - -A parameter is a global variable with an initializer (or load) operator. - -```python -class Parameter(Variable): - def __init__(self, - block=None, # Block - name=None, # string - shape, # tuple - dtype="float32", # string - lod_level=None # int - trainable, # bool - initialize_op_attrs, - optimize_op_attrs): - super(Parameter, self).__init__(block, name, shape, dtype, lod_level) - self.trainable = trainable - self.optimize_op_attrs = optimize_op_attrs - block.prepend(Operator(block, # Block - initialize_op_attrs['type'], # string - None, # no inputs - self, # output is the parameter - initialize_op_attrs) -``` - -When users create a parameter, they can call - -```python -program.create_parameter( - ..., - init_attr={ - type: "uniform_random", - min: -1.0, - max: 1.0, - }) -) -``` - -In above example, `init_attr.type` names an initialize operator. It can also name the load operator - -```python -init_attr={ - type: "load", - filename: "something.numpy", -} -``` - -`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message. - -## Layer Function - -A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers. - -Layer functions take `Variable` and configuration parameters as its input and return the output variable(s). - -For example, `FullyConnected` take one or more variable as its input. The input could be input data or another layer's output. 
There are many configuration options for a `FullyConnected` layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The `FullyConnected` layer will return an output variable. - - -### Necessity for reusing code between layer functions - -There are a lot of code that can be reused. Such as - -* Give the default value of configuration. e.g., default initialize strategy for parameters is uniform random with `min = -1.0`, `max = 1.0`. and default initialize strategy for bias is to fill zero. -* Append the activation operator. -* Create a temporary variable. -* Create parameter. -* Generate a unique name. -* Add a bias. -* ... - -A mechanism to reuse code between layer functions is necessary. It will be around [150 lines of code](https://github.com/PaddlePaddle/Paddle/pull/4724/files#diff-823b27e07e93914ada859232ae23f846R12) if we write a `FullyConnected` layer without any helper functions. - - - -### Comparision between global functions and helper class - -The `FullyConnected` layer will be as follow when we provide global functions: - -```python -def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None): - if name is None: - name = unique_name("fc") - input = multiple_input(input) - param_attr = default_param_attr(param_attr) - param_attr = multiple_param_attr(param_attr, len(input)) - - # mul - mul_results = [] - for ipt, attr in zip(input, param_attr): - shape = ipt.shape[1:] + [size] - w = g_program.global_block().create_parameter(shape, ipt.dtype, name, attr) - tmp = create_tmp_var(name) - g_program.current_block().append_op("mul", {ipt, w}, {tmp}) - mul_results.append(tmp) - - # add sum - ... - # add bias - ... - # add activation - ... - return out -``` - -We can provide many helpers functions for layer developers. However, there are several disadvantages for global helper functions: - -1. 
We need a namespace for these methods, then layer developers can quickly figure out what method they can use. -2. Global functions will force layer developers to pass its parameter time by time. - -So we provide a helper class, `LayerHelper`, to share code between layer functions. The `FullyConnected` Layer will be as follow. - -```python -def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None): - helper = LayerHelper(locals()) # pass all parameter to LayerHelper - - mul_results = [] - for ipt, param in helper.iter_multiple_input_and_param(): - w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype = ipt.dtype) - tmp = helper.create_tmp_variable() - helper.append_op('mul', {ipt, w}, {tmp}) - mul_results.append(tmp) - - pre_bias = helper.add_sum(mul_results) - pre_activation = helper.add_bias(pre_bias) - return helper.add_activation(pre_activation) -``` - -We not only use the fewer lines of code to write `fc_layer` but also make the code clearer to understand. At the same time, layer developers can figure out what function they can invoke by typing `helper.` in a python editor. - - -### Implementation of layer helper - -We just keep all parameters of a layer function as a dictionary in layer helper as a private data member. Every method of layer helper will look up the dictionary after it is invoked. In that way, we can implement a layer helper for all layer functions even some layer does not contain some operator. For example, The `activation` is used by the FullyConnected layer or convolution layers, but a cross-entropy layer does not use it. 
The example code of `add_activation` are: - -```python -class LayerHelper(object): - def __init__(self, **kwargs): # kwargs is short for `keyword arguments` - self.kwargs = kwargs - - def add_activation(self, input_var): - act = self.kwargs.get("act", None) # default value is None - if act is None: # do nothing if no act - return input_var - - tmp = self.create_tmp_var(self) - self.append_op(type=act, input=input_var, output=tmp) - return tmp -``` - -### Return value of layer functions - -The layer will return a Variable, which is also the output of an operator. However, outputs of a layer function have more attributes than an operator. There are parameter variables, and their gradient variables need to return. To return them is useful. For example, - -1. Users can debug the network by printing parameter gradients. -2. Users can append attributes to a parameter, such as, `param.stop_gradient=True` will make a parameter stop generate the gradient. We can fix the parameter value during training by using this attribute. - -However, it is good to return a Variable for layers, since all layers and operators use Variables as their parameters. We can just append a `param` field and a `grad` field for layer function since the Python is dynamic typing. - -The sample usage is - -```python -data = fluid.layers.data(...) -hidden = fluid.layers.fc(data, ...) -... - -executor.run(fetch_list=[hidden.param, hidden.param.grad], ...) -``` - - -## Optimizer - -[Optimizer Design Doc](./optimizer.md) diff --git a/develop/doc_cn/_sources/design/reader/README.md.txt b/develop/doc_cn/_sources/design/reader/README.md.txt deleted file mode 100644 index 2cd4b6225b61cf374458e40afabad7745f61ba71..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/reader/README.md.txt +++ /dev/null @@ -1,206 +0,0 @@ -# Python Data Reader Design Doc - -During the training and testing phases, PaddlePaddle programs need to read data. 
To help the users write code that performs reading input data, we define the following: - -- A *reader*: A function that reads data (from file, network, random number generator, etc) and yields the data items. -- A *reader creator*: A function that returns a reader function. -- A *reader decorator*: A function, which takes in one or more readers, and returns a reader. -- A *batch reader*: A function that reads data (from *reader*, file, network, random number generator, etc) and yields a batch of data items. - -and also provide a function which can convert a reader to a batch reader, frequently used reader creators and reader decorators. - -## Data Reader Interface - -*Data reader* doesn't have to be a function that reads and yields data items. It can just be any function without any parameters that creates an iterable (anything can be used in `for x in iterable`) as follows: - -``` -iterable = data_reader() -``` - -The item produced from the iterable should be a **single** entry of data and **not** a mini batch. The entry of data could be a single item or a tuple of items. Item should be of one of the [supported types](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., numpy 1d array of float32, int, list of int etc.) - -An example implementation for single item data reader creator is as follows: - -```python -def reader_creator_random_image(width, height): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height) - return reader -``` - -An example implementation for multiple item data reader creator is as follows: -```python -def reader_creator_random_image_and_label(width, height, label): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height), label - return reader -``` - -## Batch Reader Interface - -*Batch reader* can be any function without any parameters that creates an iterable (anything can be used in `for x in iterable`). 
The output of the iterable should be a batch (list) of data items. Each item inside the list should be a tuple. - -Here are some valid outputs: - -```python -# a mini batch of three data items. Each data item consist three columns of data, each of which is 1. -[(1, 1, 1), -(2, 2, 2), -(3, 3, 3)] - -# a mini batch of three data items, each data item is a list (single column). -[([1,1,1],), -([2,2,2],), -([3,3,3],)] -``` - -Please note that each item inside the list must be a tuple, below is an invalid output: -```python - # wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],). - # Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1], - # or three columns of data, each of which is 1. -[[1,1,1], -[2,2,2], -[3,3,3]] -``` - -It is easy to convert from a reader to a batch reader: - -```python -mnist_train = paddle.dataset.mnist.train() -mnist_train_batch_reader = paddle.batch(mnist_train, 128) -``` - -It is also straight forward to create a custom batch reader: - -```python -def custom_batch_reader(): - while True: - batch = [] - for i in xrange(128): - batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended. - yield batch - -mnist_random_image_batch_reader = custom_batch_reader -``` - -## Usage - -Following is how we can use the reader with PaddlePaddle: -The batch reader, a mapping from item(s) to data layer, the batch size and the number of total passes will be passed into `paddle.train` as follows: - -```python -# two data layer is created: -image_layer = paddle.layer.data("image", ...) -label_layer = paddle.layer.data("label", ...) - -# ... -batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128) -paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...) -``` - -## Data Reader Decorator - -The *Data reader decorator* takes in a single reader or multiple data readers and returns a new data reader. 
It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use `@` in the syntax. - -Since we have a strict interface for data readers (no parameters and returns a single data item), a data reader can be used in a flexible way using data reader decorators. The following are a few examples: - -### Prefetch Data - -Since reading data may take some time and training cannot proceed without data, it is generally a good idea to prefetch the data. - -Use `paddle.reader.buffered` to prefetch data: - -```python -buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100) -``` - -`buffered_reader` will try to buffer (prefetch) `100` data entries. - -### Compose Multiple Data Readers - -For example, suppose we want to use a source of real images (say, reusing the mnist dataset) and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661). - -We can do the following: - -```python -def reader_creator_random_image(width, height): - def reader(): - while True: - yield numpy.random.uniform(-1, 1, size=width*height) - return reader - -def reader_creator_bool(t): - def reader(): - while True: - yield t - return reader - -true_reader = reader_creator_bool(True) -false_reader = reader_creator_bool(False) - -reader = paddle.reader.compose(paddle.dataset.mnist.train(), reader_creator_random_image(20, 20), true_reader, false_reader) -# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry. -# And we don't care about the second item at this time. -paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...) -``` - -### Shuffle - -Given the shuffle buffer size `n`, `paddle.reader.shuffle` returns a data reader that buffers `n` data entries and shuffles them before a data entry is read.
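To make the buffering behavior concrete, here is a minimal sketch of a shuffle decorator that follows the reader interface described in this document. It is not the code of `paddle.reader.shuffle`; the name `shuffle_sketch` and the choice to flush a partially filled buffer at the end of the input are assumptions made for illustration.

```python
import random


def shuffle_sketch(reader, buf_size):
    """Hypothetical reader decorator: buffer `buf_size` entries, shuffle, then yield."""
    def shuffled_reader():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush whatever is left when the underlying reader is exhausted.
        if buf:
            random.shuffle(buf)
            for item in buf:
                yield item
    return shuffled_reader


def counting_reader():
    # A tiny finite reader used only to exercise the sketch.
    for i in range(10):
        yield i


shuffled = list(shuffle_sketch(counting_reader, 4)())
```

Entries are only shuffled within each buffer, so a larger `buf_size` gives a better approximation of a full shuffle at the cost of more memory.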
- -Example: -```python -reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512) -``` - -## Q & A - -### Why does a reader return only a single entry, and not a mini batch? - -Returning a single entry makes reusing existing data readers much easier (for example, if an existing reader returns 3 entries instead of a single entry, the training code will be more complicated because it needs to handle cases like a batch size of 2). - -We provide a function: `paddle.batch` to turn (a single entry) reader into a batch reader. - -### Why do we need a batch reader, isn't it sufficient to give the reader and batch_size as arguments during training? - -In most cases, it would be sufficient to give the reader and batch_size as arguments to the train method. However, sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. For these cases using a batch reader is very efficient and helpful. - -### Why use a dictionary instead of a list to provide mapping? - -Using a dictionary (`{"image":0, "label":1}`) instead of a list (`["image", "label"]`) gives the advantage that the user can easily reuse the items (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or even skip an item (e.g., using `{"image_a":0, "label":2}`). - -### How to create a custom data reader creator?
- -```python -def image_reader_creator(image_path, label_path, n): - def reader(): - f = open(image_path) - l = open(label_path) - images = numpy.fromfile( - f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32') - images = images / 255.0 * 2.0 - 1.0 - labels = numpy.fromfile(l, 'ubyte', count=n).astype("int") - for i in xrange(n): - yield images[i, :], labels[i] # a single entry of data is created each time - f.close() - l.close() - return reader - -# images_reader_creator creates a reader -reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024) -paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...) -``` - -### How is `paddle.train` implemented - -An example implementation of paddle.train is: - -```python -def train(batch_reader, mapping, batch_size, total_pass): - for pass_idx in range(total_pass): - for mini_batch in batch_reader(): # this loop will never end in online learning. - do_forward_backward(mini_batch, mapping) -``` diff --git a/develop/doc_cn/_sources/design/refactorization.md.txt b/develop/doc_cn/_sources/design/refactorization.md.txt deleted file mode 100644 index f93d6155e1764386b01d2f0df3f141ab75cd55d4..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/refactorization.md.txt +++ /dev/null @@ -1,249 +0,0 @@ -# Design Doc: Refactorization Overview - -The goals of refactoring include: - -1. Making it easy for external contributors to write new elementary computation operations. -1. Making the codebase clean and readable. -1. Designing a new computation representation -- a computation graph of operators and variables. -1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs. - -## Computation Graphs - -1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs. - - 1. 
Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example. - -1. Users write Python programs to describe the graphs and run them (locally or remotely). - -1. A graph is composed of *variables* and *operators*. - -1. The description of graphs must be serializable/deserializable, so that: - - 1. It can be sent to the cloud for distributed execution, and - 1. It can be sent to clients for mobile or enterprise deployment. - -1. The Python program does two things - - 1. *Compilation* runs a Python program to generate a protobuf message representation of the graph and send it to - 1. the C++ library `libpaddle.so` for local execution, - 1. the master process of a distributed training job for training, or - 1. the server process of a Kubernetes serving job for distributed serving. - 1. *Execution* executes the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message. - -## Description and Realization of Computation Graph - -At compile time, the Python program generates a protobuf message representation of the graph, or a description of the graph. - -At runtime, the C++ program realizes the graph and runs it. 
- -| | Representation (protobuf messages) | Realization (C++ class objects) | -|---|---|---| -|Data|[VarDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107)|[Variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24)| -|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| -|Block|BlockDesc|Block| - -The word *graph* is interchangeable with *block* in this document. A graph consists of computation steps and local variables similar to a C++/Java program block, or a pair of parentheses(`{` and `}`). - -## Compilation and Execution - -1. Run a Python program to describe the graph. In particular, the Python application program does the following: - - 1. Create `VarDesc` to represent local/intermediate variables, - 1. Create operators and set attributes, - 1. Validate attribute values, - 1. Infer the type and the shape of variables, - 1. Plan memory-reuse for variables, - 1. Generate the backward graph - 1. Add optimization operators to the computation graph. - 1. Optionally, split the graph for distributed training. - -1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the Python program does the following: - - 1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, - 1. realize local variables defined in the BlockDesc message in the new scope, - 1. a scope is similar to the stack frame in programming languages, - - 1. Create an instance of class `Block`, in which, - 1. realize operators in the BlockDesc message, - - 1. Run the Block by calling - 1. `Block::Eval(vector* targets)` for forward and backward computations, or - 1. 
`Block::Eval(vector* targets)` for optimization. - - -## Intermediate Representation (IR) - -```text -Compile Time -> IR -> Runtime -``` - -### Benefits of IR - -- Optimization - ```text - Compile Time -> IR -> Optimized IR -> Runtime - ``` -- Automatically send partitioned IR to different nodes. - - Automatic Data Parallelism - ```text - Compile Time - |-> Single GPU IR - |-> [trainer-IR-0, trainer-IR-1, pserver-IR] - |-> Node-0 (runs trainer-IR-0) - |-> Node-1 (runs trainer-IR-1) - |-> Node-2 (runs pserver-IR) - ``` - - Automatic Model Parallelism (planned for future) - ---- - -# Operator/OpWithKernel/OpKernel - -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/49caf1fb70820fb4a6c217634317c9306f361f36/op_op_with_kern_class_diagram.dot) - ---- - -# Operator -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) - -* `Operator` is the fundamental building block of the user interface. - * Operator stores input/output variable names and attributes. - * The `InferShape` interface is used to infer the shape of the output variables based on the shapes of the input variables. - * Use `Run` to compute the `output` variables from the `input` variables. - ---- - -# OpWithKernel/Kernel - -![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/9d7f4eba185cf41c8e2fbfb40ae21890dbddcd39/op_with_kernel.dot) - -* `OpWithKernel` inherits `Operator`. -* `OpWithKernel` contains a Kernel map. - * `OpWithKernel::Run` get device's kernel, and invoke `OpKernel::Compute`. - * `OpKernelKey` is the map key. Only device place now, but may be data type later. - ---- - -# Why separate Kernel and Operator - -* Separate GPU and CPU code. - * Make Paddle capable of running without GPU. 
-* Make one operator (which is a user interface) and create many implementations. - * For example, the same multiplication op can have different kernel implementations, such as an FP16 kernel, an FP32 kernel, an MKL kernel, or an Eigen kernel. ---- - -# Libraries for Kernel development - -* `Eigen::Tensor` contains basic math and element-wise functions. - * Note that `Eigen::Tensor` has a broadcast implementation. - * Limit the number of `tensor.device(dev) = ` in your code. -* `thrust::transform` and `std::transform`. - * `thrust` has the same API as the C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels. - * `thrust`, in addition, supports more complex APIs, like `scan`, `reduce`, `reduce_by_key`. -* Hand-writing `GPUKernel` and `CPU` code - * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.) ---- -# Operator Registration - -## Why is registration necessary? -We need a method to build mappings between Op type names and Op classes. - -## How is registration implemented? -By maintaining a map whose key is the type name and whose value is the corresponding Op constructor. - ---- -# The Registry Map - -### `OpInfoMap` - -`op_type(string)` -> `OpInfo` - -`OpInfo`: - -- **`creator`**: The Op constructor. -- **`grad_op_type`**: The type of the gradient Op. -- **`proto`**: The Op's Protobuf, including inputs, outputs and required attributes. -- **`checker`**: Used to check attributes. - ---- -# Related Concepts - -### Op_Maker -Its constructor takes `proto` and `checker`. They are completed during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)) - -### Register Macros -```cpp -REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) -REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) -``` - ---- -# Registration Process -1.
Write an Op class and its gradient Op class, if required. -2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator. -3. Invoke the macro `REGISTER_OP`. This macro will - 1. Call the maker class to complete `proto` and `checker`. - 2. Use the completed `proto` and `checker` to add a new key-value pair to the `OpInfoMap`. - ---- -# Backward Module (1/2) -### Create Backward Operator -- Mapping from forward Op to backward Op -![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) - ---- -# Backward Module (2/2) -### Build Backward Network -- **Input**: a graph of forward operators -- **Output**: a graph of backward operators -- **Corner cases in construction** - - Shared Variables => insert an `Add` operator to combine gradients - - No Gradient => insert a `fill_zero_grad` operator - - Recursive NetOp => call `Backward` recursively - - RNN Op => recursively call `Backward` on stepnet - - ---- -# Scope, Variable, Tensor - -* `Tensor` is an n-dimensional array with type. - * Only dims and data pointers are stored in `Tensor`. - * All operations on `Tensor` are written in `Operator` or global functions. - * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) -* `Variable` instances are the inputs and the outputs of an operator, not just `Tensor`. - * `step_scopes` in RNN is a variable and not a tensor. -* `Scope` is where variables are stored. - * `map<string, Variable>` (keyed by variable name) - * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
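The hierarchical lookup described in the last bullet can be sketched as follows. This is an illustrative Python model, not the C++ `Scope` class: the class name `ScopeSketch` and the methods `new_var` and `find_var` are hypothetical, and variables are represented by plain Python values.

```python
class ScopeSketch:
    """Hypothetical model of a scope: a name-to-variable map plus a parent link."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def new_var(self, name, value=None):
        # Variables are always created in the scope they are requested from.
        self.vars[name] = value
        return value

    def find_var(self, name):
        # Search the local scope first, then walk up through the ancestors.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None


global_scope = ScopeSketch()
global_scope.new_var("w", 1.0)

local_scope = ScopeSketch(parent=global_scope)
local_scope.new_var("tmp", 2.0)

found_in_parent = local_scope.find_var("w")
hidden_from_parent = global_scope.find_var("tmp")
```

The lookup only goes upward: a local scope sees its ancestors' variables, while a parent scope does not see variables created in a child, which is what makes a scope behave like a stack frame.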
 - ---- -# Block (in design) -## The difference between the original RNNOp and Block -- As an operator, it is more intuitive than `RNNOp`, -- Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`, -- Fits the compile-time/runtime separation design paradigm. - - During compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc` - - When the graph executes, a Block with `BlockDesc` is passed. It then creates `Op` and `Var` instances and then invokes `Run`. - ---- -# Milestone -- Take Paddle/books as the main line; the requirements of the models motivate framework refactoring, -- Model migration - - Framework development gives **priority support** to model migration, for example, - - the MNIST demo needs a Python interface, - - the RNN models require the framework to support `LoDTensor`. - - Determine some timelines, - - Frequently used Ops need to be migrated first, - - Different models can be migrated in parallel. -- Improve the framework at the same time -- Accept imperfection, concentrate on solving the specific problem at the right price. - ---- -# Control the migration quality -- Compare the performance of migrated models with old ones. -- Follow the Google C++ style guide. -- Build an automatic workflow for generating Python/C++ documentation. - - The documentation of layers and ops should be written inside the code. - - Take the documentation quality into account when submitting pull requests. - - Preview the documentation, then read and improve it from a user's perspective.
diff --git a/develop/doc_cn/_sources/design/register_grad_op.md.txt b/develop/doc_cn/_sources/design/register_grad_op.md.txt deleted file mode 100644 index 8d973eb53178c3e889c845144553a453e11f067c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/register_grad_op.md.txt +++ /dev/null @@ -1,92 +0,0 @@ -# Design Doc: Gradient Operators Registration - - -## The Problem Posed - -Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance. - -However, we noticed two problems with the current design: - -1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message. - -1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation. - -## The Current Implementation - -Instances of the C++ class `OpInfo` are stored an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows - -```cpp -struct OpInfo { - std::function creator_; - std::string grad_op_type_; - ... -}; - -map OpInfoMap; - -OperatorBase* CreateGradientOperator(const OperatorBase& op) { - return OpInfoMap.at(op.Type()).creator_(...); -} -``` - -## Proposed Solution - -The mapping relationship between an operator and its gradient operators is a function. 
The interface of this function is: - -```cpp -// (OpDesc) --> vector<OpDesc> -std::function<std::vector<OpDescBind>(const OpDescBind&)>; -``` - -The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`. - -The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like - -```cpp -struct OpInfo { - std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> grad_op_maker_; - ... -}; -``` - -The `grad_op_maker_` is a `nullptr` if the operator does not have any associated gradient operators. - -We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is - -```cpp -class GradOpDescMakerBase { -public: - GradOpDescMakerBase(const OpDescBind& ); - virtual std::vector<std::unique_ptr<OpDescBind>> operator()()const = 0; -}; -``` - -We can convert `GradOpDescMakerBase` to `std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>` by - -```cpp -using GradOpMaker = ...; -std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> func; -func = [] (const OpDescBind& fwd_op) { - GradOpMaker maker(fwd_op); - return maker(); -}; -``` - -We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forward operator. - -We should change the register macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so `REGISTER_OP` registers just one operator. If `REGISTER_OPERATOR` takes both an `OpProtoAndCheckerMaker` and a `GradOpDescMaker`, we just list them in the same macro. This can be done with a macro that uses `__VA_ARGS__`.
 - -The user interface should be - -```cpp -vector<OpDesc> MinusOpGradMaker(OpDesc) {...} -REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker); -// Developers can still manually implement gradient operator. -REGISTER_OPERATOR(minus_grad, MinusGradOp); -``` - -The interface of the current `REGISTER_OP` macro should remain unchanged. Internally, `REGISTER_OP` will invoke `REGISTER_OPERATOR` twice and generate the `GradOpDescMaker`. - -```cpp -REGISTER_OP(minus, MinusOp, MinusOpProtoAndCheckerMaker, minus_grad, MinusGradOp); -``` diff --git a/develop/doc_cn/_sources/design/regularization.md.txt b/develop/doc_cn/_sources/design/regularization.md.txt deleted file mode 100644 index 21280ac898feb4dd5e5a5d9e88d121e856850f0b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/regularization.md.txt +++ /dev/null @@ -1,72 +0,0 @@ -# Regularization in PaddlePaddle - -## Introduction to Regularization -A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. A frequently faced problem is **overfitting**, where the model does not make reliable predictions on new, unseen data. **Regularization** is the process of introducing additional information in order to prevent overfitting. This is usually done by adding extra penalties to the loss function that restrict the parameter space that an optimization algorithm can explore. - -### Parameter Norm Penalties -Most common regularization approaches in deep learning are based on limiting the capacity of the models by adding a parameter norm penalty to the objective function `J`. This is given as follows: -
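
The equation image that belonged at this point is not preserved. Using the quantities named in the surrounding text (the objective `J`, the weight `alpha`, and the penalty `omega`), the penalized objective is conventionally written as below. The symbols for parameters, inputs, and targets are assumed notation and do not come from this document.

$$
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta)
$$

Here $\theta$ denotes the model parameters, $(X, y)$ the training inputs and targets, and $\alpha \ge 0$ the penalty weight. Setting $\alpha = 0$ recovers the unregularized objective.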
                                - -The parameter `alpha` is a hyperparameter that weights the relative contribution of the norm penalty term, `omega`, relative to the standard objective function `J`. - -The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows: - -##### L2 Regularization: -
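
The image for this formula is not preserved. The standard L2 (weight decay) penalty, which is presumably what was shown, is given below. The factor of one half is a common convention and may differ from the original figure, and $w$ is assumed notation for the weights subject to the penalty.

$$
\Omega(\theta) = \frac{1}{2} \lVert w \rVert_2^2 = \frac{1}{2} \sum_i w_i^2
$$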
                                - -##### L1 Regularization -
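
The image for this formula is not preserved either. The standard L1 penalty, presumably what was shown, sums the absolute values of the weights, with $w$ again being assumed notation.

$$
\Omega(\theta) = \lVert w \rVert_1 = \sum_i \lvert w_i \rvert
$$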
                                - -A much more detailed mathematical background of regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html). - -## Regularization Survey - -A detailed survey of regularization in various deep learning frameworks can be found [here](https://github.com/PaddlePaddle/Paddle/wiki/Regularization-Survey). - -## Proposal for Regularization in PaddlePaddle - -### Low-Level implementation - -In the new design, we propose to create new operations for regularization. For now, we can add 2 ops that correspond to the most frequently used regularizations: -- L2_regularization_op -- L1_regularization_op - -These ops can be like any other ops with their own CPU/GPU implementations either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement their kernels using Eigen following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes other than L1 and L2 norm penalties. - -The idea of building ops for regularization is in sync with the refactored Paddle philosophy of using operators to represent any computation unit. The way these ops will be added to the computation graph, will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API. - -### Computation Graph - -Below is an example of a really simple feed forward neural network. - -
                                - -The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows: - -
                                -    -### Python API implementation for Regularization - -Using the low level ops, `L2_regularization_op` and `L1_regularization_op`, any user can add regularization to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support regularization. An example of such an API can be seen in [Keras](https://keras.io/regularizers/). As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since regularization is a property of parameters, it makes sense to create these in the layer functions. - -#### Creation of Regularization ops -There are two possibilities for creating the regularization ops: -1. We create these ops immediately while building the computation graph. -2. We add these ops in a lazy manner, just before the backward, similar to the way the optimization ops are added. - -The proposal is to add these ops in a lazy manner just before the backward pass. - -#### Storage of Regularization attributes - -Since we want to create the regularization ops in a lazy manner, the regularization attributes (type of regularization and weight of regularization penalty) can be stored as attributes of the [`Parameter`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/framework.py#L421) class. This is because regularization is a property of the parameters and storing regularization properties with Parameters also allows for shared parameters. - -#### High-level API - -In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. 
The design of these APIs can be deferred for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and in TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers). - - - - - - diff --git a/develop/doc_cn/_sources/design/releasing_process.md.txt b/develop/doc_cn/_sources/design/releasing_process.md.txt deleted file mode 100644 index b9787261092f1f27377886152cb1596d9ff54188..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/releasing_process.md.txt +++ /dev/null @@ -1,90 +0,0 @@ -# PaddlePaddle Release Process - -PaddlePaddle manages branches with the git-flow branching model and numbers its versions according to [Semantic Versioning](http://semver.org/). - -Every new PaddlePaddle release follows the process below: - -1. Create a new branch from `develop` named `release/<version>`, for example `release/0.10.0`. -1. Tag the new branch with `<version>rc.<patch number>`. The first tag is `0.10.0rc1`, the second is `0.10.0rc2`, and so on. -1. For the commit being released, do the following: - * Use the Regression Test List as a checklist to verify the correctness of this release. - * If a test fails, record every failing case, fix all the bugs on the `release/<version>` branch, increment the patch number, and return to step 2. - * Update the version information in `python/setup.py.in` and set the `istaged` field to `True`. - * Build the Python wheel packages for this version and publish them to PyPI. - * Because pypi.python.org currently follows the [strict naming convention of PEP 513](https://www.python.org/dev/peps/pep-0513), the platform-specific suffix of each wheel must be renamed before uploading with twine, for example from `linux_x86_64` to `manylinux1_x86_64`. - * The package names on PyPI are paddlepaddle and paddlepaddle_gpu. To upload the GPU package, set name: "paddlepaddle_gpu" in build/python/setup.py and rebuild the wheel: `python setup.py bdist_wheel`. - * To upload: - ``` - cd build/python - pip install twine - twine upload dist/[package to upload] - ``` - * Build the Docker release image for this version and publish it to dockerhub. If the build fails, fix the Docker image build, increment the patch number, and return to step 2. -1. Once step 3 is complete, merge the `release/<version>` branch into `master` and tag the merge commit on `master` with `<version>`. Then merge `master` into `develop`. Finally, delete the `release/<version>` branch. -1.
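Write the release notes collaboratively. - - -Please note: - -* Once the `release/<version>` branch has been created, merging from `develop` into `release/<version>` is generally not allowed. This keeps the feature set of the `release/<version>` branch closed, which makes it easier for testers to test PaddlePaddle's behavior. -* While the `release/<version>` branch exists, any bugfix branch must be merged into all three branches: `master`, `develop`, and `release/<version>`. - -## Publishing wheel packages to PyPI - -Use [PaddlePaddle CI](https://paddleci.ngrok.io/project.html?projectId=Manylinux1&tab=projectOverview) -to build the binaries automatically. Referring to the figure below, choose the build to release (usually one CPU build and one GPU build) and click the "..." button to the right of "run" to -open the dialog shown below. In the second tab (Changes), select the branch to release, here 0.11.0, and click the "Run Build" button. When the build finishes, -the three generated binaries, corresponding to the CAPI, `cp27m`, and `cp27mu` builds, can be found in the "Artifacts" drop-down on the same page. Then upload them -with the `twine` tool as described above. - - - -* Note: the CI environment uses the Docker images from https://github.com/PaddlePaddle/buildtools as its build environment in order to support more Linux - distributions. These images can also be used for manual builds, and they can be downloaded from https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/ as well. -* PyPI does not allow overwriting an upload, so a wheel cannot be changed once it has been published under a given version number. The next wheel can only be uploaded with a new version number. - -## Publishing Docker images - -After the PaddlePaddle CI wheel build above finishes, it automatically pushes the Docker images to DockerHub, so publishing a Docker image only requires tagging the automatically pushed image with -the tag for the version: - -1. Go to https://hub.docker.com/r/paddlepaddle/paddle/tags/ and check that the latest tag was updated after the wheel build above finished. -1. Run `docker pull paddlepaddle/paddle:[latest tag]`, where latest tag can be latest, latest-gpu, and so on. -1. Run `docker tag paddlepaddle/paddle:[latest tag] paddlepaddle/paddle:[version]`. -1.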
Run `docker push paddlepaddle/paddle:[version]`. - -## PaddlePaddle branching conventions - -PaddlePaddle development follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, with a few adjustments for GitHub. - -* The main PaddlePaddle repository follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, where: - * `master` is the stable branch. Every version on `master` has passed both unit tests and regression tests. - * `develop` is the development branch. Every version on `develop` has passed unit tests but not regression tests. - * `release/<version>` is a temporary branch created for each release. Code on this branch is undergoing regression testing. - -* Users' forks do not need to follow the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model strictly, but every branch of every fork is effectively a feature branch. - * Developers are advised to keep the `develop` branch of their fork in sync with the `develop` branch of the main repository. - * Developers are advised to create their feature branches in their fork from `develop`. - * When a feature branch is finished, open a `Pull Request` against the main PaddlePaddle repository for code review. - * During review, developers can keep pushing changes to their feature branch. - -* BugFix branches are also maintained in the developer's own fork. Unlike feature branches, a BugFix branch needs a `Pull Request` opened against each of the main repository's `master`, `develop`, and, if one exists, `release/<version>` branches. - -## PaddlePaddle regression test list - -This list describes the features that must be tested before each PaddlePaddle release. - -### All chapters of PaddlePaddle Book - -Every PaddlePaddle release must first ensure that all chapters of PaddlePaddle Book work correctly. This includes verifying that models train correctly both with the current `paddle_trainer` and with pure `Python` training. - -| | Getting Started | Recognize Digits | Image Classification | Word Vectors | Sentiment Analysis | Semantic Role Labeling | Machine Translation | Personalized Recommendation | -| --- | --- | --- | --- | --- | --- | --- | --- | --- | -| API.V2 + Docker + GPU | | | | | | | | | -| API.V2 + Docker + CPU | | | | | | | | | -| `paddle_trainer` + Docker + GPU | | | | | | | | | -| `paddle_trainer` + Docker + CPU | | | | | | | | | -| API.V2 + Ubuntu + GPU | | | | | | | | | -| API.V2 + Ubuntu + CPU | | | | | | | | | -| `paddle_trainer` + Ubuntu + GPU | | | | | | | | | -| `paddle_trainer` + Ubuntu + CPU | | | | | | | | | diff --git a/develop/doc_cn/_sources/design/scope.md.txt b/develop/doc_cn/_sources/design/scope.md.txt deleted file mode 100644 index 4da76eebb74abcd26ec2b8671399e6bc4fb58574..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/scope.md.txt +++ /dev/null @@ -1,124 +0,0 @@ -# Design of Scope in Paddle - -## Overview - -Scope is
an important concept in programming languages, which defines a program region that a set of bindings between names and entities applies. In a specific scope, a valid name is uniquely associated with an entity, such as a variable. And in another scope, this name may refer to other entity or nothing at all. It clearly restricts the visibility and validity of names in a program. Hence **Scope** is introduced to PaddlePaddle to manage variables in context. But different from the original abstract concept, Scope now becomes an object with two important attributes: - -- Scope is an association of a name to variable. -- Variables in a parent scope can be retrieved from local scope. - -A detailed explanation of these two attributes goes as following. - - -## Scope is an association of a name to variable. - -Scope is an association of a name to variable. All variables belong to `Scope`. You need to specify a scope to run a Net, i.e., `net.Run(&scope)`. One net can run in different scopes and update different variable in the scope. - - -1. Scope only contains a map of a name to variable. - - All parameters, data, states in a Net should be variables and stored inside a scope. Each op should get inputs and outputs to do computation from a scope, such as data buffer, state (momentum) etc. - -1. Variable can only be created by Scope and a variable can only be got from Scope. User cannot create or get a variable outside a scope. This is a constraints of our framework, and will keep our framework simple and clear. - -1. Scope only contains methods that are used to Create and Get Variables. Scope do not contain Operators and have no information to run them. - `Net` is designed to drive the computation and Scope only contains a map of variables. There is no computation logic inside a `Scope`. Scope just handles the lifetime management of variables. - - `Create` is used to create a Variable by its name and add the mapping relation. - - `Get` is used to find a Variable by name. - -1. 
Every variable only belongs to one certain Scope. - - Variable can not belong to many scopes. If you want to use variables from parent scope, you can use `parent scope`. - -1. Scope should destruct all Variables inside it when itself is destructed. User can never store `Variable` pointer somewhere else. - - Because Variable can only be got from Scope. When destroying Scope, we also need to destroy all the Variables in it. If user store `Variable` pointer to private data member or some global variable, the pointer will be an invalid pointer when associated `Scope` is destroyed. - -```cpp -class Scope { - public: - Variable* Var(const std::string& name); - const Variable* FindVar(const std::string& name) const; - - private: - std::unordered_map> vars_; -}; -``` - - -## Parent scope and local scope - -Just like [scope](https://en.wikipedia.org/wiki/Scope_(computer_science)) in programming languages, `Scope` in the neural network can also be a local scope. There are two attributes about local scope. - -1. We can create local variables in a local scope. When that local scope is destroyed, all local variables should also be destroyed. -2. Variables in a parent scope can be retrieved from local scopes of that parent scope, i.e., when user get a variable from a scope, it will try to search this variable in current scope. If there is no such variable in the local scope, `scope` will keep searching from its parent, until the variable is found or there is no parent. - -```cpp -class Scope { - public: - Scope(const std::shared_ptr& scope): parent_(scope) {} - - Variable* FindVar(const std::string& name) const { - auto it = vars_.find(name); - if (it != vars_.end()) { - return it->second.get(); - } else if (parent_ != nullptr) { - return parent_->FindVar(name); - } else { - return nullptr; - } - } - - private: - std::shared_ptr parent_ {nullptr}; -}; -``` - -In `Scope` class, there is a private data member called `parent_`. `parent_` is a smart pointer to its parent scope. 
When user `Get` a variable by its `name`, the `name` will be searched inside the current scope. If the variable cannot be found locally and parent scope is not a `nullptr`, the variable will be searched inside that parent scope. `parent_` pointer's default value is `nullptr`. It means that the scope is a global scope when `parent_` is nullptr. - -A local scope is very useful when we implement Recurrent Neural Network. Each timestep of an RNN should be a `Net`. Each `Net` of timestep (`StepNet` for short) should use an independent local scope. Just like variables in a while loop is inside a local scope in programming languages. By using a single `StepNet` and changing local scope, we can implement an RNN easily. - -# Interface Design - -```cpp -class Variable { - private: - Variable() = default; - friend class Scope; -}; - -class Scope { - private: - Scope(const std::shared_ptr& parent = nullptr); - - public: - static std::shared_ptr Create(const std::shared_ptr& parent = nullptr); - - // return nullptr if not found. - Variable* FindVar(const std::string& name) const; - - // return if already contains same name variable. - Variable* Var(const std::string& name); - - private: - std::shared_ptr parent_; - std::unordered_map> vars_; -}; -``` -## Only scope can create a variable - -To ensure `only scope can create a variable`, we should mark `Variable`'s constructor as a private member function, and Scope is a friend class of Variable. And then only `Var` can construct `Variable`. - -## When scope destroyed, all variables inside this scope should be destroyed together - -The scope hold unique pointers for all variables. User can `FindVar` from scope, but he should not hold this pointer as a member variable. Because when scope is destroyed, all variables inside this scope will be destroyed together. - -## Sharing a parent scope - -Local scope contains a `parent_` pointer. It is a linked-list for scopes. 
Using a `shared_ptr` because when a local scope is using, its parents cannot be destroyed. - -Also, as the parent scope is a `shared_ptr`, we can only `Create()` a scope shared pointer. We cannot construct a scope variable, because it cannot be passed to other scope as `parent` pointer. - -## Orthogonal interface - -`FindVar` will return `nullptr` when `name` is not found. It can be used as `Contains` method. `Var` will return an `Error` when there is a name conflict locally. Combine `FindVar` and `Var`, we can implement `Var` easily. diff --git a/develop/doc_cn/_sources/design/selected_rows.md.txt b/develop/doc_cn/_sources/design/selected_rows.md.txt deleted file mode 100644 index 1a98839a957612b91b2276b58818623ecc62d1d5..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/selected_rows.md.txt +++ /dev/null @@ -1,74 +0,0 @@ -# Design Doc: Selected Rows - -`SelectedRows` is a type of sparse tensor data type, which is designed to support `embedding` operators. The gradient of embedding table is a sparse tensor. Only a few rows are non-zero values in this tensor. It is straight-forward to represent a sparse tensor by the following sparse tensor data structure: - -```cpp -class SelectedRows { - private: - vector rows_; - Tensor value_; - int height_; -}; -``` - -The field `height_` is the first dimension of `SelectedRows`. The `rows` are the indices of the non-zero rows of `SelectedRows`. The `value_` field is an N-dim tensor of shape `[rows.size() /* NUM_ROWS */, ...]`, which supplies values for each row. The dimension of `SelectedRows` satisfies `[height_] + value_.shape[1:]`. - -Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`, the `SelectedRows` representation would be: - -``` -x = SelectedRow { - rows = [73, 84], - value = [[1, 2], [3,4]] -} -``` - - -## SelectedRows in Protobuf - -`SelectedRows` is a type of `Variable`. 
`VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimension of a `SelectedRows` will be described in compile-time because the `rows_` and `value_` are dependent on the training data. -So we use `TensorDesc` to unify `data_type` and `dims`. A LodTensorDesc contains a `TensorDesc` and `lod_level`. The description of `SelectedRows` is a Tensor description. - -```proto -message TensorDesc { - required DataType data_type = 1; - repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480] -} - -message LodTensorDesc { - required TensorDesc tensor = 1; - optional int lod_level = 2; -} - -message VarDesc { - required string name = 1; - enum VarType { - LOD_TENSOR = 0; - SELECTED_ROWS = 1; - } - required VarType type = 2; - optional LodTensorDesc lod_desc = 3; - optional TensorDesc selected_rows_desc = 4; - optional bool persistable = 5 [ default = false ]; -} -``` - -## InferShape for Selected Rows - -Just like `LoD` information, `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or `Dense` tensor. - -For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should be like following - -```cpp -void TableLookupGrad::InferShape(context) { - ... - context.SetDataType("Embedding.Grad", kSelectedRows); -} -``` - - -## Sparse Operators - -There are several operators that need to be written to support `SelectedRows`. These are: - -1. Operators which generate `SelectedRows` gradient. e.g. Gradient of `TableLookupOp`. -2. Optimize operators which support `SelectedRows` gradient. e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator. `OpWithKernel::Run` should select a suitable kernel for both `dense` tensor or `SelectedRows`. 
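
The `SelectedRows` layout described earlier in this design doc (`rows_`, `value_`, `height_`) can be illustrated with a small pure-Python sketch that expands it into a dense matrix. This is an illustration only: the function name `selected_rows_to_dense` is hypothetical, the height of 100 is an assumed value since the text only says `x` has "many rows", and summing repeated row indices is an assumption about how duplicate gradient rows would be combined.

```python
def selected_rows_to_dense(rows, value, height):
    """Expand a SelectedRows-style (rows, value, height) triple into a dense 2-D list."""
    width = len(value[0]) if value else 0
    dense = [[0] * width for _ in range(height)]
    for row_index, row_value in zip(rows, value):
        # Accumulate rather than assign, so a row index listed twice is summed
        # (an assumption made for this sketch).
        for col, v in enumerate(row_value):
            dense[row_index][col] += v
    return dense


# The example from the text: only rows 73 and 84 hold values.
x = selected_rows_to_dense(rows=[73, 84], value=[[1, 2], [3, 4]], height=100)
```

Every row other than 73 and 84 stays zero, which is why storing only `rows` and `value` is cheaper than the dense form for embedding gradients.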
diff --git a/develop/doc_cn/_sources/design/simple_op_design.md.txt b/develop/doc_cn/_sources/design/simple_op_design.md.txt deleted file mode 100644 index c7aeed7f9b4637e1c29d530f37b42d12500af82f..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/simple_op_design.md.txt +++ /dev/null @@ -1,202 +0,0 @@ -## Interaction between C++ and Python - -Users employ API in Python to describe their own network, however, the network construction actually happens in C++. so Protobuf is introduced to send the message between Python and C++. - -The Interaction between Python and C++ can be simplified as two steps: - -1. C++ tells Python how many Ops there are, and what parameter do users need to offer to initialize a new Op. Python then builds API for each Op at compile time. - -2. Users invoke APIs built by Python and provide necessary parameters. These parameters will be sent to C++ for finishing the Op construction task. - -### Message from C++ to Python - -We define a Protobuf message class `OpProto` to hold message needed in the first step. What should an `OpProto` contain? This question is equivalent to “What message do we need to offer, to build a Python API which is legal and user oriented and can use to describe a whole Op.” - -Following message are necessary: - -1. Op's name, and its simple comment. -2. Input and output variable number; each variable's name, type, and comment. -3. Op's attributes; each attribute includes name, type, comment, **default value** and **value range**. 
- -So `OpProto` can be defined as follows: - -```proto -enum AttrType { - INT = 1; - FLOAT = 2; - STRING = 3; - INTS = 4; - FLOATS = 5; - STRINGS = 6; -}; - -message AttrValue { - AttrType type = 1; - optional int iv = 2; - optional float fv = 3; - optional string sv = 4; - repeated int ivs = 5; - repeated float fvs = 6; - repeated string svs = 7; -}; - -message AttrProto { - required string name = 1; - required string comment = 2; - required AttrType type = 3; -}; - -message VarProto { - required string name = 1; - required string comment = 2; - required bool is_tensor = 3; -}; - -message OpProto { - repeated VarProto inputs = 1; - repeated VarProto outputs = 2; - repeated AttrProto attrs = 3; - required string type = 4; - required string comment = 5; -}; -``` - -To generate Python code automatically: - -```python -def create_python_ops_creatation_functions(): - op_protos = paddle.framework.OpRegistry.get_all_op_proto() - for type_name in op_protos: - op_proto = op_protos[type_name] - def __impl__(**kwargs): # User must use key word args in Paddle API - inputs = [kwargs.get(ipt.name, "") for ipt in op_proto.inputs] - outputs = [kwargs.get(opt.name, "") for opt in op_proto.outputs] - attrs = [cast_to_op_attr(attr, kwargs.get(attr.name, None)) for attr in op_proto.attrs] - opdesc = (input, outputs, type_name, attrs) - return paddle.framework.OpRegistry.CreateOp(opdesc) - __impl__.__doc__ = create_doc_string(op_proto) - globals()[type_name] = __impl__ - -create_python_ops_creatation_functions() -``` - -### Message from Python to C++ - -To hold message needed in the above second step, we define Protobuf message class `OpDesc`. It is used to hold user-specified parameters in Op describing. - -```proto -message OpDesc { - required string type = 1; - repeated string inputs = 2; - repeated string outputs = 3; - map attrs = 4; -}; -``` - -## OpProto Register - -Every Op has its own `OpProto`. For using convenience, we need to register them and record all their messages. 
For each `Op` class, we define a corresponding `OpMaker` class, in whose constructor we implement the `OpProto`'s building process. `OpMaker`'s constructor will be invoked by another function `OpRegistry::RegisterOp()`. - -```cpp -class OpProtoMaker { -public: - OpProtoMaker(OpProto* proto): proto_(proto) {} -protected: - OpProto* proto_; - void AddInput(const std::string& name, const std::string& desc) {...} - void AddAttr(const std::string& name, const std::string& desc, TypeId type) {...} - void AddComment(const std::string& comment) { ... } -}; - -class OpRegistry { -public: - using OpCreator = std::function; - - template - static void RegisterOp(const std::string& name) { - gCreators_[name] = [](const OpDesc& desc) { - return new OpType(desc); - }; - OpProto& opProto = gProtos_[name]; - OpMaker()(&opProto); - } - - static map gCreators_; - static map gProtos_; -}; - -template -class OpRegister { - public: - OpRegister(std::string type) { - OpRegistry::RegisterOp(type); - } -}; - -#define REGISTER_OP(op_class, op_maker_class, type_name) \ - class op_class##Register { \ - private: \ - const static OpRegister<#op_class, #op_maker_class> reg; \ - }; \ - const Register op_class##Register::reg(#type_name); - -class CosineOp { -// ... -} - -struct CosineOpProtoMaker : public OpProtoMaker { - CosineOpProtoMaker(OpProto* proto) : OpProtoMaker(proto) { - AddInput("input", "input of cosine op"); - AddAttr("scale", "scale of cosine op", float).Default(1.0).GreaterThan(0.0); - AddType("cos"); - AddComment("This is cos op"); - } -} - -REGISTER_OP(CosineOp, CosineOpProtoMaker, cos); -``` - -In `REGISTER_OP(CosineOp, CosineOpProtoMaker, cos)`, we register not only `CosineOp` but also `CosineOpProto`. As fields of `CosineOpProto`, the default value and value range of `scale` are also registered here. - -## Python API - -Python APIs are divided into two types, high-level API and low-level API. 
 - -### High-Level API - -High-level API is called by users directly, so it should keep its style consistent with existing V2 APIs. - -Here is a sample of how to define an fc layer: - -```python -hd = fc_layer(input=data, size=56, with_bias=True, activation="sigmoid") -``` - -`hd` is the output of `fc_layer` and it's a `variable`. It can be further sent into other layers as input. - -The definition of `fc_layer()`: - -```python -def fc_layer(input, size, with_bias, activation): -    attr_map = {"size": size} -    check_attrs(attr_map) -    w = make_variable('w') -    if with_bias: -        b = make_variable('b') -    else: -        b = None -    fc_output = make_variable('fc_output') -    fc_op(input, w, b, fc_output, attr_map) -    act_output = make_variable('sigmoid_output') -    if activation == "sigmoid": -        sigmoid_op(fc_output, act_output) -    else: -        # ... other activations -        pass -    return act_output -``` - -### Low-Level API - -In the above sample, `fc_op` and `sigmoid_op` are low-level APIs. They build `OpDesc` and invoke the corresponding C++ code. - -*TODO* diff --git a/develop/doc_cn/_sources/design/speech/deep_speech_2.md.txt b/develop/doc_cn/_sources/design/speech/deep_speech_2.md.txt deleted file mode 100644 index cfdc4d6df04344c70d3334626bd38eca997c31ff..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/speech/deep_speech_2.md.txt +++ /dev/null @@ -1,168 +0,0 @@ -# DeepSpeech2 on PaddlePaddle: Design Doc - -We are planning to build Deep Speech 2 (DS2) \[[1](#references)\], a powerful Automatic Speech Recognition (ASR) engine, on PaddlePaddle. For the first-stage plan, we have the following short-term goals: - -- Release a basic distributed implementation of DS2 on PaddlePaddle. -- Contribute a chapter of Deep Speech to PaddlePaddle Book. - -Intensive system optimization and a low-latency inference library (details in \[[1](#references)\]) are not yet covered in this first-stage plan.
- -## Table of Contents - -- [Tasks](#tasks) -- [Task Dependency](#task-dependency) -- [Design Details](#design-details) - - [Overview](#overview) - - [Row Convolution](#row-convolution) - - [Beam Search With CTC and LM](#beam-search-with-ctc-and-lm) -- [Future Work](#future-work) -- [References](#references) - -## Tasks - -We roughly break down the project into 14 tasks: - -1. Develop an **audio data provider**: - - JSON file list generator. - - Audio file format transformer. - - Spectrogram feature extraction, power normalization etc. - - Batch data reader with SortaGrad. - - Data augmentation (optional). - - Prepare (one or more) public English data sets & baseline. -2. Create a **simplified DS2 model configuration**: - - With only fixed-length (by padding) audio sequences (otherwise need *Task 3*). - - With only bidirectional-GRU (otherwise need *Task 4*). - - With only greedy decoder (otherwise need *Task 5, 6*). -3. Add support for **variable-shaped** dense-vector (image) batches of input data. - - Update `DenseScanner` in `dataprovider_converter.py`, etc. -4. Develop a new **lookahead-row-convolution layer** (See \[[1](#references)\] for details): - - Lookahead convolution windows. - - Within-row convolution, without kernels shared across rows. -5. Build KenLM **language model** (5-gram) for beam search decoder: - - Use KenLM toolkit. - - Prepare the corpus & train the model. - - Create inference interfaces (for Task 6). -6. Develop a **beam search decoder** with CTC + LM + WORDCOUNT: - - Beam search with CTC. - - Beam search with external custom scorer (e.g. LM). - - Try to design a more general beam search interface. -7. Develop a **Word Error Rate evaluator**: - - Update `ctc_error_evaluator` (CER) to support WER. -8. Prepare internal dataset for Mandarin (optional): - - Dataset, baseline, evaluation details. - - Particular data preprocessing for Mandarin. - - Might need cooperation with the Speech Department. -9.
Create **standard DS2 model configuration**: - - With variable-length audio sequences (need *Task 3*). - - With unidirectional-GRU + row-convolution (need *Task 4*). - - With CTC-LM beam search decoder (need *Task 5, 6*). -10. Make it run perfectly on **clusters**. -11. Experiments and **benchmarking** (for accuracy, not efficiency): - - With public English dataset. - - With internal (Baidu) Mandarin dataset (optional). -12. Time **profiling** and optimization. -13. Prepare **docs**. -14. Prepare PaddlePaddle **Book** chapter with a simplified version. - -## Task Dependency - -Tasks parallelizable within phases: - -Roadmap | Description | Parallelizable Tasks ------------ | :------------------------------------ | :-------------------- -Phase I | Simplified model & components | *Task 1* ~ *Task 8* -Phase II | Standard model & benchmarking & profiling | *Task 9* ~ *Task 12* -Phase III | Documentation | *Task 13* ~ *Task 14* - -An issue for each task will be created later. Contributions, discussions and comments are all highly appreciated and welcomed! - -## Design Details - -### Overview - -Traditional **ASR** (Automatic Speech Recognition) pipelines require great human effort devoted to elaborately tuning multiple hand-engineered components (e.g. audio feature design, acoustic model, pronunciation model and language model etc.). **Deep Speech 2** (**DS2**) \[[1](#references)\], however, trains such ASR models in an end-to-end manner, replacing most intermediate modules with only a single deep network architecture. By scaling up both the data and model sizes, DS2 achieves a very significant performance boost. - -Please read the Deep Speech 2 \[[1](#references),[2](#references)\] paper for more background knowledge.
- -The classical DS2 network contains 15 layers (from bottom to top): - -- **Two** data layers (audio spectrogram, transcription text) -- **Three** 2D convolution layers -- **Seven** uni-directional simple-RNN layers -- **One** lookahead row convolution layer -- **One** fully-connected layer -- **One** CTC-loss layer - -
                                -
                                -Figure 1. Architecture of Deep Speech 2 Network. -
                                - -We don't have to stick to this 2-3-7-1-1-1 depth \[[2](#references)\]. Similar networks with different depths might also work well. In \[[1](#references)\], the authors use a different depth (e.g. 2-2-3-1-1-1) for the final experiments. - -Key ingredients about the layers: - -- **Data Layers**: - - Frame sequences data of audio **spectrogram** (with FFT). - - Token sequences data of **transcription** text (labels). - - These two types of sequences do not have the same lengths, thus a CTC-loss layer is required. -- **2D Convolution Layers**: - - Not only temporal convolution, but also **frequency convolution**. Like a 2D image convolution, but with a variable dimension (i.e. temporal dimension). - - With striding for only the first convolution layer. - - No pooling for all convolution layers. -- **Uni-directional RNNs** - - Uni-directional + row convolution: for low-latency inference. - - Bi-directional + without row convolution: if we don't care about the inference latency. -- **Row convolution**: - - For looking only a few steps ahead into the future, instead of looking into a whole sequence as in bi-directional RNNs. - - Not necessary with bi-directional RNNs. - - "**Row**" means convolutions are done within each frequency dimension (row), and no convolution kernels are shared across rows. -- **Batch Normalization Layers**: - - Added to all above layers (except for data and loss layer). - - Sequence-wise normalization for RNNs: BatchNorm is only performed on the input-state projection and not the state-state projection, for efficiency. - - -Required Components | PaddlePaddle Support | Need to Develop -:------------------------------------- | :-------------------------------------- | :----------------------- -Data Layer I (Spectrogram) | Not supported yet.
| TBD (Task 3) -Data Layer II (Transcription) | `paddle.data_type.integer_value_sequence` | - -2D Convolution Layer | `paddle.layer.image_conv_layer` | - -DataType Converter (vec2seq) | `paddle.layer.block_expand` | - -Bi-/Uni-directional RNNs | `paddle.layer.recurrent_group` | - -Row Convolution Layer | Not supported yet. | TBD (Task 4) -CTC-loss Layer | `paddle.layer.warp_ctc` | - -Batch Normalization Layer | `paddle.layer.batch_norm` | - -CTC-Beam search | Not supported yet. | TBD (Task 6) - -### Row Convolution - -TODO by Assignees - -### Beam Search with CTC and LM - -
                                -
                                -Figure 2. Algorithm for CTC Beam Search Decoder. -
                                - -- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to the one in \[[3](#references)\], as shown in Figure 2, with two important modifications for the ambiguous parts: - - 1) in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths; - - 2) the if condition ```if l^+ not in A_prev then``` after the computation of probabilities is dropped, because it is hard to understand and seems unnecessary. -- An **external scorer** is passed into the decoder to evaluate a candidate prefix during decoding, whenever a white space is appended in English decoding or any character is appended in Mandarin decoding. -- Such an external scorer consists of a language model, word count or any other custom scorers. -- The **language model** is built in Task 5, and its parameters should be carefully tuned to achieve minimum WER/CER (cf. Task 7). -- This decoder needs to run with **high efficiency**, for the convenience of parameter tuning and real-world speech recognition. - - -## Future Work - -- Efficiency Improvement -- Accuracy Improvement -- Low-latency Inference Library -- Large-scale benchmarking - -## References - -1. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](http://proceedings.mlr.press/v48/amodei16.pdf). ICML 2016. -2. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595). arXiv:1512.02595. -3. Awni Y. Hannun, et al., [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/abs/1408.2873).
arXiv:1408.2873 diff --git a/develop/doc_cn/_sources/design/support_new_device.md.txt b/develop/doc_cn/_sources/design/support_new_device.md.txt deleted file mode 100644 index 8983df900460127fc130043c52373dab505363ba..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/support_new_device.md.txt +++ /dev/null @@ -1,240 +0,0 @@ -# Design Doc: Supporting new Device/Library - -## Background - -Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries in a flexible and efficient manner. - -On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example, Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library. - -On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware independent. - -So, how to support a new Device/Library in Fluid becomes a challenge. - - -## Basic: Integrate A New Device/Library - -For a general overview of fluid, please refer to the [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md). - -There are mainly three parts that we have to consider while integrating a new device/library: - -- Place and DeviceContext: indicate the device id and manage hardware resources - -- Memory and Tensor: malloc/free data on certain device - -- Math Functor and OpKernel: implement computing unit on certain devices/libraries - -### Place and DeviceContext - -Please note that device and computing library are not one-to-one corresponding. 
A device can have a lot of computing libraries and a computing library can also support several devices. - -#### Place -Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent the device memory where data is located. If we add another device, we have to add the corresponding `DevicePlace`. - -``` - | CPUPlace -Place --| CUDAPlace - | FPGAPlace -``` - -And `Place` is defined as follows: - -``` -typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place; -``` - -#### DeviceContext - -Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different libraries, such as the CUDA stream in `CUDADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`. - - -``` - /-> CPUDeviceContext -DeviceContext ----> CUDADeviceContext - \-> FPGADeviceContext -``` - -An example of Nvidia GPU is as follows: - -- DeviceContext - - -``` -class DeviceContext { - virtual Place GetPlace() const = 0; -}; -``` - - -- CUDADeviceContext - - -``` -class CUDADeviceContext : public DeviceContext { - Place GetPlace() const override { return place_; } -private: - CUDAPlace place_; - cudaStream_t stream_; - cublasHandle_t cublas_handle_; - std::unique_ptr<Eigen::GpuDevice> eigen_device_; // binds with stream_ -}; -``` - -### Memory and Tensor - - -#### memory module - -Fluid provides the following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36): - -``` -template <typename Place> -void* Alloc(Place place, size_t size); - -template <typename Place> -void Free(Place place, void* ptr); - -template <typename Place> -size_t Used(Place place); -``` - -To implement these interfaces, we have to implement MemoryAllocator for different Devices. - - -#### Tensor - -[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in a specific Place. - -```cpp -class Tensor { - public: - /*!
Return a pointer to mutable memory block. */ - template <typename T> - inline T* data(); - - /** - * @brief Return a pointer to mutable memory block. - * @note If not exist, then allocation. - */ - template <typename T> - inline T* mutable_data(platform::Place place); - - /** - * @brief Return a pointer to mutable memory block. - * - * @param[in] dims The dimensions of the memory block. - * @param[in] place The place of the memory block. - * - * @note If not exist, then allocation. - */ - template <typename T> - inline T* mutable_data(DDim dims, platform::Place place); - - /*! Resize the dimensions of the memory block. */ - inline Tensor& Resize(const DDim& dims); - - /*! Return the dimensions of the memory block. */ - inline const DDim& dims() const; - - private: - /*! holds the memory block if allocated. */ - std::shared_ptr<Placeholder> holder_; - - /*! points to dimensions of memory block. */ - DDim dim_; -}; -``` - -`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, use `Resize` to configure its shape, and then call `mutable_data` to allocate the actual memory. - -```cpp -paddle::framework::Tensor t; -paddle::platform::CPUPlace place; -// set size first -t.Resize({2, 3}); -// allocate memory on CPU later -t.mutable_data<float>(place); -``` - - - -### Math Functor and OpKernel - -Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in the operators/math directory as basic Functors. - -Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example: - -The interface is defined in the header file.
- -``` -template -class MaxOutFunctor { - public: - void operator()(const DeviceContext& context, const framework::Tensor& input, - framework::Tensor* output, int groups); -}; -``` - -CPU implementation is in .cc file - -``` -template -class MaxOutFunctor { - public: - void operator()(const platform::CPUDeviceContext& context, - const framework::Tensor& input, framework::Tensor* output, - int groups) { - ... - } -}; -``` - -CUDA implementation is in .cu file - -``` -template -class MaxOutFunctor { - public: - void operator()(const platform::CUDADeviceContext& context, - const framework::Tensor& input, framework::Tensor* output, - int groups) { - ... - } -}; -``` - - -We first obtain the computing handle from a concrete DeviceContext and then compute on tensors. - -The implementation of `OpKernel` is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map. - -Fluid provides different register interfaces in op_registry.h - - -Let's take [Crop](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134) operator as an example: - -In .cc file: - -``` -REGISTER_OP_CPU_KERNEL(crop, ops::CropKernel); -REGISTER_OP_CPU_KERNEL( - crop_grad, ops::CropGradKernel); -``` - -In .cu file: - -``` -REGISTER_OP_CUDA_KERNEL(crop, ops::CropKernel); -REGISTER_OP_CUDA_KERNEL( - crop_grad, ops::CropGradKernel); -``` - - -## Advanced topics: How to switch between different Device/Library - -Generally, we will implement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not suitable on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run on GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library. 
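The fallback idea in the paragraph above can be illustrated with a minimal sketch. It is not Fluid code: the registry, the kernel names, and the `pick_kernel` and `plan` helpers are all hypothetical, and the sketch only assumes that each operator registers kernels per device and that the CPU is the fallback.

```python
# Hypothetical kernel registry keyed by (operator type, device).
# None of these names are Fluid APIs; they only illustrate the idea.
KERNELS = {
    ("conv2d", "CPU"): "conv2d_cpu_kernel",
    ("conv2d", "GPU"): "conv2d_gpu_kernel",
    ("crf", "CPU"): "crf_cpu_kernel",  # no GPU kernel registered for crf
}


def pick_kernel(op_type, preferred_device, fallback_device="CPU"):
    """Return (device, kernel), falling back when the preferred device has no kernel."""
    for device in (preferred_device, fallback_device):
        kernel = KERNELS.get((op_type, device))
        if kernel is not None:
            return device, kernel
    raise KeyError("no kernel registered for %r" % op_type)


def plan(op_types, preferred_device):
    """Pick a kernel for every op and count how often the device changes."""
    placements = [pick_kernel(op, preferred_device) for op in op_types]
    switches = sum(
        1 for prev, cur in zip(placements, placements[1:]) if prev[0] != cur[0]
    )
    return placements, switches
```

Each device switch counted by `plan` would correspond to a data transfer between devices, which is why the choice of kernels matters for performance.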
- - -For more details, please refer to the following docs: - -- operator kernel type [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md) -- switch kernel [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md) diff --git a/develop/doc_cn/_sources/design/switch.md.txt b/develop/doc_cn/_sources/design/switch.md.txt deleted file mode 100644 index 827d0601c621e4a230de28e2baad8e196e69625e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/switch.md.txt +++ /dev/null @@ -1,31 +0,0 @@ -### Design Doc: Switch - -### Background - -Many programming languages provide `switch` as a generalization of `if-elif-else`. We want to add it to Fluid. - -The following example shows the usage of `fluid.switch`. - -```python -a = fluid.Var(10) -b = fluid.Var(0) - -with fluid.switch() as switch: - with switch.case(fluid.less_equal(a, 10)): - fluid.print("Case 1") - with switch.case(fluid.larger(a, 0)): - fluid.print("Case 2") - with switch.default(): - fluid.print("Case 3") -``` - -### The Semantics - -1. A `switch` control-flow checks cases one-by-one. -1. The condition of each case is a scalar boolean value. This differs from the `fluid.if_else` control-flow, whose condition can be a vector of boolean values. -1. It runs the first matched case, or the default case if no case matches and a default exists. -1. Once it matches a case, it runs the corresponding branch and only that branch, as if there were a C `break` keyword at the end of each case. - -The above program should print "Case 1" and nothing else. - -The implementation of the backward pass of the `switch` control-flow is easier than that of `if_else`, because `switch` runs at most one branch, whereas `if_else` could run more than one branch.
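The first-match semantics listed above can be emulated in plain Python. This is only a sketch of the semantics, not the Fluid implementation: `run_switch` is a hypothetical helper, and unlike a real control-flow operator it evaluates every condition eagerly before checking them in order.

```python
def run_switch(cases, default=None):
    """Run the first branch whose scalar condition is true.

    cases: list of (condition, branch) pairs, where condition is a bool and
    branch is a zero-argument callable. Returning right after the first match
    plays the role of the implicit `break`.
    """
    for condition, branch in cases:
        if condition:
            return branch()
    if default is not None:
        return default()
    return None


# Mirror the example above: both conditions hold, but only the first runs.
a = 10
printed = []
run_switch(
    [
        (a <= 10, lambda: printed.append("Case 1")),
        (a > 0, lambda: printed.append("Case 2")),
    ],
    default=lambda: printed.append("Case 3"),
)
```

Because at most one branch runs, a backward pass only has to remember which branch was taken.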
diff --git a/develop/doc_cn/_sources/design/tensor_array.md.txt b/develop/doc_cn/_sources/design/tensor_array.md.txt deleted file mode 100644 index 37e4f7b90f94fa3eb015e733999cd84c96b2239c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/tensor_array.md.txt +++ /dev/null @@ -1,271 +0,0 @@ -# Design for TensorArray -This design doc presents the necessity of a new C++ class `TensorArray`. -In addition to the very simple C++ implementation - -```c++ -class TensorArray { - public: - explicit TensorArray(const LoDTensor&); - explicit TensorArray(size_t size); - - private: - vector<LoDTensor> values_; -}; -``` - -We also need to expose it to PaddlePaddle's Python API, -because users would want to use it with our very flexible operators such as `WhileLoop`. -An example of an RNN based on dynamic operators is - -```python -input = pd.data(...) -num_steps = Var(12) - -TensorArray states(size=num_steps) -TensorArray step_inputs(unstack_from=input) -TensorArray step_outputs(size=num_steps) - -W = Tensor(...) -U = Tensor(...) -default_state = some_op() - -step = Var(1) - -wloop = paddle.create_whileloop(loop_vars=[step]) -with wloop.frame(): - wloop.break_if(pd.equal(step, num_steps)) - pre_state = states.read(step-1, default_state) - step_input = step_inputs.read(step) - state = pd.sigmoid(pd.matmul(U, pre_state) + pd.matmul(W, step_input)) - states.write(step, state) - step_outputs.write(step, state) # output state - step.update(step+1) - -output = step_outputs.stack() -``` - -## Background -Steps are one of the core concepts of RNN. In each time step of an RNN, there should be several input segments, states, and output segments; all these components act like arrays. For example, calling `states[step_id]` gets the state at the `step_id`-th time step.
- -An RNN can be implemented with the following pseudocode - -```c++ -Array states; -Array input_segments; -Array output_segments; -Parameter W, U; - -step = 1 -seq_len = 12 -while_loop { - if (step == seq_len) break; - states[step] = sigmoid(W * states[step-1] + U * input_segments[step]); - output_segments[step] = states[step]; // take state as output - step++; -} -``` -According to the [RNN roadmap](https://github.com/PaddlePaddle/Paddle/issues/4561), there are several different RNNs that PaddlePaddle will eventually support. - -Currently, the basic RNN implementation supported by PaddlePaddle is the `recurrent_op`, which takes tensors as input and splits them into `input_segments`. - - -Since a tensor cannot store variable-length sequences directly, PaddlePaddle implements the tensor with level of details (`LoDTensor` for short). -Segmenting the `LoDTensor` is much more complicated than splitting a tensor, which makes it necessary to refactor the `recurrent_op` with `LoDTensor` segmenting support. - -As the next step in RNN support, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences. - -The implementation is similar to `recurrent_op`. -The key difference is the way **the original input `LoDTensors` and outputs are split to get the `input_segments` and the `output_segments`.** - - -Though it can't be built over `recurrent_op` or `dynamic_recurrent_op` directly, -the logic behind splitting a tensor or a LoD tensor into `input_segments` remains the same. - -## Why `TensorArray` -The logic behind splitting the inputs into segments, states and outputs is similar and can be shared in a separate module. - -The arrays of `states`, `input_segments` and `output_segments` would be exposed to users when writing a dynamic RNN model similar to the above pseudocode. - -So there should be an array-like container, which can store the segments of a tensor or LoD tensor.
- -**This container can store an array of tensors and provides several methods to split a tensor or a LoD tensor**. -This is where the notion of `TensorArray` comes from. - -## Introduce TensorArray to unify all three RNNs -TensorArray as a new concept is borrowed from TensorFlow; -it is meant to be used with dynamic iteration primitives such as `while_loop` and `map_fn`. - -This concept can be used to support our new design of dynamic operations, and help to refactor some existing layers that deal with variable-length sentences, -such as `recurrent_op` and `RecurrentGradientMachine`. - -In [our design for dynamic RNN](https://github.com/PaddlePaddle/Paddle/pull/4401), -`TensorArray` is used to segment inputs and store states in all time steps. -By providing some methods similar to a C++ array, -the definition of some state-based dynamic models such as RNN can be more natural and highly flexible. - -## Dynamic-operations on TensorArray - -`TensorArray` will be used directly when defining dynamic models, so the operators listed below should be implemented - -```python -# several helper operators for TensorArray -def tensor_array_stack(ta, tensor): - ''' - get a tensor array `ta`, return a packed `tensor`. - ''' - pass - -def tensor_array_unstack(tensor, ta): - ''' - get a `tensor`, unstack it and get a tensor array `ta`. - ''' - pass - -def tensor_array_write(ta, index, tensor, data_shared): - ''' - get a `tensor` and a scalar tensor `index`, write `tensor` into index-th - value of the tensor array `ta`. - `data_shared` is an attribute that specifies whether to copy or reference the tensors. - ''' - pass - -def tensor_array_read(ta, index, tensor): - ''' - get a tensor array `ta`, a scalar tensor `index`, read the index-th value of - `ta` and return as the `tensor`. - ''' - pass - -def tensor_array_size(ta, tensor): - ''' - get a tensor array `ta`, return the size of `ta` and return as the scalar `tensor`.
- ''' - pass -``` - -It is tedious for users to work with so many low-level operators, so some helper methods should be provided in the Python wrapper to make `TensorArray` easier to use, -for example - -```python -class TensorArray: - def __init__(self, name): - self.name = name - self.desc = TensorArrayDesc() - - def stack(self, name=None): - ''' - Pack the values in a `TensorArray` into a tensor with rank one higher - than each tensor in `values`. - `stack` can be used to concatenate all the time steps for RNN or whileloop. - - @name: str - the name of the variable to output. - ''' - tensor = Var(name) - tensor_array_stack(self.name, tensor) - return tensor - - def unstack(self, input): - ''' - Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors. - `unstack` can be used to split a tensor into time steps for RNN or whileloop. - - @input: str - the name of input tensor - ''' - tensor_array_unstack(input, self.name) - - def write(self, index, value, data_shared=True): - ''' - Write value into index of the TensorArray. - If `data_shared` is set to True, then the index-th value in TensorArray will - be shared with the tensor passed in. - - @index: str - name of a scalar tensor - @value: str - name of a tensor - @data_shared: bool - ''' - tensor_array_write(self.name, index, value, data_shared) - - def read(self, index, output): - ''' - Read the value at location `index` in the `TensorArray`. - - @index: str - name of a scalar tensor - @output: - name of a output variable - ''' - tensor_array_read(self.name, index, output) - - - def size(self, output): - ''' - Return the number of values. - - @output: str - name of a scalar tensor - ''' - tensor_array_size(self.name, output) -``` - -## LoDTensor-related Support -The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes variable-length sequences as input, and outputs sequences too.
- -Since each step of an RNN can only take a tensor-represented batch of data as input, -some preprocessing must be applied to the inputs, such as sorting the sentences by length in descending order, then cutting out the word at each step and packing these words into new batches. - -Such cut-like operations can be embedded into `TensorArray` as general methods called `unpack` and `pack`; -these two operations are similar to `stack` and `unstack` except that they operate on variable-length sequences formatted as a LoD tensor rather than a tensor. - -Some definitions are like - -```python -def unpack(level, sort_by_length): - ''' - Split LoDTensor in some `level` and generate batches, if set `sort_by_length`, - will sort by length. - - Returns: - - a new `TensorArray`, whose values are LoDTensors and represents batches - of data. - - an int32 Tensor, which stores the map from the new batch's indices to - original LoDTensor - ''' - pass - -def pack(level, indices_map): - ''' - Recover the original LoD-arranged LoDTensor with the values in a `TensorArray` - and `level` and `indices_map`.
- ''' - pass -``` - -With these two methods, an RNN that supports variable-length sentences can be implemented like - -```c++ -// sentence_input is the variable-length data -LoDTensor sentence_input(xxx); -TensorArray ta; -Tensor indice_map; -Tensor boot_state = xxx; // to initialize rnn's first state -TensorArray::unpack(sentence_input, 1/*level*/, true/*sort_by_length*/, &ta, &indice_map); -TensorArray step_outputs; -TensorArray states; - -for (int step = 0; step < ta.size(); step++) { - auto state = states.read(step); - // rnnstep is a function which acts like a step of RNN - auto step_input = ta.read(step); - auto step_output = rnnstep(step_input, state); - step_outputs.write(step, step_output, true/*data_shared*/); -} - -// rnn_output is the final output of an rnn -LoDTensor rnn_output = step_outputs.pack(1/*level*/, indice_map); -``` -The code above shows that by embedding the LoDTensor-related preprocessing operations into `TensorArray`, -the implementation of an RNN that supports variable-length sentences is far more concise than `RecurrentGradientMachine`, because the latter mixes all the code together and is hard to read and extend. diff --git a/develop/doc_cn/_sources/design/var_desc.md.txt b/develop/doc_cn/_sources/design/var_desc.md.txt deleted file mode 100644 index 6a45af1995463402ba9c65ddb51c6c8bb107f99e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/design/var_desc.md.txt +++ /dev/null @@ -1,81 +0,0 @@ -## Background -PaddlePaddle divides the description of neural network computation into two stages: compile time and runtime. At compile time, the neural network computation is described as a `ProgramDesc` whereas at runtime an `Executor` interprets the `ProgramDesc` to compute the operations. - -PaddlePaddle uses proto messages to describe the compile-time program because: - -1. The computation program description must be serializable and saved in a file. -1. During distributed training, the serialized program will be sent to multiple workers.
It should also be possible to break the program into different components, each of which can be executed on a different worker. - -The computation `Program` consists of nested `Blocks`. Each `Block` consists of data (i.e. `Variable`) and `Operations`. The concepts used to represent them are listed in the table below. - -| |compile time|runtime| -|---|---|---| -|Data|VarDesc(proto)|Variable(cpp)| -|Operation|OpDesc(proto)|Operator(cpp)| - - -## Definition of VarType - -A VarDesc should have a name, a type, and a flag saying whether or not it is persistable. Apart from the POD types, PaddlePaddle supports several other kinds of variable types: `LOD_TENSOR`, `SELECTED_ROWS`, `FEED_MINIBATCH`, `FETCH_LIST`, `STEP_SCOPES`, `LOD_RANK_TABLE`, `LOD_TENSOR_ARRAY`, `PLACE_LIST`, `READER` and `CHANNEL`. These are declared inside `VarType`. A `VarDesc` then looks as follows: - -```proto -message VarDesc { - required string name = 1; - required VarType type = 2; - optional bool persistable = 3 [ default = false ]; -} -``` - -## Definition of TensorDesc - -```proto -message TensorDesc { - // Should only be PODType. Is enforced in C++ - required Type data_type = 1; - repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480] -} -``` - -The `Type` here comes from the enum defined inside of `VarType`: - -```proto -enum Type { - // Pod Types - BOOL = 0; - INT16 = 1; - INT32 = 2; - INT64 = 3; - FP16 = 4; - FP32 = 5; - FP64 = 6; - - // Other types that may need additional descriptions - LOD_TENSOR = 7; - SELECTED_ROWS = 8; - FEED_MINIBATCH = 9; - FETCH_LIST = 10; - STEP_SCOPES = 11; - LOD_RANK_TABLE = 12; - LOD_TENSOR_ARRAY = 13; - PLACE_LIST = 14; - READER = 15; - CHANNEL = 16; -} -``` - -A TensorDesc describes `SelectedRows` and `LoDTensor`. For details of `SelectedRows`, please refer to [`SelectedRows`](./selected_rows.md).
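The comment on `dims` in `TensorDesc` says that an unknown dimension is stored as `-1`. A minimal sketch of that convention follows; the helper names are hypothetical, and representing an unknown dimension as `None` on the Python side is an assumption made only for this illustration.

```python
UNKNOWN_DIM = -1  # value stored in TensorDesc.dims for an unknown dimension


def encode_dims(shape):
    """Map a shape with None for unknown dimensions to the stored dims list."""
    return [UNKNOWN_DIM if d is None else int(d) for d in shape]


def is_fully_known(dims):
    """True when no dimension is left unknown, e.g. once the batch size is fixed."""
    return all(d != UNKNOWN_DIM for d in dims)


# A batch of 640x480 images whose batch size is not known at compile time.
stored = encode_dims([None, 640, 480])
```

Here `stored` holds `-1` in the first position, matching the `[UNK, 640, 480]` example in the proto comment.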
- -## Definition of LoDTensorDesc - -```proto -message LoDTensorDesc { - required TensorDesc tensor = 1; - optional int32 lod_level = 2 [ default = 0 ]; -} -``` - -A LoDTensorDesc contains a tensor and a lod_level. - -## Definition of Variable in Python - -For Variable in Python, please refer to [`Python API`](./python_api.md). diff --git a/develop/doc_cn/_sources/dev/index_cn.rst.txt b/develop/doc_cn/_sources/dev/index_cn.rst.txt index 487db868bb2a0a5383d56c3a723912d9fd5910b7..c488191b8174531905e44cb9443ee539d4cb1ed3 100644 --- a/develop/doc_cn/_sources/dev/index_cn.rst.txt +++ b/develop/doc_cn/_sources/dev/index_cn.rst.txt @@ -6,3 +6,4 @@ contribute_to_paddle_cn.md write_docs_cn.rst + new_layer_cn.rst diff --git a/develop/doc_cn/_sources/dev/new_layer_cn.rst.txt b/develop/doc_cn/_sources/dev/new_layer_cn.rst.txt index 75037e693b32f923ee7dc9dfec322495fe4ce10a..0ded1c262adad44f4df000ef2933c7b68050f2fc 100644 --- a/develop/doc_cn/_sources/dev/new_layer_cn.rst.txt +++ b/develop/doc_cn/_sources/dev/new_layer_cn.rst.txt @@ -1,6 +1,6 @@ -================================ -Implementing a New Network Layer -================================ +==================================== +How to Implement a New Network Layer +==================================== This tutorial shows how to implement a custom network layer in PaddlePaddle. Here we use the fully connected layer as an example to show the four steps needed to implement a new network layer. diff --git a/develop/doc_cn/_sources/dev/new_op_cn.md.txt b/develop/doc_cn/_sources/dev/new_op_cn.md.txt deleted file mode 100644 index 92996585674b46f45549b972b9f295503b1c7f8c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/dev/new_op_cn.md.txt +++ /dev/null @@ -1,318 +0,0 @@ -# How to Write a New Operator - - - [Concepts](#概念简介) - - [Implementing the C++ Classes](#实现c类) - - [Defining the ProtoMaker Class](#定义protomaker类) - - [Defining the Operator Class](#定义operator类) - - [Defining the OpKernel Class](#定义opkernel类) - - [Registering the Operator](#注册operator) - - [Compiling](#编译) - - [Python Binding](#绑定python) - - [Implementing Unit Tests](#实现单元测试) - - [Unit Test for the Forward Operator](#前向operator单测) - - [Unit Test for the Backward Operator](#反向operator单测) - - [Compiling and Running](#编译和执行) - - [Notes](#注意事项) - - -## Concepts - -This section briefly introduces the base classes involved; for details, please refer to the design docs. - -- `framework::OperatorBase`: Base class of Operator (Op for short). -- `framework::OpKernel`:
Op计算函数的基类,称作Kernel。 -- `framework::OperatorWithKernel`:继承自OperatorBase,Op有计算函数,称作有Kernel。 -- `class OpProtoAndCheckerMaker`:描述该Op的输入、输出、属性、注释,主要用于Python API接口生成 - -依据是否包含kernel,可以将Op分为两种:包含Kernel的Op和不包含kernel的Op,前者Op的定义继承自`OperatorWithKernel`,后者继承自`OperatorBase`。本教程主要介绍带Kernel的Op如何写,简单总结Op需要包含的内容如下: - - - 内容 | 定义位置 --------------- | :---------------------- -OpProtoMake定义 | `.cc`文件,Backward Op不需要定义OpProtoMake -Op定义 | `.cc`文件 -Kernel实现 | CPU、CUDA共享Kernel实现在`.h`文件中,否则,CPU 实现在`.cc`文件中,CUDA 实现在`.cu`文件中。 -注册Op | Op注册实现在`.cc`文件;Kernel注册CPU实现在`.cc`文件中,CUDA实现在`.cu`文件中 - - -实现新的op都添加至目录[paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators)下,文件命名以`*_op.h`(如有) 、 `*_op.cc` 、`*_op.cu`(如有)结尾。**系统会根据文件名自动构建op和其对应的Python扩展。** - - -下面以矩阵乘操作,即[MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc)为例来介绍如何写带Kernel的Operator。 - - -## 实现C++类 - - -### 定义ProtoMaker类 - -矩阵乘法的公式:$Out = X * Y$, 可见该计算由两个输入,一个输出组成。 - -首先定义`ProtoMaker`来描述该Op的输入、输出,并添加注释: - -```cpp -class MulOpMaker : public framework::OpProtoAndCheckerMaker { - public: - MulOpMaker(OpProto *proto, OpAttrChecker *op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "(Tensor), 2D tensor of size (M x K)"); - AddInput("Y", "(Tensor), 2D tensor of size (K x N)"); - AddOutput("Out", "(Tensor), 2D tensor of size (M x N)"); - AddComment(R"DOC( -Two Element Mul Operator. 
-The equation is: Out = X * Y -)DOC"); - } -}; -``` - -[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43)继承自`framework::OpProtoAndCheckerMaker`,构造函数含有2个参数: - - - `framework::OpProto` : 前者存储Op的输入输出和参数属性,将用于Python API接口的生成。 - - `framework::OpAttrChecker` :后者用于检查参数属性的合法性。 - -构造函数里通过`AddInput`添加输入参数,通过`AddOutput`添加输出参数,通过`AddComment`添加Op的注释。这些函数会将对应内容添加到`OpProto`中。 - -上面的代码在`MulOp`中添加两个输入`X`和`Y`,添加了一个输出`Out`,并解释了各自含义,命名请遵守[命名规范](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md)。 - - -再以[`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)为例: - -```cpp -template -class ScaleOpMaker : public framework::OpProtoAndCheckerMaker { - public: - ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "The input tensor of scale operator.").NotInGradient(); - AddOutput("Out", "The output tensor of scale operator.").NotInGradient(); - AddComment(R"DOC(Scale operator -The equation is: Out = scale*X -)DOC"); - AddAttr("scale", "scale of scale operator.").SetDefault(1.0); - } -}; -``` - -这个例子有两处不同: - -- `AddInput("X","...").NotInGradient()` : 表示`X`这个输入不参与`ScaleOp`对应的梯度Op计算之中,如果Op的某个输入不参与反向梯度的计算,请显示地调用`.NotInGradient()`进行设置。 - -- `AddAttr("scale", "...").SetDefault(1.0);` : 增加`scale`系数,作为参数属性,并且设置默认值为1.0。 - - -### 定义Operator类 - -下面的点实现了MulOp的定义: - -```cpp -class MulOp : public framework::OperatorWithKernel { - public: - using framework::OperatorWithKernel::OperatorWithKernel; - - protected: - void InferShape(const framework::InferShapeContext &ctx) const override { - auto dim0 = ctx.Input("X")->dims(); - auto dim1 = ctx.Input("Y")->dims(); - PADDLE_ENFORCE_EQ(dim0.size(), 2, - "input X(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("X")); - PADDLE_ENFORCE_EQ(dim1.size(), 2, - "input Y(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("Y")); - PADDLE_ENFORCE_EQ( - 
dim0[1], dim1[0], - "First matrix's width must be equal with second matrix's height."); - ctx.Output("Out")->Resize({dim0[0], dim1[1]}); - } -}; -``` - -[`MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L22)继承自`OperatorWithKernel`。`public`成员: - -```cpp -using framework::OperatorWithKernel::OperatorWithKernel; -``` - -这句表示使用基类`OperatorWithKernel`的构造函数,也可写成: - -```cpp -MulOp(const std::string &type, const framework::VariableNameMap &inputs, - const framework::VariableNameMap &outputs, - const framework::AttributeMap &attrs) - : OperatorWithKernel(type, inputs, outputs, attrs) {} -``` - -还需要重写`InferShape`接口。`InferShape`为const函数,不能修改Op的成员变量,参数为`const framework::InferShapeContext &ctx`,通过该参数可获取到输入输出以及属性。它的功能是: - - - 1). 做检查, 尽早报错:检查输入数据维度、类型等是否合法。 - - 2). 设置输出Tensor的形状。 - -通常`OpProtoMaker`和`Op`类的定义写在`.cc`文件中,和下面将要介绍的注册函数一起放在`.cc`中 - -### 定义OpKernel类 - -`MulKernel`继承自`framework::OpKernel`,带有下面两个模板参数: - -- `typename DeviceContext`: 表示设备类型,不同设备(CPU、CUDA)共享同一个Kernel时,需加该模板参数,不共享则不加,一个不共享的例子是[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 - -- `typename T` : 表示数据类型,如`float`, `double`等。 - -需要为`MulKernel`类重写`Compute`接口。 -- `Compute`接受一个输入参数:`const framework::ExecutionContext& context`。 -- 与`InferShapeContext`相比,`ExecutionContext`增加了设备类型,同样可获取到输入输出和属性参数。 -- `Compute`函数里实现`OpKernel`的具体计算逻辑。 - -下面是 `MulKernel` `Compute`的实现: - - ```cpp - template - class MulKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Input("Y"); - auto* Z = context.Output("Out"); - Z->mutable_data(context.GetPlace()); - auto& device_context = context.template device_context(); - math::matmul(*X, false, *Y, false, 1, Z, 0, device_context); - } - }; - ``` - -需要注意:**不同设备(CPU、CUDA)共享一个Op定义,是否则共享同一个`OpKernel`,取决于`Compute`调用的函数是否支持不同设备。** - 
-`MulOp`的CPU、CUDA实现共享同一个`Kernel`。`OpKernel`不共享的例子可以参考:[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 - -为了使`OpKernel`的计算过程书写更加简单,并且CPU、CUDA的代码可以复用,我们通常借助 Eigen unsupported Tensor模块来实现`Compute`接口。关于在PaddlePaddle中如何使用Eigen库,请参考[使用文档](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md)。 - - -到此,前向Op实现完成。接下来,需要在`.cc`文件中注册该op和kernel。 -反向Op类的定义,反向OpKernel的定义与前向Op类似,这里不再赘述。**但需注意反向Op没有`ProtoMaker`**。 - -### 注册Operator - -- 在`.cc`文件中注册前向、反向Op类,注册CPU Kernel。 - - ```cpp - namespace ops = paddle::operators; - REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); - REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel); - REGISTER_OP_CPU_KERNEL(mul_grad, - ops::MulGradKernel); - ``` - - 在上面的代码中: - - - `REGISTER_OP` : 注册`ops::MulOp`类,类型名为`mul`,该类的`ProtoMaker`为`ops::MulOpMaker`,注册`ops::MulOpGrad`,类型名为`mul_grad`。 - - `REGISTER_OP_WITHOUT_GRADIENT` : 用于注册没有反向的Op。 - - `REGISTER_OP_CPU_KERNEL` :注册`ops::MulKernel`类,并特化模板参数为`paddle::platform::CPUPlace`和`float`类型,同理,注册`ops::MulGradKernel`类。 - - -- 在 `.cu`文件中注册CUDA Kernel。 - - 请注意,如果CUDA Kernel的实现基于Eigen unsupported模块,那么在 `.cu`的开始请加上宏定义 `#define EIGEN_USE_GPU`,代码示例如下: - - ```cpp - // if use Eigen unsupported module before include head files - #define EIGEN_USE_GPU - - namespace ops = paddle::operators; - REGISTER_OP_CUDA_KERNEL(mul, ops::MulKernel); - REGISTER_OP_CUDA_KERNEL(mul_grad, - ops::MulGradKernel); - ``` - -### 编译 - -运行下面命令可以进行编译: - -``` -make mul_op -``` - -## 绑定Python - -系统会对新增的op自动绑定Python,并链接到生成的lib库中。 - -## 实现单元测试 - -单测包括对比前向Op不同设备(CPU、CUDA)的实现、对比反向OP不同设备(CPU、CUDA)的实现、反向Op的梯度测试。下面介绍介绍[`MulOp`的单元测试](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py)。 - -### 前向Operator单测 - -Op单元测试继承自`OpTest`。各项更加具体的单元测试在`TestMulOp`里完成。测试Operator,需要: - -1. 在`setUp`函数定义输入、输出,以及相关的属性参数。 -2. 生成随机的输入数据。 -3. 在Python脚本中实现与前向operator相同的计算逻辑,得到输出值,与operator前向计算的输出进行对比。 -4. 
反向计算已经自动集成进测试框架,直接调用相应接口即可。 - - - ```python - import unittest - import numpy as np - from op_test import OpTest - - - class TestMulOp(OpTest): - def setUp(self): - self.op_type = "mul" - self.inputs = { - 'X': np.random.random((32, 84)).astype("float32"), - 'Y': np.random.random((84, 100)).astype("float32") - } - self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} - - def test_check_output(self): - self.check_output() - - def test_check_grad_normal(self): - self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5) - - def test_check_grad_ingore_x(self): - self.check_grad( - ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X")) - - def test_check_grad_ingore_y(self): - self.check_grad( - ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y')) - ``` - -上面的代码首先导入依赖的包,下面是对`setUp`函数中操作的重要变量的详细解释: - -- `self.op_type = "mul" ` : 定义类型,与operator注册时注册的类型一致。 -- `self.inputs` : 定义输入,类型为`numpy.array`,并初始化。 -- `self.outputs` : 定义输出,并在Python脚本中完成与operator同样的计算逻辑,返回Python端的计算结果。 - -### 反向operator单测 - -而反向测试中: -- `test_check_grad_normal`中调用`check_grad`使用数值法检测梯度正确性和稳定性。 - - 第一个参数`["X", "Y"]` : 指定对输入变量`X`、`Y`做梯度检测。 - - 第二个参数`"Out"` : 指定前向网络最终的输出目标变量`Out`。 - - 第三个参数`max_relative_error`:指定检测梯度时能容忍的最大错误值。 -- `test_check_grad_ingore_x`和`test_check_grad_ingore_y`分支用来测试只需要计算一个输入梯度的情况。 - - -### 编译和执行 - -`python/paddle/v2/framework/tests` 目录下新增的 `test_*.py` 单元测试会被自动加入工程进行编译。 - -请注意,**不同于Op的编译测试,运行单元测试测时需要编译整个工程**,并且编译时需要打开`WITH_TESTING`, 即`cmake paddle_dir -DWITH_TESTING=ON`。编译成功后,执行下面的命令来运行单元测试: - -```bash -make test ARGS="-R test_mul_op -V" -``` - -或者: - -```bash -ctest -R test_mul_op -``` - -## 注意事项 - -- 为每个Op创建单独的`*_op.h`(如有)、`*_op.cc`和`*_op.cu`(如有)。不允许一个文件中包含多个Op,这将会导致编译出错。 -- 注册Op时的类型名,需要和该Op的名字一样。即不允许在`A_op.cc`里面,注册`REGISTER_OP(B, ...)`等,这将会导致单元测试出错。 -- 如果Op没有实现CUDA Kernel,请不要创建空的`*_op.cu`,这将会导致单元测试出错。 -- 如果多个Op依赖一些共用的函数,可以创建非`*_op.*`格式的文件来存放,如`gather.h`文件。 diff --git a/develop/doc_cn/_sources/dev/use_eigen_cn.md.txt 
b/develop/doc_cn/_sources/dev/use_eigen_cn.md.txt deleted file mode 100644 index 1367323b71277984834d9d4f0d9bea0f69478479..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/dev/use_eigen_cn.md.txt +++ /dev/null @@ -1,146 +0,0 @@ -## 在Paddle中如何使用Eigen - -神经网络本质上是一个计算图,计算需要的数据存放在`Tensor`中,而计算过程是由`Operartor`来描述的。在执行时,`Operator`调用对应`OpKernel`中的`Compute`接口,实现对`Tensor`的操作。 - - -### Eigen Tensor模块 - -Eigen Tensor模块对element-wise计算提供了强大的支持,并且书写一份代码,可以同时在CPU、GPU执行。但Eigen Tensor是一个正在开发中的模块,因此可能测试不够完备,文档较少。 - -关于Eigen Tensor模块的详细介绍请参考[文档1](https://github.com/RLovelett/eigen/blob/master/unsupported/Eigen/CXX11/src/Tensor/README.md) 和[文档2](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md) - - -### paddle::framework::Tensor - -Paddle Tensor定义在framework目录下,其主要接口如下: - -```cpp -class Tensor { - public: - /*! Return a pointer to mutable memory block. */ - template - inline T* data(); - - /** - * @brief Return a pointer to mutable memory block. - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(platform::Place place); - - /** - * @brief Return a pointer to mutable memory block. - * - * @param[in] dims The dimensions of the memory block. - * @param[in] place The place of the memory block. - * - * @note If not exist, then allocation. - */ - template - inline T* mutable_data(DDim dims, platform::Place place); - - /*! Resize the dimensions of the memory block. */ - inline Tensor& Resize(const DDim& dims); - - /*! Return the dimensions of the memory block. */ - inline const DDim& dims() const; - - private: - /*! holds the memory block if allocated. */ - std::shared_ptr holder_; - - /*! points to dimensions of memory block. 
*/ - DDim dim_; -}; -``` - -`Placeholder`的作用是延迟分配内存,即我们可以先定义一个Tensor,然后使用Resize接口设置Tensor的大小,最后再调用mutable_data接口分配实际的内存。 - -```cpp -paddle::framework::Tensor t; -paddle::platform::CPUPlace place; -// set size first -t.Resize({2, 3}); -// allocate memory on CPU later -t.mutable_data(place); -``` - -### paddle::framework::Tensor使用样例 -下面以AddOp为例说明Tensor的使用过程: - -- InferShape - -在运行神经网络计算图时,我们先调用每个`Operator`的`InferShape`接口,根据输入Tensor的大小来设置输出Tensor的大小,`Resize`接口会被调用。 - -```cpp -void InferShape(const framework::InferShapeContext &ctx) const override { - PADDLE_ENFORCE_EQ(ctx.Input("X")->dims(), - ctx.Input("Y")->dims(), - "Two input of Add Op's dimension must be same."); - ctx.Output("Out")->Resize(ctx.Input("X")->dims()); -} -``` - - -- Run - -`Operator`的`Run`接口最终会调用对应`OpKernel`的`Compute`接口,在这时真正的分配内存,`mutable_data`接口会被调用。 - -```cpp -void Compute(const framework::ExecutionContext& context) const override { - auto* input0 = context.Input("X"); - auto* input1 = context.Input("Y"); - auto* output = context.Output("Out"); - - output->mutable_data(context.GetPlace()); - - auto x = EigenVector::Flatten(*input0); - auto y = EigenVector::Flatten(*input1); - auto z = EigenVector::Flatten(*output); - - auto place = context.GetEigenDevice(); - - z.device(place) = x + y; -} -``` - - -### paddle::framework::Tensor到EigenTensor的转换 - -如上一小节所示,在具体的计算中,我们需要先把输入Tensor和输出Tensor转换为Eigen支持的格式。我们在[eigen.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen.h)中提供了一些全局函数用来实现paddle::framework::Tensor到EigenTensor/EigenMatrix/EigenVector/EigenScalar的转换。 - -以EigenTensor为例,做一个介绍 - -```cpp -Tensor t; -float* p = t.mutable_data(make_ddim({1, 2, 3}), platform::CPUPlace()); -for (int i = 0; i < 1 * 2 * 3; i++) { - p[i] = static_cast(i); -} - -EigenTensor::Type et = EigenTensor::From(t); -``` - -From是EigenTensor模板提供的一个接口,可以实现从paddle::framework::Tensor到对EigenTensor的转换。由于Tensor的rank是模板参数,因此在转换时需要显示的指定。 - 
-在Eigen中,不同rank的Tensor是不同类型,Vector是rank为1的Tensor。需要额外注意的是,EigenVector::From方法是把paddle中的一维Tensor转为Eigen的一维Tensor,在这里用EigenVector来表示;而EigenVector::Flatten方法是把paddle中的一个Tensor进行reshape操作,压扁成为Eigen的一维Tensor,类型仍然为EigenVector。 - -更多的转换方法请参考eigen_test.cc中的[单元测试](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen_test.cc)。 - - - -### 实现计算 - -当需要完成计算时,我们需要等式左边的EigenTensor调用device接口。在这里需要注意的是,这里的EigenTensor之间的运算只是改变了原有Tensor中的数据,而不会改变原有Tensor的shape信息。 - -```cpp -auto x = EigenVector::Flatten(*input0); -auto y = EigenVector::Flatten(*input1); -auto z = EigenVector::Flatten(*output); -auto place = context.GetEigenDevice(); -z.device(place) = x + y; -``` - -在这段代码中,input0/input1/output可以是任意维度的Tensor。我们调用了EigenVector的Flatten接口,把任意维度的Tensor转为了一维的EigenVector。而在计算结束之后,input0/input1/output的原有shape信息不变。如果想改变原有Tensor的shape信息,可以调用Resize接口进行改变。 - -由于Eigen Tensor模块的文档较少,我们可以参考TensorFlow的[kernels](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels)模块下的相关`OpKernel`的计算代码。 diff --git a/develop/doc_cn/_sources/howto/optimization/cpu_profiling_cn.md.txt b/develop/doc_cn/_sources/howto/optimization/cpu_profiling_cn.md.txt deleted file mode 100644 index d59be670c2b33b64d9b6f96b53f50e5bf9f0613b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/howto/optimization/cpu_profiling_cn.md.txt +++ /dev/null @@ -1,155 +0,0 @@ -此教程会介绍如何使用Python的cProfile包、Python库yep、Google perftools来进行性能分析 (profiling) 与调优(performance tuning)。 - -Profling 指发现性能瓶颈。系统中的瓶颈可能和程序员开发过程中想象的瓶颈相去甚远。Tuning 指消除瓶颈。性能优化的过程通常是不断重复地 profiling 和 tuning。 - -PaddlePaddle 用户一般通过调用 Python API 编写深度学习程序。大部分 Python API 调用用 C++ 写的 libpaddle.so。所以 PaddlePaddle 的性能分析与调优分为两个部分: - -* Python 代码的性能分析 -* Python 与 C++ 混合代码的性能分析 - - -## Python代码的性能分析 - -### 生成性能分析文件 - -Python标准库中提供了性能分析的工具包,[cProfile](https://docs.python.org/2/library/profile.html)。生成Python性能分析的命令如下: - -```bash -python -m cProfile -o profile.out main.py -``` - -其中 `main.py` 
是我们要分析的程序,`-o`标识了一个输出的文件名,用来存储本次性能分析的结果。如果不指定这个文件,`cProfile`会打印到标准输出。 - -### 查看性能分析文件 - -`cProfile` 在main.py 运行完毕后输出`profile.out`。我们可以使用[`cprofilev`](https://github.com/ymichael/cprofilev)来查看性能分析结果。`cprofilev`是一个Python的第三方库。使用它会开启一个HTTP服务,将性能分析结果以网页的形式展示出来: - -```bash -cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py -``` - -其中`-a`标识HTTP服务绑定的IP。使用`0.0.0.0`允许外网访问这个HTTP服务。`-p`标识HTTP服务的端口。`-f`标识性能分析的结果文件。`main.py`标识被性能分析的源文件。 - -用Web浏览器访问对应网址,即可显示性能分析的结果: - -``` - ncalls tottime percall cumtime percall filename:lineno(function) - 1 0.284 0.284 29.514 29.514 main.py:1() - 4696 0.128 0.000 15.748 0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/executor.py:20(run) - 4696 12.040 0.003 12.040 0.003 {built-in method run} - 1 0.144 0.144 6.534 6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14() -``` - -每一列的含义是: - -| 列名 | 含义 | -| --- | --- | -| ncalls | 函数的调用次数 | -| tottime | 函数实际使用的总时间。该时间去除掉本函数调用其他函数的时间 | -| percall | tottime的每次调用平均时间 | -| cumtime | 函数总时间。包含这个函数调用其他函数的时间 | -| percall | cumtime的每次调用平均时间 | -| filename:lineno(function) | 文件名, 行号,函数名 | - - -### 寻找性能瓶颈 - -通常`tottime`和`cumtime`是寻找瓶颈的关键指标。这两个指标代表了某一个函数真实的运行时间。 - -将性能分析结果按照tottime排序,效果如下: - -```text - 4696 12.040 0.003 12.040 0.003 {built-in method run} - 300005 0.874 0.000 1.681 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader) - 107991 0.676 0.000 1.519 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:219(__init__) - 4697 0.626 0.000 2.291 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp) - 1 0.618 0.618 0.618 0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/__init__.py:1() -``` - -可以看到最耗时的函数是C++端的`run`函数。这需要联合我们第二节`Python`与`C++`混合代码的性能分析来进行调优。而`sync_with_cpp`函数的总共耗时很长,每次调用的耗时也很长。于是我们可以点击`sync_with_cpp`的详细信息,了解其调用关系。 - -```text -Called By: - - Ordered by: internal time - List 
reduced from 4497 to 2 due to restriction <'sync_with_cpp'> - -Function was called by... - ncalls tottime cumtime -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp) <- 4697 0.626 2.291 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp) -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp) <- 4696 0.019 2.316 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:487(clone) - 1 0.000 0.001 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:534(append_backward) - - -Called: - - Ordered by: internal time - List reduced from 4497 to 2 due to restriction <'sync_with_cpp'> -``` - -通常观察热点函数间的调用关系,和对应行的代码,就可以了解到问题代码在哪里。当我们做出性能修正后,再次进行性能分析(profiling)即可检查我们调优后的修正是否能够改善程序的性能。 - - - -## Python与C++混合代码的性能分析 - -### 生成性能分析文件 - -C++的性能分析工具非常多。常见的包括`gprof`, `valgrind`, `google-perftools`。但是调试Python中使用的动态链接库与直接调试原始二进制相比增加了很多复杂度。幸而Python的一个第三方库`yep`提供了方便的和`google-perftools`交互的方法。于是这里使用`yep`进行Python与C++混合代码的性能分析 - -使用`yep`前需要安装`google-perftools`与`yep`包。ubuntu下安装命令为 - -```bash -apt update -apt install libgoogle-perftools-dev -pip install yep -``` - -安装完毕后,我们可以通过 - -```bash -python -m yep -v main.py -``` - -生成性能分析文件。生成的性能分析文件为`main.py.prof`。 - -命令行中的`-v`指定在生成性能分析文件之后,在命令行显示分析结果。我们可以在命令行中简单的看一下生成效果。因为C++与Python不同,编译时可能会去掉调试信息,运行时也可能因为多线程产生混乱不可读的性能分析结果。为了生成更可读的性能分析结果,可以采取下面几点措施: - -1. 编译时指定`-g`生成调试信息。使用cmake的话,可以将CMAKE_BUILD_TYPE指定为`RelWithDebInfo`。 -2. 编译时一定要开启优化。单纯的`Debug`编译性能会和`-O2`或者`-O3`有非常大的差别。`Debug`模式下的性能测试是没有意义的。 -3. 
运行性能分析的时候,先从单线程开始,再开启多线程,进而多机。毕竟单线程调试更容易。可以设置`OMP_NUM_THREADS=1`这个环境变量关闭openmp优化。 - -### 查看性能分析文件 - -在运行完性能分析后,会生成性能分析结果文件。我们可以使用[`pprof`](https://github.com/google/pprof)来显示性能分析结果。注意,这里使用了用`Go`语言重构后的`pprof`,因为这个工具具有web服务界面,且展示效果更好。 - -安装`pprof`的命令和一般的`Go`程序是一样的,其命令如下: - -```bash -go get github.com/google/pprof -``` - -进而我们可以使用如下命令开启一个HTTP服务: - -```bash -pprof -http=0.0.0.0:3213 `which python` ./main.py.prof -``` - -这行命令中,`-http`指开启HTTP服务。`which python`会产生当前Python二进制的完整路径,进而指定了Python可执行文件的路径。`./main.py.prof`输入了性能分析结果。 - -访问对应的网址,我们可以查看性能分析的结果。结果如下图所示: - -![result](./pprof_1.png) - - -### 寻找性能瓶颈 - -与寻找Python代码的性能瓶颈类似,寻找Python与C++混合代码的性能瓶颈也是要看`tottime`和`cumtime`。而`pprof`展示的调用图也可以帮助我们发现性能中的问题。 - -例如下图中, - -![kernel_perf](./pprof_2.png) - -在一次训练中,乘法和乘法梯度的计算占用2%-4%左右的计算时间。而`MomentumOp`占用了17%左右的计算时间。显然,`MomentumOp`的性能有问题。 - -在`pprof`中,对于性能的关键路径都做出了红色标记。先检查关键路径的性能问题,再检查其他部分的性能问题,可以更有次序的完成性能的优化。 diff --git a/develop/doc_cn/_sources/howto/optimization/gpu_profiling_cn.rst.txt b/develop/doc_cn/_sources/howto/optimization/gpu_profiling_cn.rst.txt index 0239eef4f118197bf92f9fc7d323be58344b0ded..25bcaccb6975bc21fba2e8c5843da15c69948d72 100644 --- a/develop/doc_cn/_sources/howto/optimization/gpu_profiling_cn.rst.txt +++ b/develop/doc_cn/_sources/howto/optimization/gpu_profiling_cn.rst.txt @@ -55,7 +55,7 @@ above profilers. :code:`paddle/math/test` 目录中的 :code:`test_GpuProfiler` 就是用于展示上述分析工具的用法。 -.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp +.. literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :linenos: @@ -83,7 +83,7 @@ program crashes when CPU version of PaddlePaddle invokes them. 1. 加入 :code:`REGISTER_TIMER_INFO` 和 :code:`printAllStatus` 函数(如高亮部分)。 - .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp + .. literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :emphasize-lines: 8-12,14 @@ -130,7 +130,7 @@ nvprof 工具 1. 
将 :code:`REGISTER_GPU_PROFILER` 函数加到代码中(参考强调部分)。 - .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp + .. literalinclude:: ../../../../paddle/math/tests/test_GpuProfiler.cpp :language: c++ :lines: 137-151 :emphasize-lines: 6-7 diff --git a/develop/doc_cn/_sources/howto/read_source.md.txt b/develop/doc_cn/_sources/howto/read_source.md.txt deleted file mode 100644 index edf46aff8c6cc9fc01d26c6453b3a8123238ef91..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/howto/read_source.md.txt +++ /dev/null @@ -1,67 +0,0 @@ -# PaddlePaddle Fluid Source Code Overview - -Examples: https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/tests/book - -Core: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework - -Operator: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators - -Memory: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory - -Platform: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform - -# Compile Time - -The following **defines** the NN. The definition goes into this [protocol buffer](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). - -```python -x = fluid.layers.data(name='x', shape=[13], dtype='float32') -y = fluid.layers.data(name='y', shape=[1], dtype='float32') - -y_predict = fluid.layers.fc(input=x, size=1, act=None) -cost = fluid.layers.square_error_cost(input=y_predict, label=y) -avg_cost = fluid.layers.mean(x=cost) - -sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001) -sgd_optimizer.minimize(avg_cost) -``` - -- Variables: `x`, `y`, `y_predict`, `cost` and `avg_cost`. [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/framework.py#) -- Layers: `fluid.layers.data`, `fluid.layers.fc` and `fluid.layers.mean` are layers. 
[Python](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/layers) - - Every Layer has one or more operators and variables/parameters - - All the operators are defined at [`paddle/operators/`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators). Other worth-looking files: - - Base class: [`paddle/framework/operator.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h) - - Operator Registration: [`paddle/framework/op_registry.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h) - - Operator Lookup: [`paddle/framework/op_info.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_info.h) -- Optimizer: `fluid.optimizer.SGD`. It does the following - - Add backward operators. [[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/backward.py)] - - Add optimizer operators. [[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/optimizer.py)] - -# Run Time - -The following **evaluates** the NN. Instantiates all the variables, operators. - -```python -place = fluid.CPUPlace() -feeder = fluid.DataFeeder(place=place, feed_list=[x, y]) -exe = fluid.Executor(place) - -# Allocate memory. Initialize Parameter. -exe.run(fluid.default_startup_program()) - -# Allocate memory. Do computation. -exe.run(fluid.default_main_program(), - feed=feeder.feed(data), - fetch_list=[avg_cost]) -``` - -- Place: `place`. one of CPU, GPU or FPGA. [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h) - - The device handle are at [paddle/platform/device_context.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h) -- Executor: `fluid.Executor(place)`. 
[[Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/executor.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.cc)] - - Feeds the data: `feed=feeder.feed(data)` - - Evaluates all the operators - - Fetches the result: `fetch_list=[avg_cost]` -- Other worth looking files: - - Scope: [paddle/framework/scope.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/scope.h). Where all the variables live - - Variable: [paddle/framework/variable.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h). Where all the data (most likely tensors) live - - Tensor: [paddle/framework/tensor.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h). Where we allocate memory through [`paddle/memory/`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory) diff --git a/develop/doc_cn/_sources/mobile/cross_compiling_for_android_cn.md.txt b/develop/doc_cn/_sources/mobile/cross_compiling_for_android_cn.md.txt deleted file mode 100644 index cdd6917239371a660d0df05bb623f0b94f8f11a3..0000000000000000000000000000000000000000 --- a/develop/doc_cn/_sources/mobile/cross_compiling_for_android_cn.md.txt +++ /dev/null @@ -1,187 +0,0 @@ -# Android平台编译指南 - -用户可通过如下两种方式,交叉编译Android平台上适用的PaddlePaddle库: - -- [基于Docker容器的编译方式](#基于docker容器的编译方式) -- [基于Linux交叉编译环境的编译方式](#基于linux交叉编译环境的编译方式) - -## 基于Docker容器的编译方式 -Docker能在所有主要操作系统(包括Linux,Mac OS X和Windows)上运行,因此,使用基于Docker容器的编译方式,用户可在自己熟悉的开发平台上编译Android平台上适用的PaddlePaddle库。 - -### 构建PaddlePaddle的Android开发镜像 -我们把PaddlePaddle的交叉编译环境打包成一个镜像,称为开发镜像,里面涵盖了交叉编译Android版PaddlePaddle库需要的所有编译工具。 - -```bash -$ git clone https://github.com/PaddlePaddle/Paddle.git -$ cd Paddle -$ docker build -t username/paddle-android:dev . 
-f Dockerfile.android -``` - -用户也可以使用PaddlePaddle提供的官方开发镜像: - -```bash -$ docker pull paddlepaddle/paddle:latest-dev-android -``` - -对于国内用户,我们提供了加速访问的镜像源: - -```bash -$ docker pull docker.paddlepaddlehub.com/paddle:latest-dev-android -``` - -### 编译PaddlePaddle C-API库 -构建好开发镜像后,即可使用开发镜像来编译Android版PaddlePaddle C-API库。 -Android的Docker开发镜像向用户提供两个可配置的参数: - - -- - - - - - - - - - - - - - - - - - - - - - - -
| Argument | Optional Values | Default |
|---|---|---|
| ANDROID_ABI | armeabi-v7a, arm64-v8a | armeabi-v7a |
| ANDROID_API | >= 16 | 21 |
                                - -- 编译`armeabi-v7a`,`Android API 21`的PaddlePaddle库 - -```bash -$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" username/paddle-android:dev -``` - -- 编译`arm64-v8a`,`Android API 21`的PaddlePaddle库 - -```bash -$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=arm64-v8a" -e "ANDROID_API=21" username/paddle-android:dev -``` - -执行上述`docker run`命令时,容器默认执行[paddle/scripts/docker/build_android.sh](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build_android.sh)脚本。该脚本中记录了交叉编译Android版PaddlePaddle库常用的CMake配置,并且会根据`ANDROID_ABI`和`ANDROID_API`自动构建独立工具链、进行编译和安装。由于arm64架构要求Android API不小于21。因此当`ANDROID_ABI=arm64-v8a`,`ANDROID_API<21`时,Docker容器中将默认使用`Android API 21`的编译工具链。用户可以参考下文[配置交叉编译参数](#配置交叉编译参数)章节,根据个人的需求修改定制Docker容器所执行的脚本。编译安装结束之后,PaddlePaddle的C-API库将被安装到`$PWD/install_android`目录,所依赖的第三方库同时也被安装到`$PWD/install_android/third_party`目录。 - -## 基于Linux交叉编译环境的编译方式 -本文档将以Linux x86-64平台为例,介绍交叉编译Android平台上适用的PaddlePaddle库的方法和步骤。 - -### 准备交叉编译环境 - -从源码交叉编译PaddlePaddle,用户需要提前准备好交叉编译环境。Android平台上使用的C/C++交叉编译工具链为[Android NDK](https://developer.android.com/ndk/downloads/index.html?hl=zh-cn),用户可自行前往下载预编译好的版本,也可通过以下命令获取: - -```bash -wget -q https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip -unzip -q android-ndk-r14b-linux-x86_64.zip -``` - -Android NDK中包含了所有Android API级别、所有架构(arm/arm64/x86/mips)需要用到的编译工具和系统库。用户可根据自己的编译目标架构、所需支持的最低Android API级别,构建[独立工具链](https://developer.android.google.cn/ndk/guides/standalone_toolchain.html?hl=zh-cn)。 - -- 构建`armeabi-v7a`、 `Android API 21`的独立工具链: - -```bash -your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \ - --arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain -``` - -此命令将在`your/path/to/arm_standalone_toolchain`目录生成一套独立编译工具链,面向架构为32位ARM架构,支持的最小的Android API级别为21,支持编译器`arm-linux-androideabi-gcc (GCC) 4.9`和`clang 3.8`。 - -- 构建`arm64-v8a`、 `Android API 21`的独立工具链: - -```bash 
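-your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
-        --arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain
-```
-
-This command generates a standalone toolchain in the `your/path/to/arm64_standalone_toolchain` directory. The toolchain targets the 64-bit ARM64 architecture, supports a minimum Android API level of 21, and provides the compilers `arm-linux-androideabi-gcc (GCC) 4.9` and `clang 3.8`.
-
-### Configuring cross-compilation parameters
-
-CMake supports cross-compilation; see [cmake-toolchains](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). To simplify the CMake configuration, PaddlePaddle provides a toolchain configuration file for cross-compilation, [cmake/cross_compiling/android.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/android.cmake), which supplies default settings for the compilers and compile options. Note that starting with CMake 3.7, CMake itself provides general support for cross-compiling to Android. When PaddlePaddle detects that the CMake version in use is 3.7 or later, it passes the user-supplied configuration parameters through to CMake and lets CMake handle them. For a detailed description of these parameters, see [cmake-toolchains](https://cmake.org/cmake/help/v3.7/manual/cmake-toolchains.7.html#cross-compiling).
-
-Some parameters must be set when cross-compiling the Android version of the PaddlePaddle library:
-- `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be set to `Android`. Only after `CMAKE_SYSTEM_NAME=Android` is set does PaddlePaddle's CMake system treat the build as a cross-compilation for Android and automatically build all the third-party libraries that PaddlePaddle requires. It also forces the values of some PaddlePaddle parameters (`WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`, `WITH_RDMA=OFF`, `WITH_MKL=OFF`, `WITH_GOLANG=OFF`).
-- `WITH_C_API`: must be set to `ON`. On Android, only inference through the C-API is supported.
-- `WITH_SWIG_PY`: must be set to `OFF`. On Android, training or inference through SWIG is not supported.
-
-Optional parameters for the Android platform:
-
-- `ANDROID_STANDALONE_TOOLCHAIN`: the absolute path of the standalone toolchain, or its path relative to the build directory. PaddlePaddle's CMake system derives and sets the cross-compiler, the sysroot, and the Android API level from this value; if it is not set, the user must set these values manually when running cmake. No default value.
-- `ANDROID_TOOLCHAIN`: the target toolchain. Can be set to `gcc` or `clang`; the default is `clang`.
-  - With CMake 3.7 or later, the `clang` toolchain is always used; with earlier CMake versions, set `ANDROID_TOOLCHAIN=gcc` to use the `gcc` toolchain.
-  - The `clang` compiler provided by Android requires `GLIBC 2.15` or later on the host system.
-- `ANDROID_ABI`: the target architecture ABI. `armeabi-v7a` and `arm64-v8a` are currently supported; the default is `armeabi-v7a`.
-- `ANDROID_NATIVE_API_LEVEL`: the Android API level of the toolchain. If it is not set explicitly, PaddlePaddle derives it from the value of `ANDROID_STANDALONE_TOOLCHAIN`.
-- `ANDROID_ARM_MODE`: whether to use ARM mode.
-  - When `ANDROID_ABI=armeabi-v7a`, it can be set to `ON` or `OFF`; the default is `ON`.
-  - When `ANDROID_ABI=arm64-v8a`, it does not need to be set.
-- `ANDROID_ARM_NEON`: whether to use NEON instructions.
-  - When `ANDROID_ABI=armeabi-v7a`, it can be set to `ON` or `OFF`; the default is `ON`.
-  - When `ANDROID_ABI=arm64-v8a`, it does not need to be set.
-
-Other parameters:
-
-- `USE_EIGEN_FOR_BLAS`: whether to use the Eigen library for matrix computation. Can be set to `ON` or `OFF`; the default is `OFF`.
-- `HOST_C/CXX_COMPILER`: the C/C++ compiler of the host machine, needed to build the host version of the protoc executable and the target version of the OpenBLAS library. Defaults to the value of the environment variables `CC/CXX`; if `CC/CXX` are not set, the `cc/c++` compilers are used.
-
-Commonly used cmake configurations are as follows:
-
-```bash
-cmake -DCMAKE_SYSTEM_NAME=Android \
-      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
-      -DANDROID_ABI=armeabi-v7a \
-      -DANDROID_ARM_NEON=ON \
-      -DANDROID_ARM_MODE=ON \
-      -DUSE_EIGEN_FOR_BLAS=ON \
-      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-      -DWITH_C_API=ON \
-      -DWITH_SWIG_PY=OFF \
-      ..
-```
-
-```
-cmake -DCMAKE_SYSTEM_NAME=Android \
-      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
-      -DANDROID_ABI=arm64-v8a \
-      -DUSE_EIGEN_FOR_BLAS=OFF \
-      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-      -DWITH_C_API=ON \
-      -DWITH_SWIG_PY=OFF \
-      ..
-```
-
-Users can also set other build parameters as needed.
-
-- Set `CMAKE_BUILD_TYPE` to `MinSizeRel` to minimize the size of the generated library.
-- Set `CMAKE_BUILD_TYPE` to `Release` to get the fastest execution speed.
-- Users can also influence the PaddlePaddle build by setting `CMAKE_C/CXX_FLAGS` manually.
-
-**Performance tips**: to reach the fastest computation speed, the following CMake settings are recommended:
-
-- Set `CMAKE_BUILD_TYPE` to `Release`.
-- Use the `clang` toolchain.
-- For `armeabi-v7a`, set `USE_EIGEN_FOR_BLAS=ON` to use Eigen for matrix computation; for `arm64-v8a`, set `USE_EIGEN_FOR_BLAS=OFF` to use OpenBLAS for matrix computation.
-
-### Building and installing
-
-Once CMake is configured, run the following commands. PaddlePaddle automatically downloads and builds all third-party dependencies, then builds and installs the PaddlePaddle inference library.
-
-```bash
-make
-make install
-```
-
-Note: if you have previously built PaddlePaddle for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf`, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.
-
-After the install command finishes, the `your/path/to/install` directory contains the `include`, `lib`, and `third_party` directories: `include` holds the C-API header files, `lib` holds the PaddlePaddle libraries for the different Android ABIs, and `third_party` holds all the third-party libraries that PaddlePaddle depends on. PaddlePaddle is now installed, and the files generated under `your/path/to/install` can be used in deep-learning Android apps. See the C-API documentation for how to call the library.
diff --git a/develop/doc_cn/_sources/mobile/cross_compiling_for_ios_cn.md.txt b/develop/doc_cn/_sources/mobile/cross_compiling_for_ios_cn.md.txt
deleted file mode 100644
index d5196d9a4c93c7692d2a624ec7d0650e32806338..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/mobile/cross_compiling_for_ios_cn.md.txt
+++ /dev/null
@@ -1,117 +0,0 @@
-# Build guide for the iOS platform
-Cross-compiling a PaddlePaddle library for iOS must be done on macOS. This document describes how to cross-compile, from source and on macOS, a PaddlePaddle library suitable for the iOS platform.
-
-## Preparing the cross-compilation environment
-Apple provides a complete cross-compilation toolchain and integrated development environment for iOS development; users only need to download and install Xcode from the App Store. It can also be downloaded from the official website, [Xcode](https://developer.apple.com/cn/xcode/). After installation, run `xcodebuild -version` on the command line to check that the installation succeeded.
-
-```bash
-$ xcodebuild -version
-Xcode 9.0
-Build version 9A235
-```
-
-## Configuring cross-compilation parameters
-
-PaddlePaddle provides a toolchain configuration file for cross-compilation, [cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake), which supplies default settings for the compilers and compile options.
-
-Some parameters must be set when cross-compiling the iOS version of the PaddlePaddle library:
-
-- `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be set to `iOS`. After `CMAKE_SYSTEM_NAME=iOS` is set, PaddlePaddle's CMake system automatically builds all third-party dependencies and forces the values of some PaddlePaddle parameters (`WITH_C_API=ON`, `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`, `WITH_RDMA=OFF`).
-- `WITH_C_API`: whether to build the C-API inference library; must be set to `ON`. On iOS, only inference through the C-API is supported.
-- `WITH_SWIG_PY`: must be set to `OFF`. On iOS, training or inference through SWIG is not supported.
-
-Optional parameters for the iOS platform:
-
-- `IOS_PLATFORM`: can be set to `OS` (the default) or `SIMULATOR`.
-  - `OS`: builds for physical `arm` devices such as the iPhone or iPad.
-  - `SIMULATOR`: builds for the `x86` simulator platform.
-- `IOS_ARCH`: the target architecture. The architectures available for each `IOS_PLATFORM` are listed in the table below; by default, all architectures are built:
-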
-| IOS_PLATFORM | IOS_ARCH |
-|---|---|
-| OS | armv7, armv7s, arm64 |
-| SIMULATOR | i386, x86_64 |
-
-- `IOS_DEPLOYMENT_TARGET`: the minimum iOS deployment version; the default is `7.0`.
-- `IOS_ENABLE_BITCODE`: whether to enable [Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3). Can be set to `ON` or `OFF`; the default is `ON`.
-- `IOS_USE_VECLIB_FOR_BLAS`: whether to use the [vecLib](https://developer.apple.com/documentation/accelerate/veclib) framework for BLAS matrix computation. Can be set to `ON` or `OFF`; the default is `OFF`.
-- `IOS_DEVELOPMENT_ROOT`: the `Developer` directory; can be specified explicitly as `/path/to/platform/Developer`. If it is not specified, PaddlePaddle selects the `Developer` directory of the `Xcode` `platform` that corresponds to `IOS_PLATFORM`.
-- `IOS_SDK_ROOT`: the root directory of the `SDK` to use; can be specified explicitly as `/path/to/platform/Developer/SDKs/SDK`. If it is not specified, PaddlePaddle selects the latest `SDK` version under `IOS_DEVELOPMENT_ROOT`.
-
-Other parameters:
-
-- `USE_EIGEN_FOR_BLAS`: whether to use the Eigen library for matrix computation; only takes effect when `IOS_USE_VECLIB_FOR_BLAS=OFF`. Can be set to `ON` or `OFF`; the default is `OFF`.
-- `HOST_C/CXX_COMPILER`: the C/C++ compiler of the host machine. Defaults to the value of the environment variables `CC/CXX`; if `CC/CXX` are not set, the `cc/c++` compilers are used.
-
-Commonly used cmake configurations are as follows:
-
-```bash
-cmake -DCMAKE_SYSTEM_NAME=iOS \
-      -DIOS_PLATFORM=OS \
-      -DIOS_ARCH="armv7;arm64" \
-      -DIOS_ENABLE_BITCODE=ON \
-      -DIOS_USE_VECLIB_FOR_BLAS=ON \
-      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-      -DWITH_C_API=ON \
-      -DWITH_TESTING=OFF \
-      -DWITH_SWIG_PY=OFF \
-      ..
-```
-
-```bash
-cmake -DCMAKE_SYSTEM_NAME=iOS \
-      -DIOS_PLATFORM=SIMULATOR \
-      -DIOS_ARCH="x86_64" \
-      -DIOS_USE_VECLIB_FOR_BLAS=ON \
-      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-      -DWITH_C_API=ON \
-      -DWITH_TESTING=OFF \
-      -DWITH_SWIG_PY=OFF \
-      ..
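-```
-
-Users can also set other build parameters as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; to get the fastest execution speed, set `CMAKE_BUILD_TYPE` to `Release`. The PaddlePaddle build can also be influenced by setting `CMAKE_C/CXX_FLAGS` manually.
-
-**Performance tips**: to reach the fastest computation speed, the following CMake settings are recommended:
-
-- Set `CMAKE_BUILD_TYPE` to `Release`.
-- Set `IOS_USE_VECLIB_FOR_BLAS=ON` to use the BLAS functions provided by the `vecLib` framework for matrix computation.
-
-## Building and installing
-
-Once CMake is configured, run the following commands. PaddlePaddle automatically downloads and builds all third-party dependencies, then builds and installs the PaddlePaddle inference library.
-
-```
-$ make
-$ make install
-```
-
-Note: if you have previously built PaddlePaddle for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf`, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.
-
-After the install command finishes, the `your/path/to/install` directory contains the following:
-
-- the `include` directory, which holds all C-API header files
-- the `lib` directory, which holds the PaddlePaddle C-API static library
-- the `third_party` directory, which holds all the third-party libraries that PaddlePaddle depends on
-
-Note that if the PaddlePaddle library must support both physical devices and the simulator, the device and simulator versions have to be built separately and then merged into a fat library with the `lipo` tool.
-
-The PaddlePaddle library is now installed, and the merged fat library can be used in deep-learning iOS apps. See the C-API documentation for how to call the library.
diff --git a/develop/doc_cn/_sources/mobile/cross_compiling_for_raspberry_cn.md.txt b/develop/doc_cn/_sources/mobile/cross_compiling_for_raspberry_cn.md.txt
deleted file mode 100644
index f8ef9dc8031613831437745995268f3abc392f5b..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/mobile/cross_compiling_for_raspberry_cn.md.txt
+++ /dev/null
@@ -1,62 +0,0 @@
-# Build guide for the Raspberry Pi platform
-
-There are generally two ways to build a version for the Raspberry Pi:
-
-1. Log in to the Raspberry Pi system, for example over ssh, and build there. The required development tools and third-party libraries are listed in [`/Dockerfile`](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile).
-
-1. Cross-compile. This document describes the method and steps for cross-compiling, on Linux/x64, a PaddlePaddle build suitable for the Raspberry Pi platform.
-
-## Installing the cross-compiler
-
-Clone the following GitHub repo:
-
-```bash
-git clone https://github.com/raspberrypi/tools.git
-```
-
-The cross-compiler arm-linux-gnueabihf-gcc 4.8.3 can then be found in the `./tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64` directory. Running this toolchain requires a Linux x64 machine with glibc 2.14 or later.
-
-## Configuring cross-compilation parameters
-
-CMake [supports cross-compilation](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). The configuration for PaddlePaddle on Raspberry Pi is in [cmake/cross_compiling/raspberry_pi.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/raspberry_pi.cmake).
-
-Some parameters must be set when cross-compiling the Raspberry Pi version of the PaddlePaddle library:
-
-- `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be set to `RPi`. Only after `CMAKE_SYSTEM_NAME=RPi` is set does PaddlePaddle's CMake system treat the build as a cross-compilation for the Raspberry Pi and automatically build the host version of the protoc executable, the target version of the protobuf library, and the target version of the OpenBLAS library.
-
-- `RPI_TOOLCHAIN`: the absolute path of the toolchain, or its path relative to the build directory. PaddlePaddle's CMake system sets the cross-compiler from this value; if it is not set, the user must set the compiler manually when running cmake. No default value.
-
-- `RPI_ARM_NEON`: whether to use NEON instructions. Currently it must be set to `ON`; the default is `ON`.
-
-- `HOST_C/CXX_COMPILER`: the C/C++ compiler of the host machine, needed to build the host version of the protoc executable and the target version of the OpenBLAS library. Defaults to the value of the environment variable `CC`; if `CC` is not set, the `cc` compiler is used.
-
-A commonly used CMake configuration is as follows:
-
-```
-cmake -DCMAKE_SYSTEM_NAME=RPi \
-      -DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \
-      -DRPI_ARM_NEON=ON \
-      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-      -DWITH_GPU=OFF \
-      -DWITH_C_API=ON \
-      -DWITH_PYTHON=OFF \
-      -DWITH_SWIG_PY=OFF \
-      ..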
-```
-
-Here `WITH_C_API=ON` indicates that the inference library should be built.
-
-Users can also set other build parameters as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; to get the fastest execution speed, set `CMAKE_BUILD_TYPE` to `Release`.
-
-## Building and installing
-
-Once CMake is configured, run the following commands. PaddlePaddle automatically downloads and builds all third-party dependencies, then builds and installs PaddlePaddle.
-
-```bash
-make
-make install
-```
-
-Note: if you have previously built PaddlePaddle for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf`, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.
-
-After the install command finishes, the `your/path/to/install` directory contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a Raspberry Pi build of the library.
diff --git a/develop/doc_cn/_sources/survey/cluster_bootstrapping_tools.md.txt b/develop/doc_cn/_sources/survey/cluster_bootstrapping_tools.md.txt
deleted file mode 100644
index 1cd9962700bb49866f1ed6987abc28b27888a23f..0000000000000000000000000000000000000000
--- a/develop/doc_cn/_sources/survey/cluster_bootstrapping_tools.md.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-# Cluster bootstrapping tool survey
-## Abstract
-To bring a cluster up from bare metal machines to a fully functional Kubernetes cluster that PaddlePaddle can run on, we need some tooling. Here we compare [Sextant](https://github.com/k8sp/sextant) and [Tectonic installer](https://github.com/coreos/tectonic-installer).
-
-## Basic assumptions
-Here are some basic assumptions before we move on to the details:
-1. You are an administrator of a bare metal machine cluster, which means:
-  * you have full control of each of the machines.
-  * you have full control of the network that the machines are connected to.
-2. Machines can be booted from the network with PXE or iPXE.
-3. You understand the [general procedure to bring up a cluster](#appendix-general-procedure-to-bring-up-a-cluster).
-
-If your cluster satisfies all of the items above, keep reading.
-
-## Comparing Sextant and Tectonic installer
-### Sextant
-Sextant is an end-to-end solution for bringing a bare metal cluster up to a fully functional k8s cluster. It integrates DHCP, name service, PXE, a cloud-config service, and a docker registry.
-
-#### Pros
-1. End-to-end: basically all the admin needs to do is configure cluster.yaml and power on the cluster.
-2. Offline cluster configuration: working with Sextant has two phases, config time and deploy time. At config time the admin's machine needs internet connectivity to download images and other dependencies. At deploy time it is completely fine to be offline, since all dependencies were fetched at config time.
-3. Docker registry integrated.
-4. GPU machines are taken care of.
-
-#### Cons
-1. The k8s API server is not deployed with high availability by default.
-2. No grouping support.
-3. No API interface; it is a one-off service.
-
-
-### Tectonic installer
-First of all, Tectonic is not free: installation requires a coreos.com account, and free users can only create fewer than 10 nodes.
-
-Tectonic is a suite of software that wraps around k8s and provides more DevOps utilities.
-Tectonic installer, as its name suggests, installs Tectonic on a bare metal cluster, which means it is not an exact equivalent of Sextant. For the "booting a cluster" part, it mostly relies on [Matchbox](https://github.com/coreos/matchbox), which is a general cluster bootstrapper.
-
-Matchbox's approach is similar to Sextant's.
-
-#### Pros
-1. Supports grouping machines.
-2. Supports running the provisioning service in rkt (not a big deal, though).
-3. Supports an http/gRPC API interface.
-4. Supports multiple templates.
-
-#### Cons
-1. Not an end-to-end solution for bringing up a cluster; it needs a lot of extra work and other software.
-2. [Not fully supporting](https://github.com/coreos/matchbox/issues/550) CentOS deployment yet.
-
-## Conclusion
-Sextant is the better overall solution for deploying paddle cloud to a bare metal cluster. It would be great if Sextant could also 1) deploy the k8s API server with high availability by default, and 2) not be designed as a one-off service.
-
-
-
-## Appendix: General procedure to bring up a cluster
-It is impractical for a cluster admin to manually install the OS and applications on cluster nodes one by one. Here is what an admin would typically do instead:
-1. Set up a bootstrap machine with a static IP in the cluster, which runs the following services:
-  * DHCP: assigns IP addresses to the rest of the nodes.
-  * name service: maps node names to IP addresses.
-  * PXE related services: booting related info is delivered to newly booted machines when their IP is assigned via DHCP; the PXE service then provides further booting and installing info and images over TFTP and http.
-  * cluster config service: provides cluster nodes with their OS config via http.
-  * optional docker registry: a built-in docker registry makes the whole cluster independent of an internet connection and speeds up software distribution.
-2. When a new node powers on:
-  * it broadcasts a request for an IP address.
-  * the DHCP server assigns the IP address and delivers the PXE booting related info to the node.
-  * the node requests config files from the TFTP service, using the booting info delivered with DHCP; in most cases the config file points to an http service for the booting image.
-  * since PXE is configured with initrd, the node uses the cloud config service to do further installations, such as CoreOS or K8s.
-  * the node then restarts.
-
-For further understanding, the following two links from Matchbox are good reading:
-* [Machine lifecycle](https://github.com/coreos/matchbox/blob/master/Documentation/machine-lifecycle.md)
-* [PXE booting](https://github.com/coreos/matchbox/blob/master/Documentation/network-booting.md)
diff --git a/develop/doc_cn/build_and_install/build_from_source_cn.html b/develop/doc_cn/build_and_install/build_from_source_cn.html
index 827675ce4bcf515d4165210b44a814c57ca72901..29fddef0cbe0e7fee58d66f0a7c1c57f103932e8 100644
--- a/develop/doc_cn/build_and_install/build_from_source_cn.html
+++ b/develop/doc_cn/build_and_install/build_from_source_cn.html
@@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                              • Development Standards
                              • FAQ
                                  diff --git a/develop/doc_cn/build_and_install/docker_install_cn.html b/develop/doc_cn/build_and_install/docker_install_cn.html index ff7ee771be3cd6a95dfedea045c2d78783870181..402094927087f1ac84df56eb3f14238909b4be06 100644 --- a/develop/doc_cn/build_and_install/docker_install_cn.html +++ b/develop/doc_cn/build_and_install/docker_install_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                • Development Standards
                                • FAQ
                                    diff --git a/develop/doc_cn/build_and_install/index_cn.html b/develop/doc_cn/build_and_install/index_cn.html index 5df3b9c4bc7cdb08a7ec05fd4ce3bd4116550d2c..521349742c1a61ffed1ee0dac6512055da23831b 100644 --- a/develop/doc_cn/build_and_install/index_cn.html +++ b/develop/doc_cn/build_and_install/index_cn.html @@ -143,6 +143,7 @@ var _hmt = _hmt || [];
                                  • Development Standards
                                  • FAQ
                                      diff --git a/develop/doc_cn/build_and_install/pip_install_cn.html b/develop/doc_cn/build_and_install/pip_install_cn.html index ca6c358a87d692b46854beced9d1c901619ec4e2..3968a97ebd0f65bfa7097e4ccbd7dda2ebcb3e5c 100644 --- a/develop/doc_cn/build_and_install/pip_install_cn.html +++ b/develop/doc_cn/build_and_install/pip_install_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                    • Development Standards
                                    • FAQ
                                        diff --git a/develop/doc_cn/design/api.html b/develop/doc_cn/design/api.html deleted file mode 100644 index 5d9afbf68e0d3d96ea73e9ec01d218384b424d32..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/api.html +++ /dev/null @@ -1,495 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle Design Doc — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        PaddlePaddle Design Doc

                                        -
                                        -

                                        Ingredients

                                        -

                                        Our design principle is to start from the essence: how can we allow users to express and solve their problems as neural networks? Some essential concepts that our API has to provide include:

                                        -
                                          -
                                        1. A topology is an expression of layers.
                                        2. A layer could be any kind of computation, including cost.
                                        3. Some layers have parameters, some don’t. Most costs don’t have parameters.
                                        4. In some topologies, layers share parameters. For example, the network for training a ranking model.
                                        5. At programming time, users specify topologies and possible sharing of parameters. PaddlePaddle can figure out and create parameters required (and possibly shared) by one or more topologies.
                                        -
                                        -
                                        -

                                        Starting from Examples

                                        -

                                        As a summary of our discussion, let us present two examples here:

                                        -
                                        -

                                        Example 1. Sharing Parameters between Layers

                                        -

                                        We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model’s topology as follows:

                                        -
                                        A -> f -\
                                        -Q -> f --> cost
                                        -B -> f -/
                                        -
                                        -
                                        -

                                        The following program trains the topology, including the cost, and then uses a sub-network of the trained topology for inference:

                                        -
                                        def f(x):
                                        -    # `x` replaces the original argument name `in`, a reserved word in Python.
                                        -    e = paddle.layer.embedding(x, parameter_name="embedding")
                                        -    o = paddle.layer.softmax(e, parameter_name="semantic")
                                        -    return o
                                        -
                                        -# Create 3 topologies (subnets), they share parameters because all
                                        -# corresponding layers have the same parameter names.
                                        -fA = f(paddle.layer.data(input_name="A"))
                                        -fB = f(paddle.layer.data(input_name="B"))
                                        -fQ = f(paddle.layer.data(input_name="Q"))
                                        -
                                        -topology = paddle.layer.less_than(
                                        -               paddle.layer.cross_entropy(fA, fQ),
                                        -               paddle.layer.cross_entropy(fB, fQ))
                                        -
                                        -# Derive parameters required in topology and create them in model.
                                        -parameters = paddle.parameters.create(topology)
                                        -
                                        -# Estimate parameters used in topology from data.
                                        -paddle.train(topology, parameters, reader=read_ranking_model_data)
                                        -
                                        -# Inference using fA (or fB or fQ, as they share their parameters).
                                        -[testA, testB, testQ] = read_ranking_model_data()
                                        -print "The semantic-vector of testA: ", paddle.infer(fA, parameters, testA)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Example 2. Sharing Parameters between “Models”

                                        -

                                        We use GAN in -this example. In the following example program, d0 and d1 -correspond to the two networks in the following figure:

                                        -

                                        -
                                        def G(x):
                                        -    # over-simplified example as G has only one layer:
                                        -    return paddle.layer.fc(x, parameter_name="G")
                                        -
                                        -def D(x):
                                        -    # again, over-simplified:
                                        -    return paddle.layer.fc(x, parameter_name="D")
                                        -
                                        -# Construct the first topology, which contains both D and G.
                                        -# By learning this topology, we update parameters of G.
                                        -d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))
                                        -
                                        -# Construct a second topology d1, which contains only D. By
                                        -# training this topology, we update parameters of D.  Note
                                        -# that d1 shares parameters with d0.
                                        -d1 = paddle.layer.should_be_true(D(paddle.layer.data()))
                                        -
                                        -# Create parameters from a list of multiple topologies (models) for
                                        -# the chance to share parameters between these topologies.
                                        -parameters = paddle.parameters.create([d0, d1])
                                        -
                                        -# Iterative training of GAN.
                                        -for ...:
                                        -    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
                                        -    train(d1, parameters, reader=read_from_realistic_images)
                                        -
                                        -# Use d1 for inference:
                                        -print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Summarization

                                        -

                                        The two programs above reveal some important design concerns:

                                        -
                                          -
                                        1. Users describe a topology as an expression of layers. Every layer has a parameter name. If the users don’t specify it explicitly, it’s automatically generated as a unique name. By specifying the parameter name, users can specify the sharing of parameters between layers and even between topologies.
                                        2. paddle.parameters.create figures out parameters required by one or more topologies from parameter names of layers. It creates these parameters and returns a ParameterSet object, which is in essence a map from parameter names to parameters.
                                        3. At training and inference time, paddle.train and paddle.infer require both a topology and the parameter set that holds the parameters of that topology. There are several reasons for this:
                                           1. This prevents users from forgetting to call paddle.parameters.create.
                                           2. paddle.train needs to know which parameter set to update.
                                           3. Users could load another (pre-trained) parameter set and use it with a topology in paddle.infer.
                                        4. By specifying the immutable_parameters parameter of paddle.train, we can forbid the update of these parameters.
                                        -
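The name-based sharing summarized above can be illustrated with a small stand-alone sketch. This is not PaddlePaddle code: `Layer`, `create_parameters`, and the dictionary used as a parameter set are hypothetical stand-ins, and the sketch assumes only what the list states, namely that every layer carries a parameter name and that a parameter set is in essence a map from parameter names to parameters.

```python
# Hypothetical sketch, not the PaddlePaddle API: a parameter set is modelled
# as a plain dict from parameter names to values, and layers that use the
# same name end up sharing a single entry.
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, List, Optional

_auto_names = count()


@dataclass
class Layer:
    inputs: List["Layer"] = field(default_factory=list)
    parameter_name: Optional[str] = None
    has_parameter: bool = True

    def __post_init__(self) -> None:
        # A layer without an explicit name gets a unique generated one.
        if self.has_parameter and self.parameter_name is None:
            self.parameter_name = f"__auto_{next(_auto_names)}__"


def create_parameters(topologies: Iterable[Layer]) -> Dict[str, list]:
    """Walk every topology and create one parameter per distinct name."""
    parameters: Dict[str, list] = {}
    stack = list(topologies)
    while stack:
        layer = stack.pop()
        if layer.has_parameter:
            # setdefault keeps the first entry, so a repeated name is shared.
            parameters.setdefault(layer.parameter_name, [0.0])
        stack.extend(layer.inputs)
    return parameters


def f(data: Layer) -> Layer:
    e = Layer([data], parameter_name="embedding")
    return Layer([e], parameter_name="semantic")


# Two sub-networks built by the same function share both parameters.
fA = f(Layer(has_parameter=False))
fB = f(Layer(has_parameter=False))
shared = create_parameters([fA, fB])
print(sorted(shared))
```

Passing a list of topologies to one call is what gives the sharing a chance to happen: names are resolved against a single map, so two layers with the same name cannot end up with separate parameters.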
                                        -
                                        -
                                        -

                                        Reader

                                        -

                                        Not all programming frameworks allow users to define I/O functions. -An example is Google MapReduce, which can only read from text, -SSTable, and RecordIO files. Hadoop MapReduce allows users to define -readers and writers by deriving from base classes Reader and -Writer. The former is less flexible but also less error-prone. We -decide to provide the flexibility to users to define their readers.

                                        -

                                        There are some open questions here:

                                        -
                                          -
                                        1. Should a reader return a Python dictionary?
                                        2. How to map multiple outputs from a reader to multiple data layers?
                                        3. How to easily compose some existing readers to read more data and feed a topology with more data layers?
                                        -
                                        -
                                        -

                                        Training

                                        -

                                        The recommended way to train a model is to call paddle.train, which simply calls paddle.trainer.Default, a global variable of type paddle.trainer.SGD. Equivalently, we can do

                                        -
                                        opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
                                        -opt.train(topology, parameters, reader=read, ...)
                                        -
                                        -
                                        -
                                        -

                                        Updater

                                        -

                                        Please be aware that a trainer can accept an updater as its data -member, where an updater is a class derived from -paddle.trainer.Updater. This is to make it easier to customize -trainers, as discussed -here.

                                        -
                                        -
                                        -

                                        Event Handler

                                        -

                                        paddle.train and paddle.trainer.XXX.train take an optional parameter event_handler, which should be either None or a function that handles some events:

                                        -
                                          -
                                        1. BeginTraining
                                        2. EndTraining
                                        3. BeginIteration
                                        4. EndIteration
                                        5. BeginPass
                                        6. EndPass
                                        -

                                        where EndPass is sent if and only if the reader yields -end_pass=True.

                                        -

                                        An example is as follows:

                                        -
                                        def event_handler(event):
                                        -    if isinstance(event, paddle.event.EndIteration):
                                        -        print paddle.test(...)
                                        -
                                        -paddle.train(topology, parameters, reader, event_handler)
                                        -
                                        -
                                        -

                                        If we are writing a PaddlePaddle program in and for IPython/Jupyter, we can use matplotlib in the event handler to plot a curve of cost/error versus iterations, as shown here.
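To make the event-handler contract more concrete, here is a minimal self-contained sketch. The event classes, the toy `train` loop, and the reader that yields `(cost, end_pass)` pairs are all assumptions made for illustration and are not the PaddlePaddle implementation; the sketch only mirrors the rule stated above that `EndPass` is sent if and only if the reader signals the end of a pass.

```python
# Hypothetical sketch: event classes and a toy training loop that reports
# them to an optional handler. None of these names come from PaddlePaddle.
from dataclasses import dataclass


@dataclass
class BeginPass:
    pass_id: int


@dataclass
class EndIteration:
    pass_id: int
    batch_id: int
    cost: float


@dataclass
class EndPass:
    pass_id: int


def train(reader, event_handler=None):
    """Toy loop: `reader` yields (cost, end_pass) pairs instead of batches."""
    handler = event_handler or (lambda event: None)
    pass_id, batch_id, in_pass = 0, 0, False
    for cost, end_pass in reader():
        if not in_pass:
            handler(BeginPass(pass_id))
            in_pass = True
        handler(EndIteration(pass_id, batch_id, cost))
        batch_id += 1
        if end_pass:
            # EndPass is emitted only when the reader says the pass is over.
            handler(EndPass(pass_id))
            pass_id, batch_id, in_pass = pass_id + 1, 0, False


costs = []


def event_handler(event):
    # Dispatch on the event type, as in the design doc's example.
    if isinstance(event, EndIteration):
        costs.append(event.cost)
    elif isinstance(event, EndPass):
        print("finished pass", event.pass_id, "after", len(costs), "iterations")


def toy_reader():
    yield 0.9, False
    yield 0.7, True
    yield 0.5, True


train(toy_reader, event_handler)
```

In a notebook, the `costs` list collected by the handler is what one would hand to a plotting library to draw the cost curve.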

                                        -
                                        -
                                        -

                                        Distributed Training

                                        -

                                        If a user wants to do distributed training on a cluster, s/he should call paddle.dist_train and provide access tokens to the cluster as a parameter.

                                        -

                                        For example, if the user has a TLS certificate that grants access to a Kubernetes cluster, s/he should be able to call

                                        -
                                        paddle.dist_train(model,
                                        -                  trainer=paddle.trainer.SGD(...,
                                        -                                             paddle.updater.Adam(...)),
                                        -                  reader=read,
                                        -                  k8s_user="yi",
                                        -                  k8s_token="kube_cluster_tls.pem",
                                        -                  k8s_job="hello",
                                        -                  num_parameter_servers=15)
                                        -
                                        -
                                        -

                                        The pseudo code of paddle.dist_train is as follows:

                                        -
                                        def dist_train(topology, parameters, trainer, reader, ...):
                                        -    if os.getenv("KUBERNETES_SERVICE_HOST") is None:
                                        -        image_name = k8s_user + '/' + k8s_job
                                        -        docker_build(image_name)
                                        -        docker_push()
                                        -        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
                                        -    else:
                                        -        rank = kube_list_containers_in_job_and_return_current_containers_rank()
                                        -        if rank == 0:
                                        -            master()
                                        -        elif rank < 15:
                                        -            parameter_server()
                                        -        else:
                                        -            trainer.train(topology, parameters, reader=reader)
                                        -
                                        -
                                        -

                                        Please be aware that if a process is running on the Kubernetes -cluster, it will have some environment variables pre-defined.

                                        -

                                        If dist_train doesn’t see these environment variables, it knows that it’s running on the user’s personal computer, and it should work as a launcher. Otherwise, it knows that it’s running on the cluster and needs to figure out its role as either the master, a trainer, or a parameter server.
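The launcher-or-role decision described above can be sketched as a pure function. The helper `choose_role` and the rank layout (rank 0 is the master, ranks 1 through `num_parameter_servers` are parameter servers, all higher ranks are trainers) are assumptions for illustration; the pseudo code above hard-codes its own threshold, and the design doc does not fix the layout.

```python
# Hypothetical sketch of the role decision in dist_train; `choose_role`
# and the rank layout are illustrative assumptions, not PaddlePaddle code.
import os


def choose_role(environ, rank, num_parameter_servers):
    """Return the role a process should play.

    Kubernetes sets KUBERNETES_SERVICE_HOST inside containers it runs, so a
    missing variable is taken to mean "running on the user's own machine".
    """
    if environ.get("KUBERNETES_SERVICE_HOST") is None:
        return "launcher"
    if rank == 0:
        return "master"
    if rank <= num_parameter_servers:
        return "parameter_server"
    return "trainer"


# On a developer machine the variable is normally absent, so this usually
# reports "launcher".
print(choose_role(os.environ, rank=0, num_parameter_servers=15))
```

Keeping the decision free of side effects makes it easy to check each branch with a fake environment before wiring it to real job-launching code.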

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/auto_gradient_check.html b/develop/doc_cn/design/auto_gradient_check.html deleted file mode 100644 index 2be2fd09fa95975a47ccc0c2be5f15ee05fb228e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/auto_gradient_check.html +++ /dev/null @@ -1,431 +0,0 @@ - - - - - - - - - - - - - Auto Gradient Check Design — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Auto Gradient Check Design

                                        -
                                        -
                                        -

                                        Background:

                                        -
                                          -
                                        • Generally, it is easy to check whether the forward computation of an Operator is correct. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges:
                                          1. The backpropagation formula must be correct with respect to the forward computation.
                                          2. The implementation of that formula must be correct in C++.
                                          3. It is difficult to prepare unbiased test data.
                                        • Auto gradient checking computes a numerical gradient using only the forward Operator and uses it as a reference for the backward Operator’s result. It has several advantages:
                                          1. The numerical gradient checker only needs the forward Operator.
                                          2. The user only needs to prepare the input data for the forward Operator and does not have to worry about the backward Operator.
                                        -
                                        -
                                        -

                                        Mathematical Theory

                                        -

                                        The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful.

                                        - -
                                        -
                                        -

                                        Numerical Gradient Implementation

                                        -
                                        -

                                        Python Interface

                                        -
                                        def get_numerical_gradient(op,
                                        -                         input_values,
                                        -                         output_name,
                                        -                         input_to_check,
                                        -                         delta=0.005,
                                        -                         local_scope=None):
                                        -    """
                                        -    Get Numerical Gradient for the input of an operator.
                                        -
                                        -    :param op: C++ operator instance, could be a network.
                                        -    :param input_values: The input variables. Should be a dictionary, whose key is
                                        -    variable name, and value is a numpy array.
                                        -    :param output_name: The final output variable name.
                                        -    :param input_to_check: The input variable with respect to which the gradient has to be computed.
                                        -    :param delta: The perturbation value for numerical gradient method. The
                                        -    smaller the delta, the more accurate the result. But if the delta is too
                                        -    small, it will suffer from the numerical stability problem.
                                        -    :param local_scope: The local scope used for get_numeric_gradient.
                                        -    :return: The gradient array in numpy format.
                                        -    """
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Explanation:

                                        -
                                          -
                                        • Why do we need an output_name?
                                          • An Operator may have multiple Outputs, and an independent gradient can be computed from each of them. The caller must therefore specify the name of the output variable.
                                        • Why do we need input_to_check?
                                          • An operator can have multiple inputs. A gradient Op can calculate the gradients of all these inputs at the same time, but the numerical gradient has to be calculated one input at a time. get_numeric_gradient is therefore designed to calculate the gradient for a single input. To get the gradients of multiple inputs, call get_numeric_gradient multiple times, each time with a different input.
                                        -
                                        -
                                        -

                                        Core Algorithm Implementation

                                        -
                                            # we only compute the gradient of one element at a time.
                                        -    # we use a for loop to compute the gradient of each element.
                                        -    for i in range(tensor_size):
                                        -        # get one input element using the index i.
                                        -        original = tensor_to_check.get_float_element(i)
                                        -
                                        -        # add delta to it, run the forward op and then
                                        -        # get the new value of the result tensor.
                                        -        x_pos = original + delta
                                        -        tensor_to_check.set_float_element(i, x_pos)
                                        -        y_pos = get_output()
                                        -
                                        -        # Subtract delta from this element, run the op again
                                        -        # and get the new value of the result tensor.
                                        -        x_neg = original - delta
                                        -        tensor_to_check.set_float_element(i, x_neg)
                                        -        y_neg = get_output()
                                        -
                                        -        # restore old value
                                        -        tensor_to_check.set_float_element(i, original)
                                        -
                                        -        # compute the gradient of this element and store
                                        -        # it into a numpy array.
                                        -        gradient_flat[i] = (y_pos - y_neg) / delta / 2
                                        -
                                        -    # reshape the gradient result to the shape of the source tensor.
                                        -    return gradient_flat.reshape(tensor_to_check.get_dims())
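The loop above is the central-difference formula applied to one element at a time. A self-contained sketch with NumPy follows. It is an illustration under stated assumptions, not the framework's implementation: `numerical_gradient` is a hypothetical name, and `forward` is assumed to be a plain Python callable that maps the input array to a scalar, standing in for "run the forward op and read the result".

```python
import numpy as np


def numerical_gradient(forward, x, delta=0.005):
    """Central-difference gradient of a scalar-valued forward(x) w.r.t. x."""
    x = np.array(x, dtype=np.float64)  # private, contiguous copy of the input
    grad = np.zeros_like(x)
    flat_x = x.reshape(-1)        # view: writes here are visible through x
    flat_grad = grad.reshape(-1)  # view: writes here are visible through grad
    for i in range(flat_x.size):
        original = flat_x[i]
        # perturb one element upwards and run the forward computation
        flat_x[i] = original + delta
        y_pos = forward(x)
        # perturb the same element downwards and run it again
        flat_x[i] = original - delta
        y_neg = forward(x)
        # restore the element before moving on
        flat_x[i] = original
        flat_grad[i] = (y_pos - y_neg) / (2.0 * delta)
    return grad
```

For `forward = lambda t: float(np.sum(t ** 2))` the result should match the analytic gradient `2 * x` up to rounding, since the central difference is exact for quadratics.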
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Auto Gradient Check Framework

                                        -

                                        Each Operator Kernel has three kinds of Gradient:

                                        -
                                          -
                                        1. Numerical gradient
                                        2. CPU kernel gradient
                                        3. GPU kernel gradient (if supported by the device)
                                        -

                                        The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps:

                                        -
                                          -
                                        1. Calculate the numerical gradient.
                                        2. Calculate the CPU kernel gradient with the backward Operator and compare it with the numerical gradient.
                                        3. Calculate the GPU kernel gradient with the backward Operator and compare it with the numerical gradient (if supported).
                                        -
                                        -

                                        Python Interface

                                        -
                                            def check_grad(self,
                                        -                   forward_op,
                                        -                   input_vars,
                                        -                   inputs_to_check,
                                        -                   output_name,
                                        -                   no_grad_set=None,
                                        -                   only_cpu=False,
                                        -                   max_relative_error=0.005):
                                        -        """
                                        -        :param forward_op: used to create backward_op
                                        -        :param input_vars: numpy value of input variable. The following
                                        -          computation will use these variables.
                                        -        :param inputs_to_check: the input variable with respect to which the
                                        -          gradient will be computed.
                                        -        :param output_name: The final output variable name.
                                        -        :param max_relative_error: The relative tolerance parameter.
                                        -        :param no_grad_set: used to create backward ops
                                        -        :param only_cpu: only compute and check gradient on cpu kernel.
                                        -        :return:
                                        -        """
                                        -
                                        -
                                        -
                                        -
                                        -

                                        How to check if two numpy arrays are close enough?

                                        -

                                        If an element of abs_numerical_grad is nearly zero, use the absolute error for that element instead of the relative error.

                                        -
                                        numerical_grad = ...
                                        -operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor())
                                        -
                                        -abs_numerical_grad = numpy.abs(numerical_grad)
                                        -# if abs_numerical_grad is nearly zero, then use abs error for
                                        -# numeric_grad, instead of relative error.
                                        -abs_numerical_grad[abs_numerical_grad < 1e-3] = 1
                                        -
                                        -diff_mat = numpy.abs(numerical_grad - operator_grad) / abs_numerical_grad
                                        -max_diff = numpy.max(diff_mat)
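The comparison rule can be tried without a scope or an operator. The sketch below is a minimal stand-alone version under stated assumptions: `max_relative_diff` and `zero_threshold` are hypothetical names, and both gradients are assumed to be array-likes of the same shape.

```python
import numpy as np


def max_relative_diff(numerical_grad, operator_grad, zero_threshold=1e-3):
    """Largest element-wise error between two gradients.

    The error is relative to |numerical_grad|, except where that magnitude is
    below zero_threshold; there the denominator is 1, so the error is absolute.
    """
    numerical_grad = np.asarray(numerical_grad, dtype=np.float64)
    operator_grad = np.asarray(operator_grad, dtype=np.float64)
    denominator = np.abs(numerical_grad)  # fresh array, safe to modify
    denominator[denominator < zero_threshold] = 1.0
    diff = np.abs(numerical_grad - operator_grad) / denominator
    return float(np.max(diff))
```

A check would then pass when the returned value does not exceed `max_relative_error`.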
                                        -
                                        -
                                        -
                                        -

                                        Notes:

                                        -

                                        The input data for the auto gradient checker should be chosen in a reasonable range to avoid numerical stability problems.

                                        -
                                        - -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/backward.html b/develop/doc_cn/design/backward.html deleted file mode 100644 index 139558ad027c97d9cb38af8cb41da671be7eef41..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/backward.html +++ /dev/null @@ -1,402 +0,0 @@ - - - - - - - - - - - - - Backward Building — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Backward Building

                                        -
                                        -

                                        Motivation

                                        -

                                        At present, most neural network models are trained with the backpropagation algorithm (known as BP). Technically, BP calculates the gradient of the loss function and then propagates it back through the network following the chain rule. However, when configuring the model structure, users do not need to define the backward part. The framework therefore needs a mechanism that completes the model’s backward part automatically from the given forward part.

                                        -

                                        When implementing a specific op, the developer is also asked to implement its backward version, called grad_op. A grad_op takes the gradients of its corresponding op’s outputs and calculates the gradients of the op’s inputs. While building a model’s backward part, the framework creates each forward op’s grad_op and then strings them together in the reverse order of the forward part. In this way, gradients spread from the end to the beginning of the model, in other words, from the loss to the parameters.

                                        -
                                        -
                                        -

                                        Challenges

                                        -

                                        The motivation of backward building is apparent. However, implementing it correctly is not so easy. In the Fluid design, a deep learning model is described by Program, Block, Op and Variable. A Block can be nested, which means that the ops and variables are scattered across different blocks rather than all being gathered in a single graph. Our backward building algorithm must visit blocks recursively and be able to insert grad_ops and newly created variables into the right place.

                                        -
                                        -
                                        -

                                        Usage

                                        -

                                        Although the whole algorithm is comprised of many functions, only one is exposed as API:

                                        -
                                        def append_backward(loss, parameter_list=None, no_grad_set=None):
                                        -    """
                                        -    Append backward part to main_program
                                        -
                                        -    Args:
                                        -        loss(Variable): The variable generated by the cost function.
                                        -        parameter_list(list): Parameters that need to be updated by optimizers.
                                        -            If None, it means all parameters need to be updated.
                                        -
                                        -        no_grad_set(set): Variables that have no gradients in Block 0.
                                        -            If None, the set will be generated inside the function and
                                        -            contains all variables with `stop_gradient=True` from all blocks.
                                        -
                                        -    Return:
                                        -        (list[Variable]): list of (parameter, gradient) pairs.
                                        -    """
                                        -
                                        -
                                        -

                                        By invoking this API, the framework appends the backward part of the program that contains the loss. It takes three arguments. loss is the final loss value; it must be a scalar and is usually the output of the loss layer. It is also where the gradient is generated and where backpropagation starts. parameter_list marks all parameters that need updating; if it is None, all parameters will be updated by optimizers. no_grad_set marks variables without gradients; if all outputs of some grad_op are in no_grad_set, the grad_op will not be run.

                                        -

                                        This API will be invoked automatically before optimizer building. As a result, in most cases, users do not need to invoke the API themselves to append the backward part.

                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        The implementation of the backward building algorithm is in the backward.py file. The whole algorithm can be divided into two independent parts: creating grad_ops and creating new variables.

                                        -
                                        -

                                        Creating grad_ops

                                        -

                                        The creating of grad_ops is implemented by:

                                        -
                                        def _append_backward_ops_(target,
                                        -                          block,
                                        -                          target_block,
                                        -                          no_grad_dict,
                                        -                          grad_to_var):
                                        -    """
                                        -    Create all grad ops, and insert them into given block
                                        -
                                        -    Args:
                                        -        target(Variable): the target variable of forward pass
                                        -        block(Block): the block where forward ops are
                                        -        target_block(Block): the block which is going to hold new generated grad ops
                                        -        no_grad_dict(dict): 
                                        -            key(int)  block index
                                        -            val(set) a set of variable names. These variables have no gradient
                                        -        grad_to_var(dict)(output argument):
                                        -            key(str): grad variable name
                                        -            val(str): corresponding forward variable name
                                        -    """
                                        -
                                        -
                                        -

                                        Given a block, the function traverses all ops in this block in reverse order, gets the corresponding grad_op from the C++ core via core.get_grad_op_desc(), and then appends it to target_block.

                                        -

                                        However, some specific ops (e.g. while_op, if_else_op) can hold their own sub-block. Since these sub-blocks contain ops as well, the creation of grad_ops has to be recursive.

                                        -

                                        During the reverse traversal, we check whether each op has an attribute named sub_block. If so, there is a sub-block and we need to deal with it first. After creating a new block whose father is the one in the op’s attribute, we invoke _append_backward_ops_() recursively, assigning the new block to the parameter target_block and the one in the op’s attribute to block. The pseudo-code shows this process:

                                        -
                                        ******* pseudo-code ********
                                        -for op in reversed(block.ops):
                                        -    if op has an attribute named 'sub_block':
                                        -        Get the sub-block(`s_block`) from op's attribute.
                                        -        Create a new block(`grad_s_block`), whose father is `s_block`.
                                        -        Invoke _append_backward_ops_(), with `block=s_block` and `target_block=grad_s_block`
                                        -    
                                        -    Invoke `core.get_grad_op_desc()` to get op's grad_op.
                                        -    Insert the name correspondences between the grad_op's gradient variables and their forward variables into grad_to_var.
                                        -    Assign grad_s_block (if one was created) to grad_op as its 'sub_block' attribute.
                                        -    Append grad_op to current target_block.
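The recursion in the pseudo-code can be exercised on a toy model. The sketch below is only an illustration: the `Op` and `Block` classes and `append_backward_ops` are hypothetical stand-ins, and a grad op is faked by appending `_grad` to the forward op's name instead of calling `core.get_grad_op_desc()`. The `grad_to_var` bookkeeping is left out.

```python
class Op:
    """Toy op: a name plus an optional sub-block (as held by while-like ops)."""

    def __init__(self, name, sub_block=None):
        self.name = name
        self.sub_block = sub_block


class Block:
    """Toy block: an ordered list of ops and a link to its father block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.ops = []


def append_backward_ops(block, target_block):
    """Append a grad op for every op in `block`, in reverse order, to `target_block`."""
    # Iterate over a snapshot, because block and target_block may be the same object.
    for op in reversed(list(block.ops)):
        grad_sub_block = None
        if op.sub_block is not None:
            # Handle the sub-block first: its grad ops go into a new block
            # whose father is the forward sub-block.
            grad_sub_block = Block(parent=op.sub_block)
            append_backward_ops(op.sub_block, grad_sub_block)
        target_block.ops.append(Op(op.name + "_grad", sub_block=grad_sub_block))
```

Calling `append_backward_ops(root, root)` mirrors the top-level call, where both arguments are the root block.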
                                        -
                                        -
                                        -

                                        The first invocation of _append_backward_ops_() is initiated by append_backward(), in which the parameters block and target_block are both assigned the root block (the block with index 0).

                                        -
                                        -
                                        -

                                        Corner Cases of grad_op Creating

                                        -

                                        In the previous section, we showed the regular process of creating grad_ops. However, in some corner cases, the conventional algorithm is not enough to get the correct result and additional handling is required. These additional processes run after the algorithm mentioned above and make some special adjustments to its output grad_ops.

                                        -
                                        -

                                        Shared Variables

                                        -

                                        If a variable is read by more than one op in the forward pass, its gradient is likely to be written by more than one grad_op in the backward pass. To make the gradient the sum of all these grad_ops’ outputs instead of only the output of the last one to run, we assign each output to a temporary variable and then add a sum_op to add them up.

                                        -

                                        For debugging convenience, if the final gradient name is w@GRAD, its corresponding temporary variables will be named w@GRAD@RENAME@0, w@GRAD@RENAME@1...

                                        -

                                        See function _addup_repetitive_outputs_ in backward.py for implementation details.
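The renaming scheme can be illustrated on plain tuples. This is a simplified sketch, not the code in backward.py: `addup_repetitive_outputs` is a hypothetical function, each grad op is modelled as an `(op_type, inputs, outputs)` tuple, and it assumes that no grad op reads a shared gradient before all of its writers have run.

```python
from collections import Counter


def addup_repetitive_outputs(grad_ops):
    """Rename gradients written by several grad ops and append a sum op for each."""
    # How many grad ops write each output name.
    write_counts = Counter(out for _, _, outs in grad_ops for out in outs)
    seen = Counter()  # writers emitted so far, per shared gradient
    pending = {}      # shared gradient name -> temporary names to be summed
    result = []
    for op_type, inputs, outputs in grad_ops:
        new_outputs = []
        for out in outputs:
            if write_counts[out] > 1:
                temp = "%s@RENAME@%d" % (out, seen[out])
                seen[out] += 1
                pending.setdefault(out, []).append(temp)
                new_outputs.append(temp)
            else:
                new_outputs.append(out)
        result.append((op_type, list(inputs), new_outputs))
        # After the last writer of a shared gradient, sum the temporaries.
        for out in outputs:
            if out in pending and seen[out] == write_counts[out]:
                result.append(("sum", pending.pop(out), [out]))
    return result
```

With two grad ops that both write `w@GRAD`, the outputs become `w@GRAD@RENAME@0` and `w@GRAD@RENAME@1`, followed by a `sum` op that produces `w@GRAD`.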

                                        -
                                        -
                                        -

                                        No Gradient Variables

                                        -

                                        In our framework, variables can be marked as no_gradient, which means that the gradient of this variable is unnecessary and can be considered zero in model training. Consequently, when all the outputs of some grad_op are marked as no_gradient, the grad_op itself can be skipped in the backward pass.

                                        -

                                        Another situation is when all the gradient inputs of some grad_op are marked as no_gradient, which means all of them can be considered zeros. Since grad_ops in essence propagate gradients, all the outputs are zeros when all gradient inputs are zeros. Therefore the grad_op can also be skipped.

                                        -

                                        It should be noted that all these zero gradients still need to be created and initialized; otherwise, subsequent grad_ops that take these gradients as inputs risk using uninitialized memory. In our code, we employ fill_zeros_like_op to initialize them as all zeros.

                                        -

                                        These features are implemented in the function _remove_no_grad_branch_. It checks newly created grad_ops one by one, removes those that can be skipped, and inserts fill_zeros_like_op where necessary. We can get the no_grad_set from the _append_backward_ops_ argument no_grad_dict or generate it on the fly by scanning all variables’ no_gradient attribute (True or False).
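The two skipping rules and the zero-filling step can be sketched on plain tuples. This is a simplified illustration rather than the real function: `remove_no_grad_branch` is a hypothetical name, each grad op is an `(op_type, inputs, outputs)` tuple, gradient variables are assumed to end in `@GRAD`, and the no-grad set is assumed to hold gradient variable names.

```python
GRAD_SUFFIX = "@GRAD"


def remove_no_grad_branch(grad_ops, no_grad_names):
    """Drop grad ops that only produce or only consume zero gradients."""
    no_grad = set(no_grad_names)  # gradient names treated as all zeros
    kept = []
    for op_type, inputs, outputs in grad_ops:
        grad_inputs = [n for n in inputs if n.endswith(GRAD_SUFFIX)]
        # Rule 1: every output is an unneeded gradient.
        if outputs and all(n in no_grad for n in outputs):
            continue
        # Rule 2: every gradient input is zero, so every output is zero too.
        if grad_inputs and all(n in no_grad for n in grad_inputs):
            no_grad.update(outputs)
            continue
        kept.append((op_type, inputs, outputs))
    result = []
    for op_type, inputs, outputs in kept:
        for name in inputs:
            if name.endswith(GRAD_SUFFIX) and name in no_grad:
                # A kept op still reads a zero gradient: create it explicitly,
                # shaped like its forward variable.
                forward_name = name[:-len(GRAD_SUFFIX)]
                result.append(("fill_zeros_like", [forward_name], [name]))
        result.append((op_type, inputs, outputs))
    return result
```

The set passed in is copied, so the caller's set is not modified when skipped outputs are added to it.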

                                        -
                                        -
                                        -
                                        -

                                        Creating Backward Variables

                                        -

                                        Up to now, we have completed all the creating and adjusting of grad_ops. However, backward variables have not been created yet; they are only represented by the grad_ops’ input and output arguments. The backward variables are created by:

                                        -
                                        def _append_backward_vars_(block, 
                                        -                           start_op_idx, 
                                        -                           grad_to_var, 
                                        -                           grad_info_map):
                                        -    """
                                        -    Create new variables required by backward pass.
                                        -
                                        -    Args:
                                        -        block(Block): the block where new variables will be created
                                        -        start_op_idx(int): Only variables required by ops in block.ops[start_op_idx : ] will be created
                                        -        grad_to_var(dict):
                                        -            key(str): grad variable name
                                        -            val(str): corresponding forward variable name
                                        -            In most cases, this dict is generated by _append_backward_ops_()
                                        -        grad_info_map(dict)(output argument):
                                        -            key(str): forward variable name
                                        -            val(tuple): a tuple of (str, int), str is the corresponding grad name, int is the block index
                                        -    """
                                        -
                                        -
                                        -

                                        Given a block, this function traverses all the grad_ops in it (the argument start_op_idx indicates where the grad_op sequence starts) and creates every output variable that does not exist yet. The following pseudo-code shows this process:

                                        -
                                        for op in block.ops[start_op_idx : ]:
                                        -
                                        -    if op has an attribute named 'sub_block':
                                        -        Get the sub-block(`s_block`) from op's attribute.
                                        -        Invoke _append_backward_vars_(), with `block=s_block`
                                        -        
                                        -    for var_name in op.all_output_names():
                                        -        if block.has_var_recursive(var_name) or var_name is the name of empty variable:
                                        -            continue
                                        -        create a new variable named 'var_name' in block
                                        -        if grad_to_var.has_key(var_name):
                                        -            set grad_info_map[grad_to_var[var_name]] as a tuple of (var_name, block)
                                        -            
                                        -    do op's var type inference
                                        -    do op's shape inference
                                        -
                                        -
                                        -
                                        -
                                        -
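The pseudo-code above can be sketched as runnable Python. This is a toy model rather than the framework's implementation: the `Op` and `Block` classes, the `EMPTY_VAR_NAME` placeholder, and the choice to start at index 0 inside a sub-block are all assumptions made for illustration, and type and shape inference are left out.

```python
EMPTY_VAR_NAME = "@EMPTY@"  # assumed placeholder name for an empty variable


class Op:
    """Toy operator: only output names and an optional sub-block."""

    def __init__(self, outputs, sub_block=None):
        self.outputs = outputs
        self.sub_block = sub_block


class Block:
    """Toy block: a set of variable names, a list of ops, and a parent link."""

    def __init__(self, idx, parent=None):
        self.idx = idx
        self.parent = parent
        self.vars = set()
        self.ops = []

    def has_var_recursive(self, name):
        # Look in this block first, then walk up through the ancestors.
        block = self
        while block is not None:
            if name in block.vars:
                return True
            block = block.parent
        return False


def append_backward_vars(block, start_op_idx, grad_to_var, grad_info_map):
    for op in block.ops[start_op_idx:]:
        if op.sub_block is not None:
            # Assumption: every op of the sub-block is visited.
            append_backward_vars(op.sub_block, 0, grad_to_var, grad_info_map)
        for name in op.outputs:
            if name == EMPTY_VAR_NAME or block.has_var_recursive(name):
                continue
            block.vars.add(name)
            if name in grad_to_var:
                # Record (grad name, block index) under the forward name.
                grad_info_map[grad_to_var[name]] = (name, block.idx)
        # Var type inference and shape inference would run here.


# Usage: one forward op followed by two grad ops.
root = Block(0)
root.vars.update({"x", "y"})
root.ops = [Op(["y"]), Op(["y@GRAD"]), Op(["x@GRAD", EMPTY_VAR_NAME])]
info = {}
append_backward_vars(root, 1, {"x@GRAD": "x", "y@GRAD": "y"}, info)
```

After the call, `info` maps each forward variable name to its gradient name and block index, which matches the role the docstring gives to the output argument grad_info_map.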
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/block.html b/develop/doc_cn/design/block.html deleted file mode 100644 index 3f75987c529378de531c14a04f969031062646f3..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/block.html +++ /dev/null @@ -1,567 +0,0 @@ - - - - - - - - - - - - - Design Doc: Block and Scope — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Block and Scope

                                        -
                                        -

                                        The Representation of Computation

                                        -

                                        Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation:

                                        -
                                          -
                                        • Caffe, Torch, and Paddle: sequences of layers.
                                        • -
                                        • TensorFlow, Caffe2, MXNet: graphs of operators.
                                        • -
                                        • PaddlePaddle: nested blocks, like C++ and Java programs.
                                        • -
                                        -
                                        -
                                        -

                                        Block in Programming Languages and Deep Learning

                                        -

                                        In programming languages, a block is a pair of curly braces that encloses local variable definitions and a sequence of instructions or operators.

                                        -

                                        Blocks work with control flow structures like if, else, and for, which have equivalents in deep learning:

                                        -

                                        | programming languages | PaddlePaddle          |
                                        |-----------------------|-----------------------|
                                        | for, while loop       | RNN, WhileOp          |
                                        | if, if-else, switch   | IfElseOp, SwitchOp    |
                                        | sequential execution  | a sequence of layers  |

                                        -

                                        A key difference is that a C++ program describes a one-pass computation, whereas a deep learning program describes both the forward and backward passes.

                                        -
                                        -
                                        -

                                        Stack Frames and the Scope Hierarchy

                                        -

                                        The existence of the backward pass makes the execution of a block of PaddlePaddle different from traditional programs:

                                        -

                                        | programming languages  | PaddlePaddle                     |
                                        |------------------------|----------------------------------|
                                        | stack                  | scope hierarchy                  |
                                        | stack frame            | scope                            |
                                        | push at entering block | push at entering block           |
                                        | pop at leaving block   | destroy when minibatch completes |

                                        -
                                          -
                                        1. In traditional programs:
                                            -
                                          • When the execution enters the left curly brace of a block, the runtime pushes a frame onto the stack, where it realizes local variables.
                                          • -
                                          • After the execution leaves the right curly brace, the runtime pops the frame.
                                          • -
                                          • The maximum number of frames in the stack is the maximum depth of nested blocks.
                                          • -
                                          -
                                        -
                                        2. In PaddlePaddle:
                                            -
                                          • When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables.
                                          • -
                                          • PaddlePaddle doesn’t pop a scope after the execution of the block because variables therein are used by the backward pass. So it has a stack forest known as a scope hierarchy.
                                          • -
                                          • The height of the highest tree is the maximum depth of nested blocks.
                                          • -
                                          • After the processing of a minibatch, PaddlePaddle destroys the scope hierarchy.
                                          • -
                                          -
                                        -
                                        -
                                        -
                                        -
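The contrast between a stack and a scope hierarchy can be sketched in a few lines of Python. This is only an illustration under assumed names (`Scope`, `run_block`, `drop_children` are hypothetical and not the framework's API): entering a block adds a child scope, leaving the block pops nothing, and the whole forest is dropped once the minibatch is done.

```python
class Scope:
    """A node in the scope hierarchy; variables live in `vars`."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.vars = {}
        if parent is not None:
            parent.children.append(self)

    def find(self, name):
        # Resolve a name in this scope, then in the ancestors.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)

    def height(self):
        # Height of the tree rooted at this scope.
        return 1 + max((c.height() for c in self.children), default=0)

    def drop_children(self):
        # Called once per minibatch: destroy everything below this scope.
        self.children.clear()


def run_block(parent_scope, nested_levels):
    """Enter a block: add a scope, run nested blocks, and do not pop."""
    scope = Scope(parent_scope)
    scope.vars["tmp"] = nested_levels
    if nested_levels > 0:
        run_block(scope, nested_levels - 1)
    return scope  # the scope stays alive for the backward pass


# Usage: two top-level blocks, the first with one nested block.
root = Scope()
root.vars["w"] = 0.5
first = run_block(root, 1)
second = run_block(root, 0)
height_before = root.height()
kept_scopes = len(root.children)
root.drop_children()  # minibatch finished
height_after = root.height()
```

Because no scope is popped when a block finishes, both top-level scopes are still reachable from `root` until `drop_children` runs, which is what lets a backward pass read the forward variables.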

                                        Use Blocks in C++ and PaddlePaddle Programs

                                        -

                                        Let us consolidate the discussion by presenting some examples.

                                        -
                                        -

                                        Blocks with if-else and IfElseOp

                                        -

                                        The following C++ program shows how blocks are used with the if-else structure:

                                        -
                                        namespace pd = paddle;
                                        -
                                        -int x = 10;
                                        -int y = 1;
                                        -int z = 10;
                                        -bool cond = false;
                                        -int o1, o2;
                                        -if (cond) {
                                        -  int z = x + y;
                                        -  o1 = z;
                                        -  o2 = pd::layer::softmax(z);
                                        -} else {
                                        -  int d = pd::layer::fc(z);
                                        -  o1 = d;
                                        -  o2 = d+1;
                                        -}
                                        -
                                        -
                                        -

                                        An equivalent PaddlePaddle program from the design doc of the IfElseOp operator is as follows:

                                        -
                                        import paddle as pd
                                        -
                                        -x = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -y = var(1) # shape=[1], value=1
                                        -z = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -cond = larger_than(x, 15) # [false, true, true]
                                        -
                                        -ie = pd.ifelse()
                                        -with ie.true_block():
                                        -    d = pd.layer.add_scalar(x, y)
                                        -    ie.output(d, pd.layer.softmax(d))
                                        -with ie.false_block():
                                        -    d = pd.layer.fc(z)
                                        -    ie.output(d, d+1)
                                        -o1, o2 = ie(cond)
                                        -
                                        -
                                        -

                                        In both examples, the true branch computes x+y and softmax(x+y), and the false branch computes fc(z) and fc(z)+1.

                                        -

                                        The difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle programs are mini-batches of instances.
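To make the mini-batch behavior concrete, here is a minimal sketch of how a per-instance condition could route instances to the two branches and merge the results back in the original order. It is an assumption about the semantics, not the IfElseOp implementation: `run_ifelse` is a hypothetical helper, the `fc` stand-in simply doubles its input, and only the first output (`o1`) is modeled.

```python
def run_ifelse(cond, true_branch, false_branch, batch):
    """Apply one branch per instance and merge outputs in input order."""
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]
    # Each branch sees only the sub-batch routed to it.
    true_out = true_branch([batch[i] for i in true_idx])
    false_out = false_branch([batch[i] for i in false_idx])
    merged = [None] * len(batch)
    for i, value in zip(true_idx, true_out):
        merged[i] = value
    for i, value in zip(false_idx, false_out):
        merged[i] = value
    return merged


def fc(values):
    # Hypothetical stand-in for a fully connected layer.
    return [2 * v for v in values]


x = [10, 20, 30]
y = 1
cond = [v > 15 for v in x]  # [False, True, True]
o1 = run_ifelse(
    cond,
    true_branch=lambda sub: [v + y for v in sub],  # d = x + y
    false_branch=fc,                               # d = fc(z), with z == x
    batch=x,
)
```

Here the first instance goes through the false branch and the other two through the true branch, so a single run of the operator exercises both blocks.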

                                        -
                                        -
                                        -

                                        Blocks with for and RNNOp

                                        -

                                        The following RNN model in PaddlePaddle, taken from the RNN design doc:

                                        -
                                        x = sequence([10, 20, 30]) # shape=[None, 1]
                                        -m = var(0) # shape=[1]
                                        -W = var(0.314, param=true) # shape=[1]
                                        -U = var(0.375, param=true) # shape=[1]
                                        -
                                        -rnn = pd.rnn()
                                        -with rnn.step():
                                        -  h = rnn.memory(init = m)
                                        -  h_prev = rnn.previous_memory(h)
                                        -  a = layer.fc(W, x)
                                        -  b = layer.fc(U, h_prev)  
                                        -  s = pd.add(a, b)
                                        -  act = pd.sigmoid(s)
                                        -  rnn.update_memory(h, act)
                                        -  rnn.output(a, b)
                                        -o1, o2 = rnn()
                                        -
                                        -
                                        -

                                        has the following equivalent C++ program:

                                        -
                                        int x[] = {10, 20, 30};
                                        -float m = 0;
                                        -float W = 0.314;
                                        -float U = 0.375;
                                        -
                                        -const int n = sizeof(x) / sizeof(x[0]);
                                        -float mem[n + 1];
                                        -float o1[n + 1];
                                        -float o2[n + 1];
                                        -for (int i = 1; i <= n; ++i) {
                                        -  int xi = x[i-1];            // current step input
                                        -  if (i == 1) mem[0] = m;     // initial memory
                                        -  float a = W * xi;
                                        -  float b = U * mem[i-1];
                                        -  float s = a + b;
                                        -  float act = sigmoid(s);
                                        -  mem[i] = act;               // update memory
                                        -  o1[i] = act;
                                        -  o2[i] = b;
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Compilation and Execution

                                        -

                                        Like TensorFlow, a PaddlePaddle program is written in Python. The first part describes a neural network as a protobuf message, and the rest executes the message for training or inference.

                                        -

                                        The generation of this protobuf message is similar to how a compiler generates a binary executable file. The execution of the message is similar to how the OS executes the binary file.

                                        -
                                        -
                                        -

                                        The “Binary Executable File Format”

                                        -

                                        The definition of the protobuf message is as follows:

                                        -
                                        message BlockDesc {
                                        -  repeated VarDesc vars = 1;
                                        -  repeated OpDesc ops = 2;
                                        -}
                                        -
                                        -
                                        -

                                        The step net in the above RNN example would look like:

                                        -
                                        BlockDesc {
                                        -  vars = {
                                        -    VarDesc {...} // x
                                        -    VarDesc {...} // h
                                        -    VarDesc {...} // fc_out
                                        -    VarDesc {...} // hidden_out
                                        -    VarDesc {...} // sum
                                        -    VarDesc {...} // act
                                        -  }
                                        -  ops = {
                                        -    OpDesc {...} // matmul
                                        -    OpDesc {...} // add_two
                                        -    OpDesc {...} // sigmoid
                                        -  }
                                        -};
                                        -
                                        -
                                        -

                                        Also, the RNN operator in the above example is serialized into a protobuf message of type OpDesc and would look like:

                                        -
                                        OpDesc {
                                        -  inputs = {0} // the index of x in vars of BlockDesc above
                                        -  outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
                                        -  attrs {
                                        -    "states" : {1} // the index of h
                                        -    "step_net" : <above step net>
                                        -  }
                                        -};
                                        -
                                        -
                                        -

                                        This OpDesc value is in the ops field of the BlockDesc value representing the global block.

                                        -
                                        -
                                        -

                                        The Compilation of Blocks

                                        -

                                        During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message which describes Variable) and OpDesc (the Protobuf message which describes Operator).

                                        -

                                        A VarDesc in a block should have its own name scope, so that local variables do not affect the parent block's name scope. A child block's name scope should inherit the parent's, so that an OpDesc in the child block can reference a VarDesc stored in the parent block. For example:

                                        -
                                        a = pd.Variable(shape=[20, 20])
                                        -b = pd.fc(a, params=["fc.w", "fc.b"])
                                        -
                                        -rnn = pd.create_rnn()
                                        -with rnn.stepnet():
                                        -    x = a.as_step_input()
                                        -    # reuse fc's parameter
                                        -    fc_without_b = pd.get_variable("fc.w")
                                        -    rnn.output(fc_without_b)
                                        -
                                        -out = rnn()
                                        -
                                        -
                                        -

                                        The method pd.get_variable retrieves a Variable by name. The Variable may be stored in a parent block but retrieved in a child block, so a block should have a variable scope that supports inheritance.

                                        -

                                        In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc.

                                        -

                                        To store the definition of variables and operators, we define a C++ class SymbolTable, like the one used in compilers.

                                        -

                                        SymbolTable can do the following:

                                        -
                                          -
                                        • store the definitions (some names and attributes) of variables and operators,
                                        • -
                                        • verify if a variable was declared,
                                        • -
                                        • make it possible to implement type checking (offer Protobuf message pointers to InferShape handlers).
                                        • -
                                        -
                                        // Information in SymbolTable is enough to trace the dependency graph, so it
                                        -// may be enough for the Eval() interface to take a SymbolTable.
                                        -class SymbolTable {
                                        - public:
                                        -  SymbolTable(SymbolTable* parent) : parent_(parent) {}
                                        -
                                        -  OpDesc* NewOp(const string& name="");
                                        -
                                        -  // TODO determine whether name is generated by python or C++.
                                        -  // Currently assume that a unique name will be generated by C++ if the
                                        -  // argument name is left default.
                                        -  VarDesc* Var(const string& name="");
                                        -
                                        -  // find a VarDesc by name, if recursive is true, find parent's SymbolTable
                                        -  // recursively.
                                        -  // this interface is introduced to support InferShape, find protobuf messages
                                        -  // of variables and operators, pass pointers into InferShape.
                                        -  //
                                        -  // NOTE maybe some C++ classes such as VarDescBuilder and OpDescBuilder should
                                        -  // be proposed and embedded into pybind to enable python operation on C++ pointers.
                                        -  VarDesc* FindVar(const string& name, bool recursive=true);
                                        -
                                        -  OpDesc* FindOp(const string& name);
                                        -
                                        -  BlockDesc Compile() const;
                                        -
                                        - private:
                                        -  SymbolTable* parent_;
                                        -
                                        -  map<string, OpDesc> ops_;
                                        -  map<string, VarDesc> vars_;
                                        -};
                                        -
                                        -
                                        -
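The inherited lookup that FindVar describes can be sketched in Python as follows. This is a simplified model under stated assumptions: descriptions are plain dicts instead of protobuf messages, the generated name format `tmp_N` is invented for illustration, and `var` returning an existing entry when the name is already registered is a guess, since the class above does not specify it.

```python
class SymbolTable:
    """Toy symbol table with a parent link, mirroring FindVar's lookup."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}
        self._counter = 0

    def var(self, name=""):
        # Assumption: an empty name means "generate a unique one".
        if not name:
            name = f"tmp_{self._counter}"
            self._counter += 1
        return self.vars.setdefault(name, {"name": name})

    def find_var(self, name, recursive=True):
        # Search this table, then the ancestors if `recursive` is set.
        table = self
        while table is not None:
            if name in table.vars:
                return table.vars[name]
            table = table.parent if recursive else None
        return None


# Usage: the step net's table can see the global "fc.w", not vice versa.
global_table = SymbolTable()
fc_w = global_table.var("fc.w")
step_table = SymbolTable(parent=global_table)
step_table.var("x")
found_in_child = step_table.find_var("fc.w")
found_local_only = step_table.find_var("fc.w", recursive=False)
found_in_parent = global_table.find_var("x")
```

The one-way visibility is the point of the design: a child block can reuse a parent's variables, while names defined in the child never leak upward.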

                                        After all the descriptions of variables and operators have been added to the SymbolTable, the block has enough information to run.

                                        -

                                        The Block class takes a BlockDesc as input, and provides Run and InferShape functions.

                                        -
                                        namespace {
                                        -
                                        -class Block : public OperatorBase {
                                        -public:
                                        -  Block(const BlockDesc& desc) : desc_(desc) {}
                                        -
                                        -  void InferShape(const framework::Scope& scope) const override {
                                        -    if (!symbols_ready_) {
                                        -      CreateVariables(scope);
                                        -      CreateOperators();
                                        -    }
                                        -    // should run InferShape first.
                                        -    for (auto& op : runtime_table_.ops()) {
                                        -      op->InferShape(scope);
                                        -    }
                                        -  }
                                        -
                                        -  void Run(const framework::Scope& scope,
                                        -           const platform::Place& place) const override {
                                        -    PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first.");
                                        -    for (auto& op : runtime_table_.ops()) {
                                        -      op->Run(scope, place);
                                        -    }
                                        -  }
                                        -
                                        -  void CreateVariables(const framework::Scope& scope);
                                        -  void CreateOperators();
                                        -
                                        -  // some other necessary interfaces of NetOp are listed below
                                        -  // ...
                                        -
                                        -private:
                                        -  BlockDesc desc_;
                                        -  bool symbols_ready_{false};
                                        -};
                                        -
                                        -}  // namespace
                                        -
                                        -
                                        -
                                        -

                                        The Execution of Blocks

                                        -

                                        Block inherits from OperatorBase, which has a Run method. Block's Run method runs its operators sequentially.

                                        -

                                        Another important interface is Eval. It takes a list of targets, prunes the block down to the minimal graph that has those targets as its end points, and creates a new Block from the result. After running that block, Eval gets the latest values of the targets and returns them.

                                        -

                                        The definition of Eval is as follows:

                                        -
                                        // clean a block description by targets using the corresponding dependency graph.
                                        -// return a new BlockDesc with minimal number of operators.
                                        -// NOTE: The return type is not a Block but the block's description so that this can be distributed
                                        -// to a cluster.
                                        -BlockDesc Prune(const BlockDesc& desc, vector<string> targets);
                                        -
                                        -void Block::Eval(const vector<string>& targets,
                                        -                 const framework::Scope& scope,
                                        -                 const platform::DeviceContext& dev_ctx) {
                                        -  BlockDesc min_desc = Prune(desc_, targets);
                                        -  Block min_block(min_desc);
                                        -  min_block.Run(scope, dev_ctx);
                                        -}
                                        -
                                        -
                                        -
                                        -
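The pruning step can be sketched as a backward walk over the operator list. This is one plausible way to implement what the comment on Prune describes, not the actual code: operators are modeled here as hypothetical `(name, inputs, outputs)` tuples listed in execution order.

```python
def prune(ops, targets):
    """Keep only the ops needed to compute `targets`.

    `ops` is a list of (name, inputs, outputs) tuples in execution order.
    """
    needed = set(targets)
    kept = []
    # Walk backwards: an op is kept if it produces something still needed,
    # and its inputs then become needed as well.
    for name, inputs, outputs in reversed(ops):
        if needed.intersection(outputs):
            kept.append((name, inputs, outputs))
            needed.update(inputs)
    kept.reverse()  # restore execution order
    return kept


# Usage: the step net from the RNN example plus one unrelated op.
ops = [
    ("matmul_a", ["W", "x"], ["a"]),
    ("matmul_b", ["U", "h_prev"], ["b"]),
    ("add_two", ["a", "b"], ["s"]),
    ("sigmoid", ["s"], ["act"]),
    ("unused", ["x"], ["debug"]),
]
for_act = [name for name, _, _ in prune(ops, ["act"])]
for_b = [name for name, _, _ in prune(ops, ["b"])]
```

Returning a pruned description rather than a runnable object matches the note in the comment above: a description can be shipped to other machines before anything is instantiated.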
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/build_system/README.html b/develop/doc_cn/design/build_system/README.html deleted file mode 100644 index a41ff09495a1ff77e81bcda4618dbdcbbd818fa0..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/build_system/README.html +++ /dev/null @@ -1,409 +0,0 @@ - - - - - - - - - - - - - Required CMake Function — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -

                                        A few months ago, when we were trying to replace CMake with Bazel, @emailweixu suggested that we rewrite those handy Bazel functions using CMake. Now seems to be the right time to get this done, as we are facing problems from the porting of Majel and the development of the new parameter server using Go and C++.

                                        -

                                        Here are some initial thoughts. Your comments are welcome!

                                        -
                                        -

                                        Required CMake Function

                                        -

                                        I think we need only the following few CMake functions to make a project description mean and clean:

                                        -

                                        | C++        | CUDA C++   | Go         |
                                        |------------|------------|------------|
                                        | cc_library | nv_library | go_library |
                                        | cc_binary  | nv_binary  | go_binary  |
                                        | cc_test    | nv_test    | go_test    |

                                        -
                                          -
                                        • The _library functions generate .a files from source code.
                                        • -
                                        • The _binary functions generate executable binary files.
                                        • -
                                        • The _test functions generate executable unit test files. They work like _binary but also link -lgtest and -lgtest_main.
                                        • -
                                        -

                                        The difference between nv_ functions and cc_ functions is that the former use nvcc instead of the system-default C++ compiler.

                                        -

                                        Both nv_ and cc_ functions enable C++11 (-std=c++11).

                                        -

                                        Also,

                                        -
                                          -
                                        • to describe external dependencies, we need external_library.
                                        • -
                                        • to build shared libraries, we need shared_library.
                                        • -
                                        -
                                        -
                                        -

                                        An Example Project

                                        -

                                        Suppose that we have the aforementioned functions defined in our /cmake directory. The following example CMakeLists.txt describes a project that includes the following source files:

                                        -
                                          -
                                        • tensor.h
                                        • -
                                        • tensor.cc
                                        • -
                                        • tensor_test.cc
                                        • -
                                        • ops.h
                                        • -
                                        • ops.cu
                                        • -
                                        • ops_test.cu
                                        • -
                                        • api.go
                                        • -
                                        • api_test.go
                                        • -
                                        -

                                        Suppose that ops.cu depends on CUDNN.

                                        -
                                        # cc_library parses tensor.cc and figures out that the target also depends
                                        -# on tensor.h.  tensor is declared as a library because the targets below
                                        -# list it in their DEPS.
                                        -cc_library(tensor
                                        -  SRCS
                                        -  tensor.cc)
                                        -
                                        -# The dependency on target tensor implies that if any of
                                        -# tensor{.h,.cc,_test.cc} is changed, tensor_test needs to be re-built.
                                        -cc_test(tensor_test
                                        -  SRCS
                                        -  tensor_test.cc
                                        -  DEPS
                                        -  tensor)
                                        -
                                        -# I don't have a clear idea what parameters external_library needs to
                                        -# have.  @gangliao as a CMake expert would have better ideas.
                                        -external_library(cudnn
                                        -  ....)
                                        -
                                        -# Suppose that ops.cu depends on external target CUDNN.  Also, ops.cu
                                        -# includes global functions that take Tensor as their parameters, so
                                        -# ops depends on tensor.  This implies that if any of tensor.{h,cc},
                                        -# ops.{h,cu} is changed, ops needs to be re-built.
                                        -nv_library(ops
                                        -  SRCS
                                        -  ops.cu
                                        -  DEPS
                                        -  tensor
                                        -  cudnn)  # cudnn is defined above by external_library.
                                        -
                                        -nv_test(ops_test
                                        -  SRCS
                                        -  ops_test.cu
                                        -  DEPS
                                        -  ops)
                                        -
                                        -# Because api.go defines a Go wrapper for ops and tensor, it depends on
                                        -# both.  This implies that if any of tensor.{h,cc}, ops.{h,cu}, or
                                        -# api.go is changed, api needs to be re-built.
                                        -go_library(api
                                        -  SRCS
                                        -  api.go
                                        -  DEPS
                                        -  tensor # Because ops depends on tensor, this line is optional.
                                        -  ops)
                                        -
                                        -go_test(api_test
                                        -  SRCS
                                        -  api_test.go
                                        -  DEPS
                                        -  api)
                                        -
                                        -
                                        -# This builds libapi.so.  shared_library might use the CMake target
                                        -# name api_shared so as to distinguish it from the target api above.
                                        -shared_library(api
                                        -  DEPS
                                        -  api)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        As the example CMakeLists.txt above executes, each function invocation adds “nodes” to a dependency graph. The build system also uses this graph to generate CMake commands including add_executable, add_dependencies, target_link_libraries, and add_test.

                                        -
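The dependency graph described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the CMake implementation; the class name `BuildGraph` and its methods are hypothetical, and the external `cudnn` target is left out for brevity. It mirrors the example project and computes which targets must be re-built when one file changes.

```python
from collections import defaultdict


class BuildGraph:
    """Toy dependency graph: targets depend on source files and on other targets."""

    def __init__(self):
        self.srcs = {}                 # target name -> set of source files
        self.deps = defaultdict(set)   # target name -> set of targets it depends on

    def add_target(self, name, srcs=(), deps=()):
        self.srcs[name] = set(srcs)
        self.deps[name] = set(deps)

    def targets_to_rebuild(self, changed_file):
        # Start with the targets that list the file directly as a source.
        dirty = {t for t, files in self.srcs.items() if changed_file in files}
        # Propagate to every target that transitively depends on a dirty target.
        changed = True
        while changed:
            changed = False
            for target, deps in self.deps.items():
                if target not in dirty and deps & dirty:
                    dirty.add(target)
                    changed = True
        return dirty


# The example project from this document (headers listed next to their sources).
graph = BuildGraph()
graph.add_target("tensor", srcs=["tensor.cc", "tensor.h"])
graph.add_target("tensor_test", srcs=["tensor_test.cc"], deps=["tensor"])
graph.add_target("ops", srcs=["ops.cu", "ops.h"], deps=["tensor"])
graph.add_target("ops_test", srcs=["ops_test.cu"], deps=["ops"])
graph.add_target("api", srcs=["api.go"], deps=["tensor", "ops"])
graph.add_target("api_test", srcs=["api_test.go"], deps=["api"])

print(sorted(graph.targets_to_rebuild("ops.cu")))
```

Changing `ops.cu` marks `ops`, `ops_test`, `api`, and `api_test` as dirty but leaves `tensor` and `tensor_test` alone, which matches the re-build rules stated in the comments of the example CMakeLists.txt.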
                                        -
                                        -

                                        Using Package Manager For Go

                                        -

                                        Building Go binaries and libraries requires satisfying their dependencies. Generally, we can run go get ./... to download and compile all external dependencies. The problems are:

                                        -
                                          -
                                        1. go get always fetches the latest code from the default branch of the remote repo, so changes in dependencies might break the build. This is very different from what we already have in cmake/external, which downloads a specific version or commit id of each dependency.
                                        2. Some locations cannot access external dependencies through the internet, as mentioned in https://github.com/PaddlePaddle/Paddle/issues/2605. A package management tool can bundle the dependencies into a “vendor” package, which can be mirrored on many cloud file hosting services, so users who want to compile paddle by themselves can download this “vendor” package from a mirror site.
                                        -
                                        -

                                        Choose A Suitable Tool

                                        -

                                        As mentioned by @wangkuiyi, the page linked here lists dozens of Go package managers. We choose the tool using the following principles:

                                        -
                                          -
                                        • Most “active” projects with more stars, more pull requests or commits
                                        • -
                                        • Widely used project
                                        • -
                                        -

                                        After comparing all these projects, we shall choose between the most popular tools: Godep and Glide.

                                        -

                                        Here’s a brief comparison between Godep and Glide: https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are also many complaints about using Godep. A new “official” package management tool has also been started at https://github.com/golang/dep to resolve such problems, but it is currently at the alpha stage. So the best choice for now is Glide.

                                        -
                                        -
                                        -

                                        Manage Go Packages

                                        -
                                          -
                                        • Dependencies: go/glide.yaml stores the dependencies, and their versions, that are directly imported by paddle. go/glide.lock stores all dependencies recursively, with their commit ids. Builds stay “locked” to these packages unless we update them with glide up.
                                        • -
                                        • Vendor package: the go/vendor directory is generated when running the cmake command. cmake downloads the code corresponding to go/glide.lock. If we put a vendor folder under go/, cmake just checks the commit ids of the packages under that folder; if the commit ids match, nothing is downloaded.
                                        • -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
\ No newline at end of file
diff --git a/develop/doc_cn/design/cluster_train/README.html b/develop/doc_cn/design/cluster_train/README.html
deleted file mode 100644
index 1ead3a4dc3c804b9745f0babea73998609e09647..0000000000000000000000000000000000000000
--- a/develop/doc_cn/design/cluster_train/README.html
+++ /dev/null
@@ -1,438 +0,0 @@
-Design Doc: Distributed Training - PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Distributed Training

                                        -
                                        -

                                        Objective

                                        -

                                        In these slides, we explained that we’d like PaddlePaddle to run on general-purpose clusters such as those managed by Kubernetes, so as to address demands for AI from both Internet and non-Internet industries.

                                        -

                                        This poses technical challenges to PaddlePaddle:

                                        -
                                          -
                                        1. Support fault-recovery.
                                        2. Support both offline and online training.
                                        3. Serverless computing of distributed training.
                                        -
                                        -
                                        -

                                        Training Job

                                        -

                                        A training job is created once a user asks Paddle cloud to train a model. The training job is made up of different processes that collaboratively consume data and produce a trained model. There are three kinds of processes:

                                        -
                                          -
                                        1. the master server process, which dispatches tasks to
                                        2. one or more trainer processes, which run distributed training and synchronize gradients/models via
                                        3. one or more parameter server processes, each of which holds a shard of the global model and receives the uploaded gradients from every trainer process, so that it can run the optimize functions to update its parameters.
                                        -

                                        Their relation is illustrated in the following graph:

                                        -

                                        -

                                        By coordinating these processes, PaddlePaddle supports both Synchronous Stochastic Gradient Descent (sync SGD) and Asynchronous Stochastic Gradient Descent (async SGD) for training user-defined neural network topologies.

                                        -

                                        When training with sync SGD, parameter servers wait for all trainers to finish uploading their gradients and then send the updated parameters to the trainers; training cannot proceed until a trainer has received the updated parameters. This creates a synchronization point between trainers. When training with async SGD, each trainer uploads gradients and downloads new parameters individually, without synchronizing with other trainers. Async SGD is faster in terms of time per pass, but its gradients are noisier because trainers are likely to compute them on a stale model.

                                        -
                                        -

                                        Master Server Process

                                        -

                                        The master server process will:

                                        -
                                          -
                                        • Partition a dataset into tasks and dispatch tasks to trainers.
                                        • -
                                        • Keep track of training progress on the dataset with task queues. A training job iterates over the whole dataset to complete one pass before it moves on to the next pass.
                                        • -
                                        -
                                        -

                                        Task

                                        -

                                        A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size.

                                        -
                                        -
                                        -

                                        Task Queue

                                        -

                                        The master server has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master server. Each master server process has three task queues.

                                        -

                                        -
                                          -
                                        • The todo queue holds tasks to be dispatched. When a job starts, the master server fills in the todo queue with all tasks.
                                        • -
                                        • The pending queue holds tasks that are currently being trained by trainers.
                                        • -
                                        • The done queue holds tasks that are already trained.
                                        • -
                                        -

                                        The life cycle of a single task is illustrated below:

                                        -

                                        -
                                          -
                                        1. When a new pass of training starts, all tasks are placed in the todo queue.
                                        2. When a trainer requests a new task, the master server dispatches a task from the todo queue to it, puts the task in the pending queue, and waits for completion.
                                        3. The trainer works on its task, tells the master server once the task is completed, and asks for a new task. The master server then dispatches a new task to that trainer.
                                        4. If a task fails for any reason in a trainer, or takes longer than a specific period of time, the master server moves the task back to the todo queue and increases the timeout count for that task by one. If the timeout count exceeds a threshold, the task is likely to be causing trainers to crash, so it is discarded.
                                        5. The master server moves each completed task to the done queue. When the todo queue is empty, the master server starts a new pass by moving all tasks in the done queue to the todo queue and resetting the timeout counter of every task to zero.
                                        -
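The task life cycle above can be sketched as a small in-memory state machine. This is a minimal illustration in Python, not the master server's actual code: the class `TaskQueues`, its method names, and the `max_timeouts` parameter are hypothetical. One further assumption is labeled in the code: a new pass starts only once the pending queue has drained as well as the todo queue.

```python
from collections import deque


class TaskQueues:
    """Toy model of the master's todo / pending / done queues."""

    def __init__(self, tasks, max_timeouts=3):
        self.todo = deque(tasks)            # tasks waiting to be dispatched
        self.pending = {}                   # task -> trainer currently working on it
        self.done = []                      # tasks finished in the current pass
        self.discarded = []                 # tasks dropped after too many timeouts
        self.timeouts = {t: 0 for t in tasks}
        self.max_timeouts = max_timeouts    # hypothetical threshold

    def dispatch(self, trainer):
        """Hand the next todo task to a trainer, or None if nothing is waiting."""
        if not self.todo:
            return None
        task = self.todo.popleft()
        self.pending[task] = trainer
        return task

    def finish(self, task):
        """The trainer reported success: pending -> done."""
        del self.pending[task]
        self.done.append(task)
        self._maybe_start_new_pass()

    def fail(self, task):
        """The task failed or timed out: pending -> todo, or discard it."""
        del self.pending[task]
        self.timeouts[task] += 1
        if self.timeouts[task] > self.max_timeouts:
            self.discarded.append(task)
        else:
            self.todo.append(task)
        self._maybe_start_new_pass()

    def _maybe_start_new_pass(self):
        # Assumption: also wait for in-flight (pending) tasks before starting a pass.
        if self.todo or self.pending or not self.done:
            return
        self.todo.extend(self.done)
        self.done.clear()
        for task in self.timeouts:
            self.timeouts[task] = 0


queues = TaskQueues(["task-0", "task-1"], max_timeouts=1)
first = queues.dispatch("trainer-a")
queues.finish(first)
print(list(queues.todo), queues.pending, queues.done)
```

Keeping `pending` as a mapping from task to trainer is only a convenience for the sketch; it makes it easy to see which trainer holds a task when it times out.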
                                        -
                                        -
                                        -

                                        Trainer Process

                                        -

                                        The trainer process will:

                                        -
                                          -
                                        • Request tasks from the master.
                                        • -
                                        • Work on the tasks
                                        • -
                                        • Upload gradient to parameter servers, and update local model by downloading new parameters from parameter servers.
                                        • -
                                        -
                                        -
                                        -

                                        Parameter Server Process

                                        -

                                        Parameter server processes hold the parameters collaboratively. The parameters are partitioned on different parameter servers.

                                        -

                                        The parameter server will:

                                        -
                                          -
                                        • Receive gradient from the trainers, update its parameters, and give the trainers the latest parameters.
                                        • -
                                        • Periodically save its parameters to a distributed file system, overwriting the previous save.
                                        • -
                                        -
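This document does not say how parameters are partitioned across parameter servers. As one possible illustration only, the sketch below assigns named parameters to servers round-robin; the function `assign_shards` and the parameter names are hypothetical, and a real system might balance shards by parameter size instead.

```python
def assign_shards(param_names, num_servers):
    """Assign parameter names to parameter servers round-robin (an assumed policy)."""
    shards = [[] for _ in range(num_servers)]
    # Sort first so every process computes the same assignment.
    for i, name in enumerate(sorted(param_names)):
        shards[i % num_servers].append(name)
    return shards


shards = assign_shards(["fc1.w", "fc1.b", "fc2.w", "fc2.b", "emb"], num_servers=2)
for index, names in enumerate(shards):
    print(f"parameter server {index}: {names}")
```

Whatever the policy, the important property is that it is deterministic, so a trainer can work out which server holds the shard for a given parameter.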
                                        -
                                        -

                                        Optimization Algorithms

                                        -

                                        The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm:

                                        -
                                          -
                                        • Synchronous Stochastic Gradient Descent (sync-SGD)

                                          -

                                          The parameter server waits for all trainers to finish the n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. Every trainer waits for the new parameters before starting the (n+1)-th mini-batch.

                                          -
                                        • -
                                        • Asynchronous Stochastic Gradient Descent (async-SGD)

                                          -

                                          There is no synchronization between different trainers, and the parameter server updates its parameters as soon as it receives a new gradient:

                                          -
                                            -
                                          • Each trainer uploads its accumulated gradient every n mini-batches.
                                          • -
                                          • Every m mini-batches, the trainer downloads new parameters from parameter server.
                                          • -
                                          • n and m do not have to be equal.
                                          • -
                                          -
                                        • -
                                        -
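The difference between the two communication patterns can be seen in a toy example. The sketch below is a simplification in plain Python under stated assumptions: the model is a single scalar `w`, each trainer's loss is `0.5 * (w - target) ** 2`, and all function names are hypothetical. In the sync step the server applies one averaged update after every trainer has reported. In the async round the server applies each gradient on arrival, even though every gradient was computed from the same, by then stale, snapshot of `w`.

```python
def gradient(w, target):
    # Gradient of the toy loss 0.5 * (w - target) ** 2 with respect to w.
    return w - target


def sync_sgd_step(w, targets, lr):
    """Wait for every trainer's gradient, then apply a single averaged update."""
    grads = [gradient(w, t) for t in targets]
    return w - lr * sum(grads) / len(grads)


def async_sgd_round(w, targets, lr):
    """Apply each trainer's gradient as soon as it arrives.

    All trainers read the same snapshot of w, so every gradient after the
    first one is computed from a stale model.
    """
    snapshot = w
    for t in targets:
        w = w - lr * gradient(snapshot, t)
    return w


targets = [1.0, 3.0]   # one toy objective per trainer
print(sync_sgd_step(0.0, targets, lr=0.5))    # one averaged update
print(async_sgd_round(0.0, targets, lr=0.5))  # two immediate, unaveraged updates
```

The async round moves `w` further in one round because updates are neither averaged nor delayed, which is where both the speed and the extra gradient noise come from.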
                                        -
                                        -
                                        -

                                        Fault Tolerant

                                        -

                                        The training job will pause if the master server process is dead, or if any of the parameter server processes is dead. Kubernetes will restart them, and they will recover in a few minutes. Please refer to fault recovery.

                                        -

                                        The training job will continue to make progress if there is at least one trainer process running. The strategy depends on the type of optimization algorithm:

                                        -
                                          -
                                        • sync-SGD

                                          -

                                          TODO

                                          -
                                        • -
                                        • async-SGD

                                          -

                                          Since async-SGD does not require synchronization between mini-batches, the system will by definition make progress if at least one trainer is running.

                                          -
                                        • -
                                        -
                                        -
                                        -

                                        Fault Recovery

                                        -

                                        PaddlePaddle uses etcd to keep track of the states of processes. Because etcd is a distributed, reliable key-value store, a restarted process can recover its state from etcd. The model parameters are periodically saved to a distributed file system, so a restarted parameter server can recover its parameters from the saved file.

                                        -

                                        Now we will introduce how each process recovers from a failure. The graph below shows how etcd is used:

                                        -

                                        -
                                        -

                                        Master Server Process

                                        -

                                        When the master is started by Kubernetes, it executes the following steps at startup:

                                        -
                                          -
                                        1. Grabs a unique master lock in etcd, which prevents concurrent master instantiations.
                                        2. Recovers the task queues from etcd if they already exist; otherwise, the master creates them.
                                        3. Writes its IP address to /master/addr so that trainers can discover it.
                                        4. Listens for trainers’ task requests, dispatches one task per request, and updates the task queues using an etcd transaction to ensure the lock is held during the update.
                                        -

                                        When the master server process dies for any reason, Kubernetes will restart it. It will be online again, with all states recovered from etcd, in a few minutes.

                                        -
                                        -
                                        -

                                        Trainer Process

                                        -

                                        When the trainer is started by Kubernetes, it executes the following steps at startup:

                                        -
                                          -
                                        1. Watches the available parameter server prefix keys /ps/ on etcd and waits until the count of parameter servers reaches the desired count /ps_desired.
                                        2. Finds and watches /master/addr to get the master’s address.
                                        3. Requests tasks from the master to start training.
                                        -

                                        When a trainer fails, Kubernetes will try to restart it. The recovered trainer will fetch tasks from the master and go on training.

                                        -
                                        -
                                        -

                                        Parameter Server Process

                                        -

                                        When the parameter server is started by Kubernetes, it executes the following steps at startup:

                                        -
                                          -
                                        1. Reads the desired total number of parameter servers from the etcd key /ps_desired.
                                        2. Searches through the etcd keys /ps/<index> (/ps/0, /ps/1, ...) to find the first non-existent key whose index is smaller than the total number of parameter servers, and sets that key using a transaction to avoid concurrent writes. The parameter server’s index is inferred from the key name.

                                          The desired number of parameter servers is 3:

                                          The third parameter server joined:
                                        3. Loads parameters if there are already saved parameters in the save path (inferred from its index).
                                        4. Becomes ready to serve the trainers’ requests.
                                        -
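The index-claiming step in the list above can be sketched as follows. To stay self-contained, the example uses a tiny in-memory key-value store, `FakeKV`, as a stand-in for etcd; it is not the etcd client API, and `put_if_absent` and `claim_index` are hypothetical names. The atomic `put_if_absent` plays the role of the etcd transaction that prevents two parameter servers from claiming the same key.

```python
import threading


class FakeKV:
    """In-memory stand-in for etcd, for illustration only (not the etcd API)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def put_if_absent(self, key, value):
        # Atomic create-if-missing; stands in for an etcd transaction.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def claim_index(kv, addr):
    """Claim the first free /ps/<index> slot; return the index, or None if full."""
    desired = int(kv.get("/ps_desired"))
    for index in range(desired):
        if kv.put_if_absent(f"/ps/{index}", addr):
            return index
    return None


kv = FakeKV()
kv.put("/ps_desired", "3")
for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]:
    print(addr, "->", claim_index(kv, addr))
```

With a desired count of 3, the first three servers obtain indices 0, 1, and 2, and a fourth finds every slot taken. This sketch does not model lease expiry, which in the design above is what frees a slot after a parameter server dies.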

                                        If the parameter server’s etcd lease expires, the parameter server will kill itself.

                                        -
                                        -
                                        -
                                        -

                                        Parameter Server Checkpointing

                                        -

                                        See here

                                        -
                                        -
                                        -

                                        Storing and Dispatching Training Data

                                        -

                                        See here

                                        -
                                        -
                                        -

                                        Dynamic Scaling

                                        -
                                        -

                                        Trainer Scaling

                                        -

                                        TODO

                                        -
                                        -
                                        -

                                        Parameter Server Scaling

                                        -

                                        Not planned for v1.

                                        -
                                        -
                                        -
                                        -

                                        Training Dataset Format

                                        -

                                        TODO

                                        -
                                        -
                                        -

                                        User Interface

                                        -

                                        TODO

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/checkpointing.html b/develop/doc_cn/design/cluster_train/checkpointing.html deleted file mode 100644 index 1cb698e6a9cbdc765bd33cc9f0a6756d6764cb96..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/checkpointing.html +++ /dev/null @@ -1,313 +0,0 @@ - - - - - - - - - - - - - 模型参数检查点(Checkpointing) — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

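                                        Model Parameter Checkpointing

                                        -

                                        Checkpointing model data protects a training job against the failure of one parameter server, or of several at once. A checkpoint periodically writes to disk a complete image of the model data held in a parameter server's memory, so that training can restart from an intermediate state. For a training job that cannot be interrupted and has no backup, disaster recovery is achieved by periodically saving a snapshot of each parameter server's data to a distributed storage service, for example saving the latest snapshot every 10 minutes and deleting older ones. When a single node fails, the training job resumes once that node is recovered, or once it is migrated to another node and started there.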

                                        -

                                        -
                                        -

                                        The snapshot-saving design is as follows:

                                        -

                                        Notes:

                                        -
                                          -
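                                        • After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves its snapshots there.
                                        • -
                                        • Note: each parameter server saves its checkpoint independently. For now we do not consider having multiple parameter servers synchronously save a global checkpoint for one specific point in time, because doing so still could not eliminate randomness.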
                                        • -
                                        -

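                                        Checkpoint-saving procedure:

                                        -
                                          -
                                        1. When the "every 10 minutes" condition is met, the parameter server acquires the read_lock on the parameters memory and starts a new thread to save the checkpoint. If a checkpoint-saving thread is already running, the request is ignored. Because updating the parameters requires the write_lock on the parameters memory, the parameter server pauses parameter updates and waits while the snapshot is being written.
                                        2. -
                                        3. The parameter server generates a UUID and writes the snapshot data to a new file, named after this UUID, in the specified directory. After the snapshot is written, it computes the MD5 sum of the file and then writes the following JSON to /checkpoints/[pserver_id] in etcd: {"uuid": [UUID], "md5": "MD5 sum", "timestamp": xxxx}
                                        4. -
                                        5. Delete the snapshot files in the disk directory that do not belong to the current UUID.
                                        6. -
                                        7. Release the lock on the parameters memory and stop the checkpoint-saving thread.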
                                        8. -
                                        -
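The saving procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not PaddlePaddle's implementation: `save_checkpoint` and its arguments are hypothetical names, locking is omitted, the directory is assumed to hold only snapshot files, and the etcd write is replaced by returning the JSON string that would be stored.

```python
import hashlib
import json
import os
import time
import uuid


def save_checkpoint(snapshot, directory):
    """Write a snapshot to a UUID-named file and return the metadata JSON.

    Hypothetical helper: a real parameter server would hold the read_lock
    while writing and would store the returned JSON under
    /checkpoints/[pserver_id] in etcd.
    """
    checkpoint_id = str(uuid.uuid4())
    path = os.path.join(directory, checkpoint_id)
    with open(path, "wb") as f:
        f.write(snapshot)

    # MD5 sum of the file that was just written.
    with open(path, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()

    meta = json.dumps(
        {"uuid": checkpoint_id, "md5": md5, "timestamp": int(time.time())}
    )

    # Remove every snapshot file that does not belong to the current UUID.
    for name in os.listdir(directory):
        if name != checkpoint_id:
            os.remove(os.path.join(directory, name))
    return meta
```

Deleting old snapshots only after the new file is complete means at least one full snapshot exists on disk at any time.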

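                                        Users should take extra care here: in a real environment, a running training job may saturate the network bandwidth between the trainers and the parameter servers. If a parameter server must also reach the distributed storage over the network to save a snapshot at that moment, the network can become congested and the job may stall periodically.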

                                        -
                                        -
                                        -

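                                        Recovering from a Snapshot

                                        -

                                        When a parameter server starts for the first time, or is restarted by Kubernetes after a failure at any time, it needs to roll back to the last checkpoint:

                                        -
                                          -
                                        1. Read the node /checkpoints/[pserver_id] from etcd to obtain the UUID of the latest checkpoint file
                                        2. -
                                        3. Load the checkpoint snapshot file named with that UUID from disk, and load the parameters in it
                                        4. -
                                        5. If either of the two steps above fails, initialize the parameters with the initialization method defined by the startup arguments
                                        6. -
                                        7. Start serving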
                                        8. -
                                        -
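The recovery steps can be sketched the same way. This is a sketch only: `load_checkpoint` is a hypothetical name, `meta_json` stands in for the value read from etcd, and verifying the MD5 sum on load is an assumption (the text says the sum is recorded, not how it is used).

```python
import hashlib
import json
import os


def load_checkpoint(meta_json, directory, init_params):
    """Return the latest snapshot, or fall back to init_params().

    Hypothetical helper: meta_json stands in for the value stored under
    /checkpoints/[pserver_id] in etcd (None if the key does not exist).
    """
    try:
        meta = json.loads(meta_json)
        with open(os.path.join(directory, meta["uuid"]), "rb") as f:
            data = f.read()
        # Assumption: the recorded MD5 sum is used to validate the file.
        if hashlib.md5(data).hexdigest() != meta["md5"]:
            raise ValueError("snapshot does not match the recorded MD5 sum")
        return data
    except (TypeError, ValueError, KeyError, OSError):
        # Reading the metadata or the file failed: initialize from scratch.
        return init_params()
```

Treating every failure in the first two steps the same way keeps the first start (no checkpoint yet) and a corrupted checkpoint on the same code path.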
                                        -
                                        -
                                        -

                                        TODO List

                                        -
                                        -

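                                        Speculative Execution / Accelerated Execution (TODO)

                                        -

                                        In a heterogeneous cluster, a trainer that runs too slowly can slow down the whole cluster (such as Trainer 1 in the figure). In that case the master starts a new trainer (Accelerate Trainer 2) on the same training data block. Whichever trainer finishes training the block first wins, and the slower one is killed.

                                        -
                                        -
                                        -

                                        Dynamic Scaling Up/Down

                                        -

                                        For now only dynamically scaling the number of trainers is considered, which reduces system complexity.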

                                        -
                                        -
                                        -
                                        -

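                                        Terminology

                                        -
                                          -
                                        • model: all the parameters obtained after deep learning training; with this neural network, predictions can be made on new data
                                        • -
                                        • parameters: the parameters of a neural network, including weights w and biases b. A neural network model consists of a large number of parameters
                                        • -
                                        • shard: a partition, usually one of the pieces into which a whole is split.
                                        • -
                                        • model shard: the parameters of a neural network are split into several parts, and each shard is stored on one of the parameter servers
                                        • -
                                        • parameter block: multiple parameter blocks make up a model shard
                                        • -
                                        • single point of failure: at any moment at most one server is assumed to have failed. Because the probability of two machines in the cluster failing at the same time is extremely low ((average failure rate * average time to repair)^2), disaster recovery for two or more simultaneous failures is considered only for special online systems.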
                                        • -
                                        -
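The probability estimate in the last definition can be made concrete with a small calculation. The numbers below are assumed for illustration only, and the function name is hypothetical; the formula is the one quoted in the definition, which treats the two failures as independent.

```python
def simultaneous_failure_probability(failure_rate_per_hour, repair_hours):
    """Probability that two given machines are down at the same moment.

    Each machine is down for a fraction (failure rate * repair time) of the
    time; with independent failures, both are down with the square of that.
    """
    unavailable_fraction = failure_rate_per_hour * repair_hours
    return unavailable_fraction ** 2


# Assumed numbers: one failure per 1000 hours, 2 hours to repair.
p = simultaneous_failure_probability(1 / 1000, 2)
print(p)
```

With these assumed numbers a single machine is unavailable 0.2% of the time, while two specific machines are unavailable together only about four times in a million.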
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/data_dispatch.html b/develop/doc_cn/design/cluster_train/data_dispatch.html deleted file mode 100644 index f5edd01c514c82af16d1bb1316ae6e341bfbea1c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/data_dispatch.html +++ /dev/null @@ -1,414 +0,0 @@ - - - - - - - - - - - - - 训练数据的存储和分发 — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

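                                        Storage and Distribution of Training Data

                                        -
                                        -

                                        Concepts

                                        -
                                        -
                                        -

                                        Workflow Overview

                                        -

                                        Training datasets in production are usually very large and are stored on distributed storage such as Hadoop HDFS, Ceph, or AWS S3. These storage services usually split the data into multiple shards stored across multiple nodes. This makes it possible to run several kinds of data-related computing tasks in the cloud, including: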

                                        -
                                          -
                                        • Data preprocessing jobs
                                        • -
                                        • Paddle training jobs
                                        • -
                                        • Online model prediction services
                                        • -
                                        -
                                        - -

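                                        The figure above shows the data flow of an application (face recognition) in a real production environment. Log data from production is stored both as a real-time stream (Kafka) and as offline data (HDFS). Multiple distributed data processing jobs run in the cluster, such as stream processing (online data process) and offline batch processing (offline data process), to preprocess the data and provide it to Paddle as training data. Users can also upload labeled data to the distributed storage to supplement the training data. The models produced by deep learning training on Paddle are served to the online face recognition application.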

                                        -
                                        -
                                        -

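                                        Training Data Storage

                                        -

                                        We choose CephFS as the storage system.

                                        -
                                          -
                                        • Whether from the PFSClient side or from a job running in a Pod, users access their own data uniformly through /pfs/$DATACENTER/home/$USER.
                                        • -
                                        • Public datasets are stored under /pfs/$DATACENTER/common
                                            -
                                          • Mounted read-only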
                                          • -
                                          -
                                        • -
                                        -
                                        - -
                                        -
                                        -

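                                        File Preprocessing

                                        -

                                        Before training starts, the dataset must be converted into RecordIO, the storage format PaddlePaddle uses for distributed training. We provide two ways to convert:

                                        -
                                          -
                                        1. The user converts the data locally and then uploads it
                                        2. -
                                        3. The user uploads the data and then runs the conversion program on the cluster
                                        4. -
                                        -

                                        The generated files are named in the following format: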

                                        -
                                        name_prefix-aaaaa-of-bbbbb
                                        -
                                        -
                                        -

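                                        "aaaaa" and "bbbbb" are both five-digit numbers. Each file is one shard of the dataset: "aaaaa" is the index of the shard, and "bbbbb" is the largest shard index.

                                        -

                                        For example, the ImageNet dataset might be split into 1000 shards, with the file names: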

                                        -
                                        imagenet-00000-of-00999
                                        -imagenet-00001-of-00999
                                        -...
                                        -imagenet-00999-of-00999
                                        -
                                        -
                                        -
                                        -
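The naming rule can be written down in a few lines of Python. This is only an illustration of the format described above; `shard_filenames` is a hypothetical helper, not part of the conversion library.

```python
def shard_filenames(name_prefix, num_shards):
    """Return file names of the form name_prefix-aaaaa-of-bbbbb."""
    last_index = num_shards - 1  # "bbbbb" is the largest shard index
    return [
        "%s-%05d-of-%05d" % (name_prefix, i, last_index)
        for i in range(num_shards)
    ]


names = shard_filenames("imagenet", 1000)
print(names[0])   # imagenet-00000-of-00999
print(names[-1])  # imagenet-00999-of-00999
```

Because the suffix holds the largest index rather than the shard count, 1000 shards end in `-of-00999`.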

                                        Conversion Library

                                        -

                                        Whether converting locally or in the cloud, we provide a Python conversion library with the following interface:

                                        -
                                        def convert(output_path, reader, num_shards, name_prefix)
                                        -
                                        -
                                        -
                                          -
                                        • output_path: directory in which output files will be saved.
                                        • -
                                        • reader: a data reader, from which the convert program will read data instances.
                                        • -
                                        • num_shards: the number of shards that the dataset will be partitioned into.
                                        • -
                                        • name_prefix: the name prefix of generated files.
                                        • -
                                        -

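                                        The reader outputs one data instance at a time. An instance can be a single value, or multiple values expressed as a tuple:

                                        -
                                        yield 1 # single value
                                        -yield numpy.random.uniform(-1, 1, size=28*28) # single value
                                        -yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values
                                        -
                                        -
                                        -

                                        Each value can be an integer, a floating-point number, a string, a list made of these, or a numpy.ndarray. Values of any other type are serialized to a string with Pickle.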

                                        -
                                        -
                                        -
                                        -

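                                        Example Programs

                                        -
                                        -

                                        Using the Conversion Library

                                        -

                                        The reader produced by the following reader_creator outputs one data instance at a time. Each data instance contains two values: a numpy.ndarray and an integer:

                                        -
                                        import numpy
                                        -
                                        -def reader_creator():
                                        -    def reader():
                                        -        for i in range(1000):
                                        -            yield numpy.random.uniform(-1, 1, size=28*28), 0 # multiple values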
                                        -    return reader
                                        -
                                        -
                                        -

                                        Pass the reader produced by reader_creator to the convert function to perform the conversion:

                                        -
                                        convert("./", reader_creator(), 100, "random_images")
                                        -
                                        -
                                        -

                                        The call above generates 100 files in the current directory:

                                        -
                                        random_images-00000-of-00099
                                        -random_images-00001-of-00099
                                        -...
                                        -random_images-00099-of-00099
                                        -
                                        -
                                        -
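To make the behaviour of the conversion step concrete, here is a toy stand-in with the same argument order. It is an illustration under loud assumptions: `convert_sketch` is a hypothetical name, it writes pickle files rather than RecordIO, and the round-robin assignment of instances to shards is an assumption, since the source does not describe how the real library partitions data.

```python
import os
import pickle


def convert_sketch(output_path, reader, num_shards, name_prefix):
    """Partition the instances from reader() into num_shards pickle files.

    Illustration only: the real library writes RecordIO, and round-robin
    assignment of instances to shards is an assumption.
    """
    shards = [[] for _ in range(num_shards)]
    for i, instance in enumerate(reader()):
        shards[i % num_shards].append(instance)

    paths = []
    for index, instances in enumerate(shards):
        name = "%s-%05d-of-%05d" % (name_prefix, index, num_shards - 1)
        path = os.path.join(output_path, name)
        with open(path, "wb") as f:
            pickle.dump(instances, f)
        paths.append(path)
    return paths
```

As in the example above, `reader` is the function returned by a reader creator, so it is called once to obtain the stream of instances.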
                                        -
                                        -

                                        Training

                                        -

                                        PaddlePaddle provides a dedicated data reader creator that produces the data reader for the given RecordIO files. The reader is used the same way locally and in the cloud:

                                        -
                                        # ...
                                        -reader = paddle.reader.creator.RecordIO("/pfs/datacenter_name/home/user_name/random_images-*-of-*")
                                        -batch_reader = paddle.batch(reader, 128)
                                        -trainer.train(batch_reader, ...)
                                        -
                                        -
                                        -

                                        The data instances output by the reader in the code above are identical to the data instances output by the reader when the dataset was generated.

                                        -
                                        -
                                        -
                                        -

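                                        Uploading Training Files

                                        -

                                        The following command uploads local data to the storage cluster.

                                        -
                                        paddle pfs cp filename /pfs/$DATACENTER/home/$USER/folder/
                                        -
                                        -
                                        -

                                        For example, the random_images dataset converted in the earlier example can be uploaded to /home/ in the cloud with the following command:

                                        -
                                        paddle pfs cp random_images-*-of-* /pfs/$DATACENTER/home/$USER/folder/
                                        -
                                        -
                                        -

                                        The configuration for $DATACENTER must be written to a configuration file, for example: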

                                        -
                                        # config file
                                        -[datacenter_1]
                                        -username=user
                                        -usercert=user.pem
                                        -userkey=user-key.pem
                                        -endpoint=datacenter1.paddlepaddle.org
                                        -
                                        -[datacenter_2]
                                        -username=user
                                        -usercert=user.pem
                                        -userkey=user-key.pem
                                        -endpoint=datacenter2.paddlepaddle.org
                                        -
                                        -
                                        -
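The configuration above is in INI form, so it can be read with Python's standard `configparser` module. This is a sketch of one way to look up a datacenter's settings; how the `paddle` command line actually reads the file is not stated in the source, and `endpoint_for` is a hypothetical helper.

```python
import configparser

CONFIG_TEXT = """
# config file
[datacenter_1]
username=user
usercert=user.pem
userkey=user-key.pem
endpoint=datacenter1.paddlepaddle.org

[datacenter_2]
username=user
usercert=user.pem
userkey=user-key.pem
endpoint=datacenter2.paddlepaddle.org
"""


def endpoint_for(datacenter, text=CONFIG_TEXT):
    """Look up the endpoint configured for one datacenter section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser[datacenter]["endpoint"]


print(endpoint_for("datacenter_2"))  # datacenter2.paddlepaddle.org
```

Each section name plays the role of $DATACENTER in the paths shown earlier.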
                                        -
                                        -
                                        -

                                        TODO

                                        -
                                        -

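                                        File Access Permissions

                                        -

                                        Control user permissions

                                        -
                                          -
                                        • Users can share their own data with others
                                        • -
                                        -
                                        -
                                        -

                                        File Access Method

                                        -

                                        Instead of accessing data through a mount, access it remotely and directly through an API

                                        -

                                        For example:

                                        -
                                        f = open('/pfs/datacenter_name/home/user_name/test1.dat')
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Support user-defined data preprocessing jobs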

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/large_model_dist_train.html b/develop/doc_cn/design/cluster_train/large_model_dist_train.html deleted file mode 100644 index fd10e09a64268ab0fab2d555305b02349da0daec..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/large_model_dist_train.html +++ /dev/null @@ -1,351 +0,0 @@ - - - - - - - - - - - - - Alalysis of large model distributed training in Paddle — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Analysis of large model distributed training in Paddle

                                        -

                                        NOTE: These are only notes on how we implemented this scheme in V1, not a new design.

                                        -
                                        -

                                        What is it

                                        -

                                        We often encounter cases where the (sparse) embedding layer parameters are so large that they cannot be stored in the trainer’s memory during training. So we need to put them on several servers and fetch them row by row instead of fetching all of the parameters.

                                        -
                                        -
                                        -

                                        How to use

                                        -

                                        Specify command-line arguments like --loadsave_parameters_in_pserver=true --ports_num_for_sparse=1 --use_old_updater=1 when starting the paddle trainer, and add something like --ports_num_for_sparse=1 --pserver_num_threads=5 when starting the pserver processes.

                                        -

                                        Accordingly, configure your embedding layers like:

                                        -
                                        SPARSE_REMOTE=True
                                        -
                                        -w1 = data_layer(name="w1", size=dict_size)
                                        -emb1 = embedding_layer(input=w1, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE))
                                        -w2 = data_layer(name="w2", size=dict_size)
                                        -emb2 = embedding_layer(input=w2, size=32, param_attr=ParameterAttribute(sparse_update=SPARSE_REMOTE))
                                        -...
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Implementation details

                                        -
                                        enum MatType {
                                        -  MAT_NORMAL,
                                        -  MAT_NORMAL_SHARED,
                                        -  MAT_VALUE_SHARED,
                                        -  MAT_SPARSE_ROW_IDS,
                                        -  MAT_SPARSE_ROW_AUTO_GROW,
                                        -  MAT_CACHE_ROW,
                                        -  MAT_SPARSE_ROW,
                                        -  MAT_SPARSE_ROW_PREFETCH,
                                        -  MAT_SPARSE_ROW_PREFETCH_FULL_SIZE,
                                        -};
                                        -
                                        -
                                        -

                                        MAT_SPARSE_ROW_PREFETCH is the type used when training is configured to fetch only individual rows of the matrix.

                                        -

                                        In trainer_internal.cpp:L93 trainOneBatch:

                                        -
                                          if (config_->getOptConfig().use_sparse_remote_updater()) {
                                        -    REGISTER_TIMER("prefetch");
                                        -    gradientMachine_->prefetch(inArgs);
                                        -    parameterUpdater_->getParametersRemote();
                                        -  }
                                        -
                                        -
                                        -

                                        At the beginning of each batch, before the actual network forward and backward passes, the trainer will try to download one row of data from the pserver.

                                        -

                                        In trainer/RemoteParameterUpdater.cpp, parameterUpdater_->getParametersRemote():

                                        -
                                        if (fullSize) {
                                        -    ...
                                        -} else {
                                        -    getParams = [&] {
                                        -        parameterClient_->getParameterSparse(
                                        -            /* recvParameterType= */ PARAMETER_VALUE, sendBackParameterType);
                                        -    };
                                        -    applyL1 = [](Parameter& para, real decayRate) {
                                        -        para.getMat(PARAMETER_VALUE)->applyL1(/*lr=*/1.0f, decayRate);
                                        -    };
                                        -}
                                        -
                                        -
                                        -

                                        Calling parameterClient_->getParameterSparse makes a remote call to the pserver’s getParameterSparse:

                                        -
                                        void ParameterServer2::getParameterSparse(const SendParameterRequest& request,
                                        -                                          std::vector<Buffer>& inputBuffers,
                                        -                                          SendParameterResponse* response,
                                        -                                          std::vector<Buffer>* outputBuffers) {
                                        -  (void)inputBuffers;
                                        -  auto& buffer = *readWriteBuffer_;
                                        -  size_t numReals = 0;
                                        -  for (const auto& block : request.blocks()) {
                                        -    numReals += getParameterConfig(block).dims(1);
                                        -  }
                                        -  buffer.resize(numReals);
                                        -
                                        -  VLOG(3) << "pserver: getParameterSparse, numReals=" << numReals;
                                        -
                                        -  ReadLockGuard guard(parameterMutex_);
                                        -  size_t offset = 0;
                                        -  for (const auto& block : request.blocks()) {
                                        -    size_t width = getParameterConfig(block).dims(1);
                                        -    Buffer buf = {buffer.data() + offset, width};
                                        -    int type = request.send_back_parameter_type();
                                        -    sendBackParameterSparse(block, type, response, &buf, width, outputBuffers);
                                        -    offset += width;
                                        -  }
                                        -}
                                        -
                                        -
                                        -

                                        getParameterConfig(block).dims(1) returns the width of the current “parameter block” (a shard of the parameter object), so the getParameterSparse remote call returns only one row of data to the client.

                                        -
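The idea of fetching rows instead of the whole parameter can be shown with a toy model. This is a sketch under stated assumptions, not Paddle's data layout: `ShardedEmbeddingTable` is a hypothetical class, and placing row r on server r % num_servers is an assumption made only to keep the example small.

```python
class ShardedEmbeddingTable:
    """Toy stand-in for an embedding table spread over several servers.

    Hypothetical structure: row r lives on server r % num_servers. The real
    pserver partitions parameters into blocks; this only shows the idea of
    fetching the rows a batch needs instead of the whole table.
    """

    def __init__(self, num_rows, width, num_servers):
        self.num_servers = num_servers
        self.servers = [dict() for _ in range(num_servers)]
        for row in range(num_rows):
            self.servers[row % num_servers][row] = [float(row)] * width

    def fetch_rows(self, row_ids):
        """Return only the requested rows, keyed by row id."""
        return {r: self.servers[r % self.num_servers][r] for r in set(row_ids)}


table = ShardedEmbeddingTable(num_rows=1000, width=4, num_servers=3)
batch_word_ids = [7, 42, 7]
rows = table.fetch_rows(batch_word_ids)
print(sorted(rows))  # [7, 42]
```

The trainer here receives 2 rows of width 4 rather than all 1000 rows, which is the saving that makes very large embedding layers trainable.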
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/master_server.html b/develop/doc_cn/design/cluster_train/master_server.html deleted file mode 100644 index d731ffebb6a11e304d2235f39c70c02f8aa515ca..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/master_server.html +++ /dev/null @@ -1,349 +0,0 @@ - - - - - - - - - - - - - Design Doc: Master Server — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Master Server

                                        -

                                        For an overview of the master server’s role, please refer to the distributed training design doc. In this design doc we will discuss the master server in more detail. The master will be implemented in Go.

                                        -
                                        -

                                        Dataset

                                        -

                                        -

                                        A dataset is a list of files in RecordIO format. A RecordIO file consists of chunks, and each chunk consists of some records.

                                        -
                                        -
                                        -

                                        Task Queue

                                        -

                                        As mentioned in distributed training design doc, a task is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple chunks from one or multiple files. The master server maintains task queues to track the training progress.

                                        -
                                        -

                                        Task Queue Creation

                                        -
                                          -
                                        1. Each trainer will make an RPC call (using Go’s rpc package) to the master server, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master server the same dataset, only the first RPC call will be honored.

                                          -

                                          The RPC interface is:

                                          -
                                          func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
                                          -}
                                          -
                                          -
                                          -
                                        2. -
                                        3. The master server will scan through each RecordIO file to generate the chunk index and to find out how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk, and the index of the chunk within the file is an integer starting from 0, representing the n-th chunk within the file.

                                          -

                                          The definition of the chunk is:

                                          -
                                          type Chunk struct {
                                          -    Idx   int // index of the chunk within the file
                                          -    Path  string
                                          -    Index recordio.Index // chunk index
                                          -}
                                          -
                                          -
                                          -
                                        4. -
                                        5. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.

                                          -

                                          The definition of the task is:

                                          -
                                          type Task struct {
                                          -    Index  int
                                          -    Chunks []Chunk
                                          -}
                                          -
                                          -
                                          -

                                          The elements in the task queues are of type TaskEntry, which contains a timeout counter (described in the task retry logic) and a task:

                                          -
                                          type TaskEntry struct {
                                          -    NumTimeout int
                                          -    Task       Task
                                          -}
                                          -
                                          -
                                          -

                                          The definition of task queues is:

                                          -
                                          type TaskQueues struct {
                                          -    Todo    []TaskEntry
                                          -    Pending map[int]TaskEntry // map from task index to task entry
                                          -    Done    []TaskEntry
                                          -}
                                          -
                                          -
                                          -
                                        6. -
                                        -
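The queue-creation steps above can be sketched in Go as follows. This is a minimal illustration, not the master's actual implementation: the `recordio.Index` field is left out of `Chunk` so the sketch has no external dependency, and both `newTaskQueues` and its `chunksPerTask` parameter are hypothetical names introduced here.

```go
package main

import "fmt"

// Chunk follows the design doc's struct, minus the recordio.Index field,
// which is omitted so that this sketch compiles on its own.
type Chunk struct {
	Idx  int // index of the chunk within the file
	Path string
}

type Task struct {
	Index  int
	Chunks []Chunk
}

type TaskEntry struct {
	NumTimeout int
	Task       Task
}

type TaskQueues struct {
	Todo    []TaskEntry
	Pending map[int]TaskEntry // map from task index to task entry
	Done    []TaskEntry
}

// newTaskQueues groups chunks into tasks of at most chunksPerTask chunks
// (a hypothetical parameter) and fills the tasks into the todo queue.
// The pending and done queues start out empty.
func newTaskQueues(chunks []Chunk, chunksPerTask int) TaskQueues {
	q := TaskQueues{Pending: make(map[int]TaskEntry)}
	if chunksPerTask < 1 {
		chunksPerTask = 1
	}
	for start := 0; start < len(chunks); start += chunksPerTask {
		end := start + chunksPerTask
		if end > len(chunks) {
			end = len(chunks)
		}
		task := Task{Index: len(q.Todo), Chunks: chunks[start:end]}
		q.Todo = append(q.Todo, TaskEntry{Task: task})
	}
	return q
}

func main() {
	var chunks []Chunk
	for i := 0; i < 5; i++ {
		chunks = append(chunks, Chunk{Idx: i, Path: "data-00000.recordio"})
	}
	q := newTaskQueues(chunks, 2)
	fmt.Println(len(q.Todo), len(q.Pending), len(q.Done)) // prints "3 0 0"
}
```

Using the position in the todo queue as the task index keeps the index unique, which is what the `Pending` map needs as its key.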
                                        -
                                        -

                                        Task Queue Persistence

                                        -

                                        The task queues need to be persisted on etcd for fault recovery. Since the task queues only change once a task is completed or timed out, which is not very frequent, we can afford to synchronize with etcd every time the task queues change.

                                        -

                                        We will serialize the task queues data structure with gob encoding, compress with gzip, and save into etcd synchronously under key /task_queues.
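The serialization step can be sketched with Go's standard `encoding/gob` and `compress/gzip` packages. This is only an illustration under simplifying assumptions: `Task` is reduced to its index, the helper names `encodeQueues` and `decodeQueues` are hypothetical, and the actual write to etcd under `/task_queues` is left out.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// Task is reduced to its index to keep the sketch short.
type Task struct {
	Index int
}

type TaskEntry struct {
	NumTimeout int
	Task       Task
}

type TaskQueues struct {
	Todo    []TaskEntry
	Pending map[int]TaskEntry
	Done    []TaskEntry
}

// encodeQueues gob-encodes the queues and gzip-compresses the result.
// The returned bytes are what would be stored in etcd.
func encodeQueues(q TaskQueues) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(q); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeQueues reverses encodeQueues; it would be used during recovery.
func decodeQueues(data []byte) (TaskQueues, error) {
	var q TaskQueues
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return q, err
	}
	defer zr.Close()
	err = gob.NewDecoder(zr).Decode(&q)
	return q, err
}

func main() {
	q := TaskQueues{
		Todo:    []TaskEntry{{Task: Task{Index: 1}}},
		Pending: map[int]TaskEntry{2: {NumTimeout: 1, Task: Task{Index: 2}}},
	}
	data, err := encodeQueues(q)
	if err != nil {
		panic(err)
	}
	restored, err := decodeQueues(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(restored.Todo), len(restored.Pending))
}
```

Note that gob does not transmit empty slices or maps, so an empty queue decodes as nil; code that writes to `Pending` after recovery would need to re-create the map.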

                                        -
                                        -
                                        -

                                        Task Dispatch

                                        -

                                        The trainer will make an RPC call to master to get a new task when:

                                        -
                                          -
                                        • the trainer first starts, or
                                        • -
                                        • the trainer finishes a task.
                                        • -
                                        -

                                        The RPC interface is:

                                        -
                                        func (m *RPCServer) GetTask(finished *Task, result *Task) error {
                                        -}
                                        -
                                        -
                                        -

                                        The argument finished will be nil when the trainer has just started.

                                        -

                                        During the RPC call the master will do the following:

                                        -
                                          -
                                        • Make a copy of the task queues, and update the copy reflecting the finished tasks and the new pending tasks.
                                        • -
                                        • Synchronize the copy of task queues with etcd using a transaction conditioned on holding the master lock.
                                        • -
                                        • If the synchronization succeeds, replace the task queues with the copy and return the new task to the trainer; if it fails, discard the copy and report the error to the trainer.
                                        • -
                                        -
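The copy, synchronize, and swap sequence above can be sketched as follows. The `store` interface and `memStore` type are hypothetical stand-ins for the etcd transaction, `Task` is reduced to its index, and locking around `getTask` is omitted, so this is a sketch of the control flow rather than the master's real RPC handler.

```go
package main

import (
	"errors"
	"fmt"
)

type Task struct{ Index int }

type TaskEntry struct {
	NumTimeout int
	Task       Task
}

type TaskQueues struct {
	Todo    []TaskEntry
	Pending map[int]TaskEntry
	Done    []TaskEntry
}

// store is a hypothetical stand-in for the etcd transaction.
type store interface {
	Save(q TaskQueues) error
}

// memStore is a fake store whose failures can be switched on for testing.
type memStore struct{ fail bool }

func (s *memStore) Save(TaskQueues) error {
	if s.fail {
		return errors.New("store unavailable")
	}
	return nil
}

type master struct {
	queues TaskQueues
	store  store
}

var errNoTask = errors.New("no task left in the todo queue")

// copyQueues makes a copy that can be modified without touching the original.
func copyQueues(q TaskQueues) TaskQueues {
	c := TaskQueues{
		Todo:    append([]TaskEntry(nil), q.Todo...),
		Pending: make(map[int]TaskEntry, len(q.Pending)),
		Done:    append([]TaskEntry(nil), q.Done...),
	}
	for k, v := range q.Pending {
		c.Pending[k] = v
	}
	return c
}

// getTask updates a copy of the queues, persists the copy, and only then
// replaces the live queues. On a store error the copy is simply dropped.
func (m *master) getTask(finished *Task) (Task, error) {
	next := copyQueues(m.queues)
	if finished != nil {
		if e, ok := next.Pending[finished.Index]; ok {
			delete(next.Pending, finished.Index)
			next.Done = append(next.Done, e)
		}
	}
	var dispatched *TaskEntry
	if len(next.Todo) > 0 {
		e := next.Todo[0]
		next.Todo = next.Todo[1:]
		next.Pending[e.Task.Index] = e
		dispatched = &e
	}
	if err := m.store.Save(next); err != nil {
		return Task{}, err // m.queues is unchanged
	}
	m.queues = next
	if dispatched == nil {
		return Task{}, errNoTask
	}
	return dispatched.Task, nil
}

func main() {
	m := &master{
		queues: TaskQueues{Todo: []TaskEntry{{Task: Task{Index: 0}}, {Task: Task{Index: 1}}}},
		store:  &memStore{},
	}
	first, _ := m.getTask(nil)
	second, _ := m.getTask(&first)
	fmt.Println(first.Index, second.Index, len(m.queues.Done)) // prints "0 1 1"
}
```

Working on a copy means a failed write never leaves the in-memory queues ahead of what is stored, which is the property the recovery path depends on.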
                                        -
                                        -

                                        Task Retry Logic

                                        -

                                        When a task is dispatched to the trainer, the master will schedule a function for execution after the timeout duration (based on the moving average of task completion time). If the task entry is still in the pending queue when the function runs, its timeout counter will increase by one, and the task will be moved back to the todo queue. If the timeout counter is above the threshold, the master will log the error and discard the task.

                                        -

                                        Please note that a timed-out task could still be completed after it has been dispatched for retry, so it is possible for a task to be processed multiple times. We do not try to prevent this, since it is fine to train on the same task multiple times due to the stochastic nature of the stochastic gradient descent algorithm.
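A minimal sketch of the retry logic, using `time.AfterFunc` from the Go standard library to schedule the timeout function. The `master` type, its `maxTimeout` field, and the method names are assumptions made for illustration; the timeout here is a fixed argument rather than a moving average of task completion time.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Task struct{ Index int }

type TaskEntry struct {
	NumTimeout int
	Task       Task
}

type TaskQueues struct {
	Todo    []TaskEntry
	Pending map[int]TaskEntry
	Done    []TaskEntry
}

type master struct {
	mu         sync.Mutex
	queues     TaskQueues
	maxTimeout int // hypothetical threshold for discarding a task
}

// onTimeout runs when the timer of a dispatched task fires.
func (m *master) onTimeout(taskIndex int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.queues.Pending[taskIndex]
	if !ok {
		return // the task finished in time, nothing to do
	}
	delete(m.queues.Pending, taskIndex)
	e.NumTimeout++
	if e.NumTimeout > m.maxTimeout {
		fmt.Printf("task %d timed out %d times, discarding\n", taskIndex, e.NumTimeout)
		return
	}
	m.queues.Todo = append(m.queues.Todo, e)
}

// dispatch moves the first todo entry to the pending queue and schedules
// onTimeout to run after the given duration.
func (m *master) dispatch(timeout time.Duration) (Task, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if len(m.queues.Todo) == 0 {
		return Task{}, false
	}
	e := m.queues.Todo[0]
	m.queues.Todo = m.queues.Todo[1:]
	m.queues.Pending[e.Task.Index] = e
	time.AfterFunc(timeout, func() { m.onTimeout(e.Task.Index) })
	return e.Task, true
}

func main() {
	m := &master{
		queues: TaskQueues{
			Todo:    []TaskEntry{{Task: Task{Index: 0}}},
			Pending: make(map[int]TaskEntry),
		},
		maxTimeout: 3,
	}
	task, ok := m.dispatch(10 * time.Millisecond)
	fmt.Println("dispatched:", task.Index, ok)

	// Nobody reports the task as finished, so the timer fires and the
	// entry goes back to the todo queue with its counter increased.
	time.Sleep(100 * time.Millisecond)
	m.mu.Lock()
	fmt.Println("todo:", len(m.queues.Todo), "pending:", len(m.queues.Pending))
	m.mu.Unlock()
}
```

Because `onTimeout` only acts on entries that are still pending, a timer that fires after the task has finished is a harmless no-op.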

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/pserver_client.html b/develop/doc_cn/design/cluster_train/pserver_client.html deleted file mode 100644 index 9a53cc4743b0a6253bec5c26ce7fba5b64041a51..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/pserver_client.html +++ /dev/null @@ -1,426 +0,0 @@ - - - - - - - - - - - - - Design Doc: The Client Library of Parameter Server — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: The Client Library of Parameter Server

                                        -

                                        For an overview of the trainer’s role, please refer to the distributed training design doc. In this design doc, we will discuss the parameter server’s client library, which will manage communication with parameter servers. The library will be implemented in Go and made available as a static or dynamic library with a C header file.

                                        -
                                        -

                                        Parameter Partition

                                        -

                                        Each parameter will be partitioned into parameter blocks so that the parameters are evenly distributed across parameter servers. The partition is done automatically by the client library. Sparse parameters require slightly different treatment:

                                        -
                                        -

                                        Sparse Parameter

                                        -

                                        A sparse parameter is a parameter that is updated sparsely. The name is somewhat misleading: it does not have a sparse representation, but has the same representation as a dense vector.

                                        -

                                        Because a sparse parameter is updated sparsely, the trainer will have to partition the sparse parameter. Because the parameter server will merge all sparse parameter shards into the same file when saving the parameter, the shards need a special naming convention:

                                        -

                                        If a sparse parameter is partitioned into n shards, they should be named as:

                                        -
                                        name:sparse-0
                                        -name:sparse-1
                                        -...
                                        -name:sparse-n-1
                                        -
                                        -
                                        -

                                        The library is unaware of the partition and treats each parameter independently. Only when saving parameters will the parameter servers merge the sparse parameters according to the naming convention.
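The naming convention can be illustrated with two small helpers. Both `shardNames` and `parseShardName` are hypothetical names introduced for this sketch; the design doc only fixes the `name:sparse-i` format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const sparseTag = ":sparse-"

// shardNames returns the names of the n shards of a sparse parameter,
// from name:sparse-0 up to name:sparse-(n-1).
func shardNames(name string, n int) []string {
	names := make([]string, 0, n)
	for i := 0; i < n; i++ {
		names = append(names, name+sparseTag+strconv.Itoa(i))
	}
	return names
}

// parseShardName splits a shard name into the parameter name and the shard
// index. The boolean is false when the name does not follow the convention,
// which is how a saver could tell dense parameters apart.
func parseShardName(s string) (string, int, bool) {
	i := strings.LastIndex(s, sparseTag)
	if i < 0 {
		return s, 0, false
	}
	idx, err := strconv.Atoi(s[i+len(sparseTag):])
	if err != nil || idx < 0 {
		return s, 0, false
	}
	return s[:i], idx, true
}

func main() {
	fmt.Println(shardNames("embedding", 3))
	base, idx, ok := parseShardName("embedding:sparse-2")
	fmt.Println(base, idx, ok) // prints "embedding 2 true"
}
```

When saving, a parameter server could group shards by the parsed parameter name and order them by shard index before writing them to one file.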

                                        -
                                        -
                                        -
                                        -

                                        Model Optimization Using Gradients

                                        -

                                        There are two ways to perform model optimization using gradients:

                                        -
                                          -
                                        • On Client

                                          -

                                          The client performs multiple steps of forward and backward updates. In each step, the gradients are calculated and a new model is generated. After some steps, the client will calculate the difference between the newest model and the old model at step 0. The difference will be sent to the parameter servers, which will simply apply it to the parameters without any gradient-based optimization (such as Adam or L1 regularization).

                                          -
                                        • -
                                        • On Parameter Server

                                          -

                                          The client will send accumulated gradients to the parameter servers, and the parameter servers will do the optimization using the gradients.

                                          -
                                        • -
                                        -
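The "On Client" mode can be sketched as follows. The design doc does not say which optimizer the client runs locally, so plain SGD with a fixed learning rate is an assumption here, and `modelDelta` and `applyDelta` are hypothetical helper names.

```go
package main

import "fmt"

// modelDelta runs one plain SGD step per gradient on a copy of the model
// at step 0 and returns the difference between the final and initial model.
// Plain SGD is an assumption made for this sketch.
func modelDelta(start []float64, grads [][]float64, lr float64) []float64 {
	model := append([]float64(nil), start...)
	for _, g := range grads {
		for i := range model {
			model[i] -= lr * g[i]
		}
	}
	delta := make([]float64, len(start))
	for i := range delta {
		delta[i] = model[i] - start[i]
	}
	return delta
}

// applyDelta is the parameter server side in this mode: it adds the
// difference to the parameter and performs no gradient-based optimization.
func applyDelta(param, delta []float64) {
	for i := range param {
		param[i] += delta[i]
	}
}

func main() {
	serverParam := []float64{1, 2}
	grads := [][]float64{{0.5, 1}, {0.5, -1}}

	// The client starts from the server's current parameter values.
	delta := modelDelta(serverParam, grads, 0.5)
	applyDelta(serverParam, delta)
	fmt.Println(delta, serverParam) // prints "[-0.5 0] [0.5 2]"
}
```

In the "On Parameter Server" mode the client would send the accumulated gradients instead, and the server would run the optimizer itself.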
                                        -
                                        -

                                        L1 and L2 Regularization

                                        -

                                        PaddlePaddle allows L1 or L2 regularization to be specified per parameter, so when the trainer initializes a parameter, it needs to include a parameter configuration if L1 or L2 regularization is required.

                                        -
                                        -
                                        -

                                        Parameter Initialization

                                        -

                                        The parameters on parameter servers need to be initialized. To provide maximum flexibility, the trainer will initialize the parameters. Only one trainer will do the initialization; the other trainers will wait for the initialization to complete and then get the parameters from the parameter servers.

                                        -
                                        -

                                        Trainer Selection

                                        -

                                        To select the trainer for initialization, every trainer will try to acquire a distributed lock; whoever owns the lock will do the initialization, as illustrated below:

                                        -

                                        -
                                        -
                                        -

                                        Trainer Selection Process

                                        -

                                        The trainer selection process is encapsulated in the C API function:

                                        -
                                        int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
                                        -
                                        -
                                        -

                                        The selected trainer’s call to paddle_begin_init_params will return 1, and the other trainers’ calls to paddle_begin_init_params will return 0. paddle_get_params will block until initialization is completed. As illustrated below:

                                        -

                                        -
                                        -
                                        -
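The selection behaviour described in this section can be sketched in a single process. The `initCoordinator` type below is a hypothetical stand-in for the distributed lock and the parameter servers' initialization state; its methods only mirror the documented return values of paddle_begin_init_params and the blocking behaviour of paddle_get_params.

```go
package main

import (
	"fmt"
	"sync"
)

// initCoordinator is an in-process stand-in for the distributed lock and
// the "parameters are initialized" state kept by the parameter servers.
type initCoordinator struct {
	mu      sync.Mutex
	claimed bool
	done    chan struct{}
}

func newInitCoordinator() *initCoordinator {
	return &initCoordinator{done: make(chan struct{})}
}

// beginInit returns 1 for the first caller and 0 for every later caller.
func (c *initCoordinator) beginInit() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.claimed {
		return 0
	}
	c.claimed = true
	return 1
}

// finishInit is called once by the selected trainer after it has sent all
// parameters.
func (c *initCoordinator) finishInit() { close(c.done) }

// waitParams blocks until initialization has finished.
func (c *initCoordinator) waitParams() { <-c.done }

func main() {
	c := newInitCoordinator()
	results := make([]int, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = c.beginInit()
			if results[i] == 1 {
				// The selected trainer would initialize parameters here.
				c.finishInit()
			}
			c.waitParams() // every trainer then fetches the parameters
		}(i)
	}
	wg.Wait()

	selected := 0
	for _, r := range results {
		selected += r
	}
	fmt.Println("selected trainers:", selected) // prints "selected trainers: 1"
}
```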
                                        -

                                        C Interface

                                        -
                                        typedef enum {
                                        -  PADDLE_ELEMENT_TYPE_INT32   = 0,
                                        -  PADDLE_ELEMENT_TYPE_UINT32  = 1,
                                        -  PADDLE_ELEMENT_TYPE_INT64   = 2,
                                        -  PADDLE_ELEMENT_TYPE_UINT64  = 3,
                                        -  PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
                                        -  PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
                                        -} paddle_element_type;
                                        -
                                        -typedef struct {
                                        -  char*               name;
                                        -  paddle_element_type element_type;
                                        -  unsigned char*      content;
                                        -  int                 content_len;
                                        -} paddle_parameter, paddle_gradient;
                                        -
                                        -typedef int paddle_pserver_client;
                                        -
                                        -/**
                                        - * @brief creates a pserver client that talks to etcd for coordination.
                                        - */
                                        -paddle_pserver_client paddle_new_etcd_pserver_client(char* etcd_addr);
                                        -
                                        -/**
                                        - * @brief creates a pserver client given pserver addresses.
                                        - *
                                        - * @param pserver_addrs comma-separated pserver addresses.
                                        - * @param selected if current pserver client is selected to initialize all parameter servers.
                                        - */
                                        -paddle_pserver_client paddle_new_pserver_client(char* pserver_addrs, int selected);
                                        -void paddle_pserver_client_release(paddle_pserver_client c);
                                        -
                                        -/**
                                        - * @brief paddle_begin_init_params begins to initialize parameters on
                                        - * parameter servers.
                                        - *
                                        - * paddle_begin_init_params will be called from multiple trainers,
                                        - * only one trainer will be selected to initialize the parameters on
                                        - * parameter servers. Other trainers need to get the initialized
                                        - * parameters from parameter servers using @paddle_get_params.
                                        - *
                                        - * @return 1 if the trainer is selected to initialize parameter
                                        - * servers, otherwise 0.
                                        - */
                                        -int paddle_begin_init_params(paddle_pserver_client client);
                                        -
                                        -/**
                                        - * @brief paddle_init_param initializes the parameter on parameter
                                        - * servers.
                                        - *
                                        - * @param param the parameter to initialize.
                                        - * @param param_config_proto the configuration for the parameter.
                                        - * @param config_len the length of param_config_proto
                                        - * @return 0 if successful, otherwise -1. On failure, the trainer
                                        - * needs to restart the entire initialization process (starting from
                                        - * @paddle_begin_init_params). Or simply exit the program and wait for
                                        - * the cluster management system to restart the trainer.
                                        - */
                                        -int paddle_init_param(paddle_pserver_client client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
                                        -
                                        -/**
                                        - * @brief paddle_finish_init_params tells parameter servers client has
                                        - * sent all parameters to parameter servers as initialization.
                                        - *
                                        - * @return 0 if successful, otherwise -1. On failure, the trainer
                                        - * needs to restart the entire initialization process (starting from
                                        - * @paddle_begin_init_params). Or simply exit the program and wait for
                                        - * the cluster management system to restart the trainer.
                                        - */
                                        -int paddle_finish_init_params(paddle_pserver_client client);
                                        -
                                        -/**
                                        - * @brief paddle_send_grads sends gradients to parameter servers for
                                        - * updating parameters.
                                        - *
                                        - * @param grads the array of gradients to send.
                                        - * @param len the length of the gradient array.
                                        - * @return 0 if successful, otherwise -1.
                                        - */
                                        -int paddle_send_grads(paddle_pserver_client client, const paddle_gradient* grads, int len);
                                        -
                                        -/**
                                        - * @brief paddle_get_params gets parameters from parameter servers.
                                        - *
                                        - * paddle_get_params will block until parameters are initialized on
                                        - * the parameter servers.
                                        - *
                                        - * @param dst the destination array of parameter pointers to save to.
                                        - * The parameter pointer must be pre-populated with required parameter name,
                                        - * and the content of parameter must be pre-allocated of the size of required
                                        - * parameter on pserver.
                                        - * @param len the length of the paddle_parameter array.
                                        - * @return 0 if successful, otherwise -1.
                                        - */
                                        -int paddle_get_params(paddle_pserver_client client, paddle_parameter** dst, int len);
                                        -
                                        -/**
                                        - * @brief paddle_save_model tells parameter servers to save the parameters
                                        - * to the given path.
                                        - *
                                        - * @param path the path to save parameters.
                                        - * @return 0 if successful, otherwise -1.
                                        - */
                                        -int paddle_save_model(paddle_pserver_client client, const char* path);
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/remote_parameter_updater.html b/develop/doc_cn/design/cluster_train/remote_parameter_updater.html deleted file mode 100644 index 3d5853d0cf9ca2662a73810a763178be36505f7d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/remote_parameter_updater.html +++ /dev/null @@ -1,281 +0,0 @@ - - - - - - - - - - - - - Design Doc: Remote Parameter Updater for Cluster Train — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Remote Parameter Updater for Cluster Train

                                        -

                                        For an overview of distributed training, please refer to the distributed training design doc. In this design doc, we will discuss the parameter updater, which will use the parameter server client (see The Client Library of Parameter Server Design Doc) to manage and update parameters.

                                        -
                                        -

                                        Parameter Updater

                                        -

                                        The parameter updater is used by the trainer to manage and update parameters. There are two main kinds of parameter updater: local and remote. Since this design is for cluster training, we will only discuss the remote parameter updater here.

                                        -
                                        -

                                        Remote Parameter Updater

                                        -

                                        The remote parameter updater manages parameters through remote parameter servers, using the client that communicates with the pservers (see The Client Library of Parameter Server Design Doc).

                                        -

                                        In the PaddlePaddle Python V2 API, the trainer is implemented in Python; it will hold an instance of the parameter updater and call its functions directly. In this design, we will also expose the API of RemoteParameterUpdater to Python with SWIG.

                                        -
                                        -

                                        Sparse Remote Parameter Updater

                                        -

                                        Since we will only implement dense parameter management for now, the mechanism for sparse parameters will be discussed in the next stage.

                                        -
                                        -
                                        -
                                        -

                                        Interface Design

                                        -

                                        TBD

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/save_model.html b/develop/doc_cn/design/cluster_train/save_model.html deleted file mode 100644 index c137aaa38befbd154ac1d9a65c949796a35a5ecd..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/save_model.html +++ /dev/null @@ -1,368 +0,0 @@ - - - - - - - - - - - - - Design Doc: Save Model — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Save Model

                                        -
                                        -

                                        Overview

                                        -

                                        The model is the output of the training process. There are two -ways in which a user can obtain a model:

                                        -
                                          -
                                        • Save model triggered by user code: user code asks PaddlePaddle to -save a model.
                                        • -
                                        • Convert model from the checkpoint: the model is converted from the -pservers’ periodic checkpoint. This way, the user can cancel a -job at any time and still have a relatively fresh model (we -checkpoint roughly every 5 minutes).
                                        • -
                                        -
                                        -

                                        Trainer Saving Model vs. Pservers Saving Model

                                        -

                                        Both trainers and pservers have access to the model. So the model can -be saved from a trainer or pservers. We need to decide where the model -is saved from.

                                        -
                                        -

                                        Dense Update vs. Sparse Update

                                        -

                                        There are two types of model update methods: dense update and sparse -update (when the model parameter is configured to be sparse).

                                        -
                                          -
                                        • Dense update

                                          -

                                          Every trainer has its own full copy of the model. Every model -update will update the entire model.

                                          -
                                        • -
                                        • Sparse update

                                          -

                                          The training input is sparse, and the trainer does not have the -entire model. It only downloads the sub-model related -to the input. When updating the model, only the sub-model related to -the training input is updated.

                                          -
                                        • -
                                        -
                                        -
                                        -

                                        Pservers Saving Model

                                        -

                                        The benefit of letting pservers save the model is that they have the entire -model all the time. However, since pservers are on different nodes, it -requires a merging process to merge model shards into the same -model. This in turn requires the pservers to write models to a distributed -filesystem, making the checkpoint shards visible to the merge program.

                                        -
                                        -
                                        -

                                        Trainer Saving Model

                                        -

                                        The benefit of letting one trainer save the model is that it does not -require a distributed filesystem. It also reuses the same save-model -logic as local training, except that when doing sparse update the -trainer needs to download the entire model during the saving process.

                                        -
                                        -
                                        -

                                        Conclusion

                                        -

                                        Given that having the trainer save the model does not require a distributed filesystem, -and is an intuitive extension of how the trainer saves the model when training -locally, we decide to let the trainer save the model when doing -distributed training.

                                        -
                                        -
                                        -
                                        -

                                        Convert Model from Checkpoint

                                        -

                                        TODO

                                        -
                                        -
                                        -
                                        -

                                        Timeline

                                        -

                                        We will first implement the trainer saving the model. Converting the latest -snapshot to a model is a TODO for the future.

                                        -
                                        -
                                        -

                                        Trainer Save Model

                                        -
                                        -

                                        Trainer Election

                                        -

                                        One trainer will be elected as the one to save the model. When using -etcd, the trainer ID is a randomly generated UUID; the trainer will -contact the master server requesting to save the model and find out -whether it is elected. When the master server is not used, unique -trainer IDs will be given by the administrator, and the trainer whose ID -is “0” is elected to save the model.

                                        -
                                        -
                                        -

                                        Model Save Path

                                        -

                                        Each trainer will be given the directory to save the model. The -elected trainer will save the model to -given-directory/trainerID. Since the trainer ID is unique, this -prevents concurrent saves to the same file if multiple trainers -are elected to save the model during a split-brain problem.
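The election rule and the save path described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `ask_master` is a hypothetical callback standing in for the request to the master server, and neither function is part of PaddlePaddle.

```python
import os
from typing import Callable, Optional


def is_elected(trainer_id: str,
               ask_master: Optional[Callable[[str], bool]] = None) -> bool:
    """Decide whether this trainer should save the model.

    ask_master is a hypothetical stand-in for contacting the master
    server; it is not a PaddlePaddle API.
    """
    if ask_master is not None:
        # With etcd and a master server, the master decides.
        return ask_master(trainer_id)
    # Without a master server, the administrator assigns the IDs
    # and the trainer whose ID is "0" saves the model.
    return trainer_id == "0"


def model_save_path(given_directory: str, trainer_id: str) -> str:
    # Each trainer writes under its own unique ID, so two trainers that
    # both believe they are elected never write to the same file.
    return os.path.join(given_directory, trainer_id)
```

Because the path contains the trainer ID, a split-brain election produces two separate model directories instead of one corrupted file.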

                                        -
                                        -
                                        -

                                        What Happens When Model Is Saving

                                        -

                                        Saving the model takes some time, so we need to define what happens -while the save is taking place.

                                        -

                                        When doing dense update, the trainer uses the local model. Pservers -do not need to pause model update.

                                        -

                                        When doing sparse update, the trainer needs to download the entire -model while saving. To get the most accurate model, the model update -needs to be paused before the download starts and resumed after the -download finishes. Otherwise, the trainer gets a model that is -“polluted”: some parts of the model are old, and some parts are -new.

                                        -

                                        It’s unclear whether the “polluted” model will be inferior, given the -stochastic nature of deep learning, and pausing the model update will -add more complexity to the system. Since supporting sparse update is a -TODO item, we defer the decision on whether to pause the model update -while saving the model to the future.

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -


                                        -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/cluster_train/submit-job.html b/develop/doc_cn/design/cluster_train/submit-job.html deleted file mode 100644 index 05aaeea8f6821dbbeec0fc4b3f8f18047ad0f1ab..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/cluster_train/submit-job.html +++ /dev/null @@ -1,397 +0,0 @@ - - - - - - - - - - - - - Submit a Distributed Training Job — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Submit a Distributed Training Job

                                        -

                                        The user can submit a distributed training job with Python code, rather than with a command-line interface.

                                        -
                                        -

                                        Runtime Environment On Kubernetes

                                        -

                                        For a distributed training job, there are two Docker images: the runtime Docker image and the base Docker image. The runtime Docker image is the Docker image that gets scheduled by Kubernetes to run during training. The base Docker image is for building the runtime Docker image.

                                        -
                                        -

                                        Base Docker Image

                                        -

                                        Usually, the base Docker image is the PaddlePaddle production Docker image, which includes the paddle binary files and the Python package. Users can also specify any image name hosted on any Docker registry to which they have access.

                                        -
                                        -
                                        -

                                        Runtime Docker Image

                                        -

                                        The trainer package that the user uploads and some Python dependencies are packaged into a runtime Docker image based on the base Docker image.

                                        -
                                          -
                                        • Handle Python Dependencies

                                          -

                                          You need to provide a requirements.txt file in your trainer-package folder. Example:

                                          -
                                          pillow
                                          -protobuf==3.1.0
                                          -
                                          -
                                          -

                                          For more detail, an example project layout looks like this:

                                          -
                                            paddle_example
                                          -    |-quick_start
                                          -      |-trainer.py
                                          -      |-dataset.py
                                          -      |-requirements.txt
                                          -
                                          -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Submit Distributed Training Job With Python Code

                                        -

                                        -
                                          -
                                        • paddle.job.dist_train() will call the Job Server API /v1/packages to upload the trainer package and save it on CephFS, and then call /v1/trainer/job to submit the PaddlePaddle distributed job.
                                        • -
                                        • /v1/trainer/job will start a building job for preparing the runtime Docker image. When the building job is finished, Job Server will submit the PaddlePaddle distributed job to Kubernetes.
                                        • -
                                        • NOTE: For the first version, we will not prepare the runtime Docker image. Instead, the package is uploaded to Paddle Cloud, and Paddle Cloud will mount the package in a temporary folder into the base Docker image. We will not support custom Python dependencies in the first version either.
                                        • -
                                        -

                                        You can call paddle.job.dist_train and provide distributed training configuration as the parameters:

                                        -
                                        paddle.job.dist_train(
                                        -  trainer=dist_trainer(),
                                        -  paddle_job=PaddleJob(
                                        -    job_name = "paddle-cloud",
                                        -    entry_point = "python %s"%__file__,
                                        -    trainer_package = "/example/word2vec",
                                        -    image = "yancey1989/paddle-job",
                                        -    trainers = 10,
                                        -    pservers = 3,
                                        -    trainer_cpu = 1,
                                        -    trainer_gpu = 1,
                                        -    trainer_mem = "10G",
                                        -    pserver_cpu = 1,
                                        -    pserver_mem = "2G"
                                        -  ))
                                        -
                                        -
                                        -

                                        The parameter trainer of paddle.job.dist_train is a function and you can implement it as follows:

                                        -
                                        def dist_trainer():
                                        -  def trainer_creator():
                                        -    trainer = paddle.v2.trainer.SGD(...)
                                        -    trainer.train(...)
                                        -  return trainer_creator
                                        -
                                        -
                                        -

                                        The pseudo code of paddle.job.dist_train is as follows:

                                        -
                                        import os
                                        -
                                        -def dist_train(trainer, paddle_job):
                                        -  # when the code runs on the cloud, RUNNING_ON_CLOUD is set to YES
                                        -  if os.getenv("RUNNING_ON_CLOUD", "NO") == "NO":
                                        -    # not on the cloud yet: submit the paddle job
                                        -    paddle_job.submit()
                                        -  else:
                                        -    # already on the cloud: start the training
                                        -    trainer()
                                        -
                                        -
                                        -
                                        -

                                        PaddleJob Parameters

                                        -

                                        | parameter | type | explanation |
                                        -| --- | --- | --- |
                                        -| job_name | str | the unique name for the training job |
                                        -| entry_point | str | entry point for starting the trainer process |
                                        -| trainer_package | str | path of the trainer package, which the user must have access to |
                                        -| image | str | the base image for building the runtime image |
                                        -| pservers | int | Parameter Server process count |
                                        -| trainers | int | Trainer process count |
                                        -| pserver_cpu | int | CPU count for each Parameter Server process |
                                        -| pserver_mem | str | memory allocated for each Parameter Server process, a plain integer using one of these suffixes: E, P, T, G, M, K |
                                        -| trainer_cpu | int | CPU count for each Trainer process |
                                        -| trainer_mem | str | memory allocated for each Trainer process, a plain integer using one of these suffixes: E, P, T, G, M, K |
                                        -| trainer_gpu | int | GPU count for each Trainer process; if you only want CPU, do not set this parameter |
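As an illustration of the memory format used by pserver_mem and trainer_mem, the following sketch converts such a string into a number of bytes. It assumes the suffixes are decimal powers of 1000, as in Kubernetes-style resource quantities; the design doc does not define them, and `parse_mem` is a hypothetical helper, not part of PaddlePaddle.

```python
import re

# Assumption: decimal powers of 1000 (K = 1000, M = 1000**2, ...).
# The design doc only lists the suffix letters.
_SUFFIX_POWER = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6}


def parse_mem(value: str) -> int:
    """Convert a string such as "10G" into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMGTPE])", value)
    if match is None:
        raise ValueError(f"invalid memory quantity: {value!r}")
    number, suffix = match.groups()
    return int(number) * 1000 ** _SUFFIX_POWER[suffix]
```

Under this assumption, the example values "10G" and "2G" from the job configuration above would mean 10 and 2 billion bytes respectively.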

                                        -
                                        -
                                        -

                                        Deploy Parameter Server, Trainer and Master Process

                                        -
                                          -
                                        • Deploy PaddlePaddle Parameter Server processes as a Kubernetes ReplicaSet.
                                        • -
                                        • Deploy PaddlePaddle Trainer processes as a Kubernetes Job.
                                        • -
                                        • Deploy PaddlePaddle Master processes as a Kubernetes ReplicaSet.
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Job Server

                                        -
                                          -
                                        • RESTful API

                                          -

                                          The Job Server provides a RESTful HTTP API for receiving the trainer package and displaying -PaddlePaddle job-related information.

                                          -
                                            -
                                          • POST /v1/package receives the trainer package and saves it on CephFS
                                          • -
                                          • POST /v1/trainer/job submit a trainer job
                                          • -
                                          • GET /v1/jobs/ list all jobs
                                          • -
                                          • GET /v1/jobs/<job-name> the status of a job
                                          • -
                                          • DELETE /v1/jobs/<job-name> delete a job
                                          • -
                                          • GET /v1/version job server version
                                          • -
                                          -
                                        • -
                                        • Build Runtime Docker Image on Kubernetes

                                          -

                                          paddle.job.dist_train will upload the trainer package to the Job Server, save it on the distributed filesystem, and then start up a job for building the runtime Docker image that gets scheduled by Kubernetes to run during training.

                                          -

                                          Building the runtime Docker image on the Job Server has several benefits:

                                          -
                                            -
                                          • On Paddle Cloud, users run the trainer code in a Jupyter Notebook, which is a Kubernetes Pod. To execute docker build in the Pod, we would have to mount the host’s docker.sock into the Pod, so the user’s code would connect to the host’s Docker Engine directly, which is not safe.
                                          • -
                                          • Users only need to upload the training package files; they do not need to install Docker Engine or a Docker registry as dependencies.
                                          • -
                                          • If we later switch to another image type, such as RKT, users do not need to care about it.
                                          • -
                                          -
                                        • -
                                        • Deploy Parameter Server, Trainer and Master Processes

                                          -

                                          POST /v1/trainer/job receives the distributed training parameters and deploys the job as follows:

                                          -
                                            -
                                          • Deploy PaddlePaddle Parameter Server processes as a Kubernetes ReplicaSet.
                                          • -
                                          • Deploy PaddlePaddle Trainer processes as a Kubernetes Job.
                                          • -
                                          • Deploy PaddlePaddle Master processes as a Kubernetes ReplicaSet.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -


                                        -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/concurrent_programming.html b/develop/doc_cn/design/concurrent_programming.html deleted file mode 100644 index 9996b90bf674abd02058c88b3daac659997b82ec..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/concurrent_programming.html +++ /dev/null @@ -1,421 +0,0 @@ - - - - - - - - - - - - - Design Doc: Concurrent Programming with Fluid — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Concurrent Programming with Fluid

                                        -

                                        With PaddlePaddle Fluid, users describe a program rather than a model. The program is a ProgramDesc protobuf message. TensorFlow/MxNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, and not the program that trains/uses the model.

                                        -

                                        Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create a concurrent/parallel AI application. An interesting question is: how does a ProgramDesc represent a concurrent program?

                                        -

                                        The answer relies on the fact that a ProgramDesc is similar to an abstract syntax tree (AST) that describes a program. So users just write a concurrent program as they would in any concurrent programming language, e.g., Go.

                                        -
                                        -

                                        An Analogy

                                        -

                                        The following table compares concepts in Fluid and Go:

                                        -

                                        | Go | Fluid |
                                        -|----|-------|
                                        -| user-defined functions | layers |
                                        -| control-flow and built-in functions | intrinsics/operators |
                                        -| goroutines, channels | class ThreadPool |
                                        -| runtime | class Executor |

                                        -
                                        -
                                        -

                                        An Example Concurrent Program

                                        -

                                        To review all the above concepts in an example, let us take a simple program and write its distributed version.

                                        -

                                        Suppose that we want to parallelize a naive Fluid program (written in Go and calling Fluid’s Go binding) that multiplies two tensors.

                                        -
                                        import "fluid"
                                        -
                                        -func paddlepaddle() {
                                        -  X = fluid.read(...)
                                        -  W = fluid.Tensor(...)
                                        -  Y = fluid.mult(X, W)
                                        -}
                                        -
                                        -
                                        -

                                        Note that Fluid’s Go binding provides the default main function. It calls the paddlepaddle function, which in this case is defined in the above program and creates the following ProgramDesc message.

                                        -
                                        message ProgramDesc {
                                        -  block[0] = Block {
                                        -    vars = [X, W, Y],
                                        -    ops = [
                                        -      read(output = X)
                                        -      assign(input = ..., output = W)
                                        -      mult(input = {X, W}, output = Y)
                                        -    ],
                                        -  }
                                        -}
                                        -
                                        -
                                        -

                                        Then, the default main function calls fluid.run(), which creates an instance of the class Executor and calls Executor.Run(block[0]), where block[0] is the first and only block defined in the above ProgramDesc message.

                                        -

                                        The default main function is defined as follows:

                                        -
                                        func main() {
                                        -  paddlepaddle()
                                        -  fluid.run()
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        The Concurrent Version

                                        -

                                        By parallelizing the above program, we could support a very big tensor X by splitting it into small pieces {x_1, x_2, ...} and sending each piece to a worker process/node for parallel multiplication.

                                        -

                                        In this case, we can write a transpiler that takes a ProgramDesc message that represents the above example program and outputs two ProgramDesc messages, one for running on the master process/node, and the other one for worker processes/nodes.
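The split-and-multiply idea can be sketched in plain Python as follows. This is not Fluid code: threads stand in for the worker processes/nodes, an ordinary function call replaces send/recv, and the names `matmul` and `parallel_mult` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(x, w):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in x]


def parallel_mult(X, W, num_workers):
    # Split X by rows into at most one contiguous piece per worker.
    step = max(1, -(-len(X) // num_workers))  # ceiling division
    pieces = [X[i:i + step] for i in range(0, len(X), step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map keeps the pieces in order, like writing Y[index] = y.
        results = pool.map(lambda x: matmul(x, W), pieces)
        # Concatenate the partial products back into one result.
        return [row for piece in results for row in piece]
```

Since each row of the product depends only on the matching row of X, the pieces can be computed independently and the concatenated result equals `matmul(X, W)`.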

                                        -
                                        -

                                        The Master Program

                                        -

                                        The master program could look like the following:

                                        -
                                        message ProgramDesc {
                                        -  block[0] = Block {
                                        -    vars = [X, L, Y],
                                        -    ops = [
                                        -      read(output = X)
                                        -      kube_get_workers_addrs(output = L)
                                        -      Y = tensor_array(len(L))
                                        -      parallel_for(input = X, output = Y, 
                                        -                   attrs = {L, block_id(1)}) # referring to block 1
                                        -    ]
                                        -  }
                                        -  
                                        -  block[1] = Block {
                                        -    parent = 0,
                                        -    vars = [x, y, index],
                                        -    ops = [
                                        -      slice(input = [X, index], output = x) # index is initialized by parallel_for
                                        -      send(input = x, attrs = L[index])
                                        -      recv(outputs = y, attrs = L[index])
                                        -      assign(input = y, output = Y[index])
                                        -    ]
                                        -  }
                                        -}
                                        -
                                        -
                                        -

                                        The equivalent Fluid program (calling the Go binding) is:

                                        -
                                        func main() {  //// block 0
                                        -  X = fluid.read(...)
                                        -  L = fluid.k8s.get_worker_addrs()
                                        -  Y = fluid.tensor_array(len(L))
                                        -  fluid.parallel_for(X, L, 
                                        -                     func(index int) {  //// block 1
                                        -                       x = X[index]
                                        -                       fluid.send(L[index], x)
                                        -                       y = fluid.recv(L[index])
                                        -                       Y[index] = y
                                        -                     })
                                        -}
                                        -
                                        -
                                        -

                                        An explanation of the above program:

                                        • fluid.k8s is a package that provides access to the Kubernetes API.
                                        • fluid.k8s.get_worker_addrs returns the list of IP addresses and ports of all pods of the current job except for the current one (the master pod).
                                        • fluid.tensor_array creates a tensor array. fluid.parallel_for creates a ParallelFor intrinsic, which, when executed,
                                          1. creates len(L) scopes, one for each concurrent run of the sub-block (block 1 in this case), and initializes a variable named “index” in each scope to an integer value in the range [0, len(L)-1], and
                                          2. creates len(L) threads by calling into the ThreadPool singleton; each thread
                                            1. creates an Executor instance, and
                                            2. calls Executor.Run(block), where block is block 1 as explained above.

                                        Please be aware that block 1 is a sub-block of block 0, so ops in block 1 could refer to variables defined in block 0.
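The ParallelFor behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions: `Scope`, `parallel_for`, and `block1` are hypothetical names, the standard library's `ThreadPoolExecutor` stands in for the ThreadPool singleton, and the sub-block is called directly rather than through an Executor instance. The send/recv pair is replaced by a local multiplication.

```python
from concurrent.futures import ThreadPoolExecutor


class Scope:
    """A variable scope that falls back to its parent scope on lookup."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def get(self, name):
        if name in self.vars:
            return self.vars[name]
        if self.parent is None:
            raise KeyError(name)
        return self.parent.get(name)

    def set(self, name, value):
        self.vars[name] = value


def parallel_for(parent_scope, n, block):
    """Create n child scopes, each with its own "index", and run block on each."""
    scopes = []
    for i in range(n):
        scope = Scope(parent=parent_scope)
        scope.set("index", i)
        scopes.append(scope)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # list() waits for every run and re-raises any worker exception.
        list(pool.map(block, scopes))


def block1(scope):
    index = scope.get("index")
    x = scope.get("X")[index]   # X is defined in the parent scope (block 0)
    y = x * 2                   # stand-in for send(x) followed by recv(y)
    scope.get("Y")[index] = y   # Y is also defined in the parent scope


root = Scope()
root.set("X", [1, 2, 3])
root.set("Y", [None] * 3)
parallel_for(root, 3, block1)
print(root.get("Y"))  # [2, 4, 6]
```

The child scopes resolve `X` and `Y` through the parent, which mirrors how ops in block 1 can refer to variables defined in block 0.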
                                        -

                                        The Worker Program

                                        -

                                        The worker program looks like:

                                        -
                                        func main() {
                                        -  W = Tensor(...)
                                        -  x = fluid.listen_and_do(
                                        -        fluid.k8s.self_addr(),
                                        -        func(input Tensor) {
                                        -          output = fluid.mult(input, W)
                                        -        })
                                        -}
                                        -
                                        -
                                        -

                                        where

                                        • fluid.listen_and_do creates a ListenAndDo intrinsic, which, when executed,
                                          1. listens on the current pod’s IP address, as returned by fluid.k8s.self_addr(),
                                          2. once a connection is established,
                                            1. creates a scope of two parameters, “input” and “output”,
                                            2. reads a Fluid variable and saves it into “input”,
                                            3. creates an Executor instance and calls Executor.Run(block), where the block is generated by running the lambda specified as the second parameter of fluid.listen_and_do.
                                        -
                                        -
                                        -

                                        Summary

                                        -

                                        From the above example, we see that:

                                        1. Fluid enables the imperative programming paradigm by:
                                          1. letting users describe a program, but not a model (a sequence of layers, or a graph of operators), and
                                          2. calling the fluid.run function that runs the program implicitly.
                                        2. The program is described as a ProgramDesc protobuf message.
                                        3. Function Executor.Run takes a block, instead of a ProgramDesc, as its parameter.
                                        4. fluid.run calls Executor.Run to run the first block in the ProgramDesc message.
                                        5. Executor.Run’s implementation is extremely simple: it neither plans the execution nor creates threads; instead, it runs on the current thread and executes the intrinsics/operators’ Run methods sequentially as they appear in the Block.ops array.
                                        6. Intrinsics/operators’ Run method might create threads. For example, the ListenAndDo operator creates a thread to handle each incoming request.
                                        7. Threads are not necessarily OS threads; instead, they could be green threads managed by ThreadPool. Multiple green threads might run on the same OS thread. An example of green threads is Go’s goroutines.
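The division of labor between Executor.Run and the operators can be illustrated with a small Python sketch. It is not the Fluid implementation: the `Executor` class, the op functions, and the dict used as a scope are hypothetical stand-ins. The point it shows is that the executor only iterates over the ops on the calling thread, while an individual op is free to start a thread of its own.

```python
import threading


class Executor:
    """Runs the ops of a block one by one on the calling thread."""

    def run(self, block, scope):
        for op in block:
            op(scope)


def read_op(scope):
    scope["X"] = [1, 2, 3]


def async_sum_op(scope):
    # The op, not the executor, decides to start a thread.
    def work():
        scope["sum"] = sum(scope["X"])

    worker = threading.Thread(target=work)
    worker.start()
    scope["pending"] = worker


def join_op(scope):
    # Wait for the thread started by async_sum_op.
    scope.pop("pending").join()


scope = {}
Executor().run([read_op, async_sum_op, join_op], scope)
print(scope["sum"])  # 6
```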

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
\ No newline at end of file
diff --git a/develop/doc_cn/design/cpp_data_feeding.html b/develop/doc_cn/design/cpp_data_feeding.html
deleted file mode 100644
index d0f5160d917b2c285e4810c40af2d69b3abe8ea1..0000000000000000000000000000000000000000
--- a/develop/doc_cn/design/cpp_data_feeding.html
+++ /dev/null
@@ -1,330 +0,0 @@

                                        C++ Data Feeding

                                        -

                                        When training with the Paddle V2 API, data feeding depends entirely on Python code. To get rid of the Python environment and achieve the goal of “wrapping the whole training by a while loop op” in Paddle Fluid, a C++ data feeding mechanism is required.

                                        -

                                        In this document we show the fundamental design of the C++ data feeding process, which includes data reading, shuffling, and batching.

                                        -
                                        -

                                        Reader

                                        -

                                        A new concept named ‘Reader’ is introduced. A Reader is one of a family of derived classes that can be held by a Variable and that are used to read or process file data.

                                        -
                                        -

                                        ReaderBase

                                        -

                                        ReaderBase is the abstract base class of all readers. It defines the interface shared by all readers.

                                        -
                                        class ReaderBase {
                                        - public:
                                        -  explicit ReaderBase(const std::vector<DDim>& shapes) : shapes_(shapes) {
                                        -    PADDLE_ENFORCE(!shapes_.empty());
                                        -  }
                                        -  // Read the next batch of data. (A 'batch' may contain only one instance.)
                                        -  virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
                                        -  // Report whether a next batch exists.
                                        -  virtual bool HasNext() const = 0;
                                        -  
                                        -  // Reinitialize the reader and read the file from the beginning.
                                        -  virtual void ReInit() = 0;
                                        -  
                                        -  // Get the shape of the idx-th piece of data read in.
                                        -  DDim shape(size_t idx) const;
                                        -  // Get the shapes of all data read in.
                                        -  std::vector<DDim> shapes() const { return shapes_; }
                                        -  // Set the shapes of the data read in.
                                        -  void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; }
                                        -
                                        -  virtual ~ReaderBase() {}
                                        -
                                        - protected:
                                        -  std::vector<DDim> shapes_;
                                        -};
                                        -
                                        -
                                        -
                                        -
                                        -

                                        FileReader and DecoratedReader

                                        -

                                        These two classes are derived from ReaderBase and are in turn the base classes of the specific readers. That is to say, in our design there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format and yields only one instance of data at a time; examples are a RecordIO reader and a jpg reader. A decorated reader takes another reader (either a file reader or a decorated reader) as its ‘underlying reader’. It gets data from its underlying reader, processes it (for example, by shuffling or batching), and then yields the processed data. The output of a decorated reader can be a single instance or a batch. ShuffleReader and BatchReader are both decorated readers.

                                        -

                                        All readers share exactly the same interface, defined in ReaderBase, so a reader can be decorated more than once: we can shuffle a reader’s outputs and then batch the shuffled outputs. The consistent interface also allows related ops to use readers without knowing their exact types.
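The decoration idea can be sketched in Python, although the design itself targets C++. In this illustration `ListReader` is a hypothetical stand-in for a file reader, the method names are Python renderings of ReadNext, HasNext, and ReInit, and the shuffle reader simply buffers all of its input, which a real implementation would likely bound. It shows a reader being shuffled and then batched through one shared interface.

```python
import random


class ReaderBase:
    """Interface shared by every reader."""

    def read_next(self):
        raise NotImplementedError

    def has_next(self):
        raise NotImplementedError

    def re_init(self):
        raise NotImplementedError


class ListReader(ReaderBase):
    """Stand-in for a file reader: yields one instance at a time."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def read_next(self):
        item = self.items[self.pos]
        self.pos += 1
        return item

    def has_next(self):
        return self.pos < len(self.items)

    def re_init(self):
        self.pos = 0


class ShuffleReader(ReaderBase):
    """Decorated reader: drains the underlying reader and shuffles the result."""

    def __init__(self, reader, seed=0):
        self.reader = reader
        self.seed = seed
        self._fill()

    def _fill(self):
        self.buf = []
        while self.reader.has_next():
            self.buf.append(self.reader.read_next())
        random.Random(self.seed).shuffle(self.buf)

    def read_next(self):
        return self.buf.pop()

    def has_next(self):
        return bool(self.buf)

    def re_init(self):
        self.reader.re_init()
        self._fill()


class BatchReader(ReaderBase):
    """Decorated reader: groups instances from the underlying reader."""

    def __init__(self, reader, batch_size):
        self.reader = reader
        self.batch_size = batch_size

    def read_next(self):
        batch = []
        while len(batch) < self.batch_size and self.reader.has_next():
            batch.append(self.reader.read_next())
        return batch

    def has_next(self):
        return self.reader.has_next()

    def re_init(self):
        self.reader.re_init()


reader = BatchReader(ShuffleReader(ListReader(range(5))), batch_size=2)
batches = []
while reader.has_next():
    batches.append(reader.read_next())
print([len(b) for b in batches])  # [2, 2, 1]
```

Because BatchReader only calls the three interface methods, it works the same whether its underlying reader is a file reader or another decorated reader.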

                                        -
                                        -
                                        -

                                        ReaderHolder

                                        -

                                        Different readers belong to different class types. This leads to a problem: how can we put them into Variables and fetch them out by a unified method? For example, if a Variable holds a BatchReader, we cannot get it with the following code:

                                        -
                                        var->Get<ReaderBase>("batch_reader");
                                        -
                                        -
                                        -

                                        Instead, we have to write:

                                        -
                                        var->Get<BatchReader>("batch_reader");
                                        -
                                        -
                                        -

                                        This means that every time we get a reader from a Variable, we must know the reader’s exact type, which is nearly impossible.

                                        -

                                        To solve this problem, we introduce ReaderHolder as a wrapper. It acts as an empty decorator of ReaderBase that erases the reader’s type. With ReaderHolder we are able to fetch all types of readers by var->Get<ReaderHolder>("...") and treat the obtained object as a reader.
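A rough Python analogy of the type-erasure idea follows. Python does not need a holder, so the exact-type check in the hypothetical `Variable.get` below is an assumption added purely to mimic the behavior described for var->Get<T>(); the class bodies are placeholders rather than the C++ design.

```python
class ReaderBase:
    def read_next(self):
        raise NotImplementedError


class BatchReader(ReaderBase):
    def read_next(self):
        return ["a", "b"]


class ReaderHolder(ReaderBase):
    """Wraps any reader and forwards calls, hiding the concrete type."""

    def __init__(self, reader):
        self._reader = reader

    def read_next(self):
        return self._reader.read_next()


class Variable:
    """Holds one object; get() demands the exact stored type (an assumption)."""

    def __init__(self, value):
        self._value = value

    def get(self, cls):
        if type(self._value) is not cls:
            raise TypeError("variable holds %s, not %s"
                            % (type(self._value).__name__, cls.__name__))
        return self._value


# Asking for the base type fails when the variable holds a BatchReader.
raw = Variable(BatchReader())
try:
    raw.get(ReaderBase)
    base_lookup_ok = True
except TypeError:
    base_lookup_ok = False

# Storing a ReaderHolder gives every reader the same retrievable type.
held = Variable(ReaderHolder(BatchReader()))
batch = held.get(ReaderHolder).read_next()
print(base_lookup_ok, batch)  # False ['a', 'b']
```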

                                        -
                                        -
\ No newline at end of file
diff --git a/develop/doc_cn/design/csp.html b/develop/doc_cn/design/csp.html
deleted file mode 100644
index 3b43da3f4de92692127c5cf30c90d9006e8de75e..0000000000000000000000000000000000000000
--- a/develop/doc_cn/design/csp.html
+++ /dev/null
@@ -1,459 +0,0 @@

                                        Design Doc: CSP in PaddlePaddle Fluid

                                        -
                                        -

                                        Motivation

                                        -

                                        Concurrent programming is important for deep learning. A few example applications are:

                                        1. The main thread keeps reading the next mini-batch while another thread uses the GPU for computing.
                                        2. The main thread performs the computation while another thread uploads the local gradients from each trainer to the parameter server.

                                        Most DL systems, including TensorFlow, Caffe2, and MXNet, can asynchronously execute operators in a graph. However, Fluid doesn’t have the concept of a graph at all, as the design goal of Fluid is that of a programming language.

                                        -
                                        -
                                        -

                                        Concurrent Programming Models

                                        -

                                        There are many concurrent programming models, implemented in various forms:

                                        -

                                        | concurrent programming model | implementation |
                                        |-----|-----|
                                        | mutex | types and functions in standard libraries |
                                        | semaphore | types and functions in standard libraries |
                                        | communicating sequential processes (CSP) | Go programming language |
                                        | actor model | Erlang programming language |
                                        | message passing | MPI |
                                        | bulk synchronous parallel (BSP) | Pregel distributed programming framework |

                                        -

                                        Since Fluid was designed to be a programming language, we would like to implement CSP in Fluid.

                                        -
                                        -

                                        CSP vs. Actor Model

                                        -

                                        A well-known implementation of the Actor Model is the Erlang programming language. In the Actor Model, a process can send messages to and receive messages from another process, given that process’s ID. We can find the same three ingredients (processes with IDs, send, and recv) in MPI too. Indeed, we can rewrite Erlang programs in Python + MPI with possibly fewer lines of code. Our concern with the Actor Model is that it doesn’t seem reasonable to implement process management in a programming language’s runtime library; instead, managing processes should be the operating system’s responsibility, with libraries like MPI providing send/recv.

                                        -
                                        -
                                        -
                                        -

                                        CSP in Fluid

                                        -

                                        Fluid has two fundamental control-flows: if-else and while. If we are to implement CSP, we need the following:

                                        1. a new data type: channel and operators send and recv,
                                        2. goroutine or thread, and
                                        3. a new control-flow: select.

                                        We also need Python wrappers for the above components.

                                        -

                                        The type channel is conceptually a blocking queue. In Go, it is implemented as a blocking circular queue, which supports send and recv.
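As an illustration of what “blocking circular queue” means, here is a minimal Python sketch. The `Channel` class is hypothetical and is neither Go’s nor Fluid’s implementation; it assumes a capacity of at least one, so it models only buffered channels and omits close.

```python
import threading


class Channel:
    """Fixed-capacity blocking circular queue (capacity >= 1)."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0   # index of the oldest element
        self._size = 0   # number of stored elements
        self._cond = threading.Condition()

    def send(self, value):
        with self._cond:
            while self._size == len(self._buf):   # full: block the sender
                self._cond.wait()
            tail = (self._head + self._size) % len(self._buf)
            self._buf[tail] = value
            self._size += 1
            self._cond.notify_all()

    def recv(self):
        with self._cond:
            while self._size == 0:                # empty: block the receiver
                self._cond.wait()
            value = self._buf[self._head]
            self._buf[self._head] = None
            self._head = (self._head + 1) % len(self._buf)
            self._size -= 1
            self._cond.notify_all()
            return value


ch = Channel(capacity=2)
producer = threading.Thread(target=lambda: [ch.send(i) for i in range(5)])
producer.start()
received = [ch.recv() for _ in range(5)]
producer.join()
print(received)  # [0, 1, 2, 3, 4]
```

The producer sends five values through a buffer of two slots, so it must block whenever the buffer is full, and the values still arrive in order.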

                                        -

                                        The select operation existed in OS kernels long before the Go language. All Unix kernels implement the system calls poll and select, which monitor multiple file descriptors to see if I/O is possible on any of them. This takes O(N) time. Since Linux 2.6, a new system call, epoll, can do the same in O(1) time. In BSD systems, there is a similar system call, kqueue. Go’s runtime on Linux uses epoll for network I/O.

                                        -

                                        It might be a good idea to implement Fluid’s select using epoll too. In this design doc, we start from the O(N) way so that we could focus on Python binding and the syntax.
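A minimal sketch of the O(N) approach, written in Python: each call scans all N cases once and runs the first one that is ready. The `select` function and its tuple format are hypothetical, the standard library’s `queue.Queue` stands in for a buffered channel, and, unlike a real select, this version returns None instead of blocking when nothing is ready and no default is given.

```python
import queue
import random


def select(cases, default=None):
    """Scan the cases once in random order and run the first ready one.

    Each case is (channel, mode, value, handler), where mode is "w" to
    send value or "r" to receive; handler gets the value sent or received.
    """
    order = list(cases)
    random.shuffle(order)   # pick at random among cases that are ready
    for ch, mode, value, handler in order:
        try:
            if mode == "w":
                ch.put_nowait(value)
                result = value
            else:
                result = ch.get_nowait()
        except (queue.Full, queue.Empty):
            continue        # this case is not ready; try the next one
        return handler(result)
    if default is not None:
        return default()
    return None


ch1 = queue.Queue(maxsize=1)
ch2 = queue.Queue(maxsize=1)
ch1.put_nowait(0)           # ch1 is now full, so a send on it is not ready

cases = [
    (ch1, "w", 1, lambda v: "sent %d" % v),
    (ch2, "r", None, lambda v: "received %d" % v),
]
first = select(cases, default=lambda: "default")
ch2.put_nowait(7)           # now the receive case is ready
second = select(cases, default=lambda: "default")
print(first, "|", second)   # default | received 7
```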

                                        -
                                        -

                                        Type Channel

                                        -

                                        Fluid supports many data types:

                                        1. Tensor,
                                        2. Row-sparse Tensor,
                                        3. LoD Tensor,
                                        4. Tensor array, etc.

                                        Each data type is registered in the framework.proto as an enum value. To add a new type channel, we need to add a new type enum.

                                        -

                                        To expose a C++ type to Python, we need to edit the pybind.cc file. Here is an example of how we expose the C++ class LoDTensor.

                                        -
                                        -
                                        -
                                        -

                                        Syntax Design

                                        -
                                        -

                                        Create Channel

                                        -

                                        In Go, we create a channel by specifying the element type and buffer size:

                                        -
                                        ch  := make(chan int)       // a channel without buffer
                                        -ch1 := make(chan int, 100)  // a channel that can buffer 100 ints.
                                        -
                                        -
                                        -

                                        In Fluid, we should be able to do the same:

                                        -
                                        ch  = fluid.make_channel(dtype=INT)
                                        -ch1 = fluid.make_channel(dtype=INT, buffer_size=100)  # the keyword name buffer_size is illustrative
                                        -
                                        -
                                        -

                                        In addition to that, we want channels that can hold more complex element types, e.g., Tensors of float16:

                                        -
                                        ch = fluid.make_channel(dtype=Tensor, etype=float16)
                                        -
                                        -
                                        -

                                        or Tensors of Tensors of float16 etc.

                                        -

                                        The point here is that we need a consistent way to compose types, like in C++ we can have Tensor<Tensor<...<float16>...> >.

                                        -
                                        -
                                        -

                                        Send and Recv

                                        -

                                        Go’s CSP implementation depends on the channel data type. There are two types of channels:

                                        1. The unblocked channel, or buffered channel, is a blocking queue with a non-zero-sized buffer. Sending to a buffered channel blocks if the buffer is full, and receiving from it blocks if the buffer is empty.
                                        2. The blocked channel, or unbuffered channel, is a blocking queue with no buffer. With unbuffered channels, both sending and receiving block until the other side is ready.

                                        There are four types of actions with a channel:

                                        1. Create a channel

                                          ch := make(chan int) // this is an unbuffered channel
                                          -ch := make(chan int, 100) // this is a buffered channel of 100 ints.

                                        2. Send

                                          ch <- 111

                                        3. Recv

                                          y, ok := <-ch

                                        4. Close

                                          close(ch)

                                          Please be aware that a closed channel is not a nil channel, which is var ch chan int.

                                        There are some axioms with channels:

                                        1. A send to a nil channel blocks forever.
                                        2. A receive from a nil channel blocks forever.
                                        3. A send to a closed channel panics.
                                        4. A receive from a closed channel returns the residual values and then zeros.
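The two axioms about closed channels can be made concrete with a small Python model. This is a single-threaded sketch, not a real channel: the `IntChannel` and `ChannelClosedError` names are hypothetical, the buffer is unbounded, and a receive on an open, empty channel raises instead of blocking. The `(value, ok)` pair mirrors Go’s two-value receive.

```python
from collections import deque


class ChannelClosedError(Exception):
    """Stands in for the panic raised by a send on a closed channel."""


class IntChannel:
    def __init__(self):
        self._items = deque()
        self._closed = False

    def send(self, value):
        if self._closed:
            raise ChannelClosedError("send on closed channel")
        self._items.append(value)

    def close(self):
        self._closed = True

    def recv(self):
        """Return (value, ok); ok is False once a closed channel is drained."""
        if self._items:
            return self._items.popleft(), True
        if self._closed:
            return 0, False   # the zero value of int
        raise RuntimeError("would block: channel is empty and still open")


ch = IntChannel()
ch.send(1)
ch.send(2)
ch.close()

results = [ch.recv() for _ in range(4)]
print(results)  # [(1, True), (2, True), (0, False), (0, False)]

try:
    ch.send(3)
    send_failed = False
except ChannelClosedError:
    send_failed = True
print(send_failed)  # True
```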

                                        In Fluid, we have buffered channels and unbuffered channels.

                                        -

                                        The following program illustrates the Python syntax for accessing Fluid buffered channels.

                                        -
                                        import fluid
                                        -
                                        -buffer_size = 10
                                        -ch = fluid.make_channel(dtype=INT, buffer_size=buffer_size)
                                        -
                                        -# Now write buffer_size elements to the channel
                                        -with fluid.while(steps=buffer_size):
                                        -  fluid.send(ch, step)
                                        -
                                        -fluid.close_channel(ch)
                                        -
                                        -with fluid.while(steps=buffer_size):
                                        -  fluid.print(fluid.recv(ch))
                                        -
                                        -
                                        -

                                        The following example shows that to avoid the always-blocking behavior of unbuffered channels, we need to use Fluid’s goroutines.

                                        -
                                        import fluid
                                        -
                                        -ch = fluid.make_channel(dtype=INT)
                                        -
                                        -with fluid.go():
                                        -  fluid.send(ch)
                                        -
                                        -y = fluid.recv(ch)
                                        -
                                        -fluid.close_channel(ch)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Select

                                        -

                                        In Go, the select statement lets a goroutine wait on multiple communication operations. A select blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready.

                                        -
                                        ch1  := make(chan int)       
                                        -ch2  := make(chan int, 100)
                                        -
                                        -x := 0
                                        -
                                        -for {
                                        -    select {
                                        -    case ch1 <- x:
                                        -      x = x + 1
                                        -    case y := <-ch2:
                                        -      fmt.Println("Received on channel:", y)
                                        -    default:
                                        -      fmt.Println("Default")
                                        -    }
                                        -  }
                                        -
                                        -
                                        -

                                        In Fluid, we should be able to do the same:

                                        -
                                        ch1  = fluid.make_chan(dtype=INT)
                                        -ch2 = fluid.make_chan(dtype=INT, 100)
                                        -
                                        -sel = fluid.select()
                                        -
                                        -with sel.case(ch1, 'w', X):
                                        -    fluid.layers.increment(X)
                                        -
                                        -with sel.case(ch2, 'r', Y):
                                        -    fluid.print("Received on Channel")
                                        -
                                        -with sel.default():
                                        -    fluid.print("Default")
                                        -
                                        -
                                        -
                                        -

                                        In the above code snippet, X and Y are variables. Now let us look at each of these statements one by one.

                                        -
                                          -
                                        • sel.case(ch1, 'w', X): This case writes the integer held in variable X to ch1. The character 'w' mirrors the write mode of Python file I/O.
                                        • -
                                        • sel.case(ch2, 'r', Y): This case reads a value from ch2 into variable Y. The character 'r' mirrors the read mode of Python file I/O.
                                        • -
                                        • sel.default() : This is equivalent to the default in Go select. If none of the channels are ready for read or write, then the fluid code in the default block will be executed.
                                        • -
                                        -
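The semantics of the three cases can be sketched in plain Python. This is an illustration under stated assumptions, not the Fluid API: the `select` helper is hypothetical, channels are modelled as `queue.Queue` objects (so `ch1` gets capacity 1 instead of being truly unbuffered), ready cases are tried in order rather than at random, and the helper returns `None` instead of blocking when no default is given.

```python
import queue


def select(cases, default=None):
    """Run the first ready case, or `default` if no case is ready.

    `cases` is a list of (channel, mode, payload, body) tuples, where mode is
    'w' (send payload) or 'r' (receive a value).
    """
    for ch, mode, payload, body in cases:
        try:
            if mode == 'w':
                ch.put_nowait(payload)
                value = payload
            else:
                value = ch.get_nowait()
        except (queue.Full, queue.Empty):
            continue  # this case is not ready; try the next one
        return body(value)
    return default() if default is not None else None


ch1 = queue.Queue(maxsize=1)
ch2 = queue.Queue(maxsize=100)

cases = [
    (ch1, 'w', 0, lambda x: "sent %d" % x),
    (ch2, 'r', None, lambda y: "received %d" % y),
]

first = select(cases, default=lambda: "default")   # ch1 has room, so the write case runs
ch2.put(7)
second = select(cases, default=lambda: "default")  # ch1 is full, ch2 has data, so the read case runs
third = select(cases, default=lambda: "default")   # neither case is ready, so the default runs
print(first, second, third)
```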
                                        -
                                        -
                                        -

                                        Example Programs

                                        -
                                        -

                                        1. RPC between Trainers and Parameter Servers

                                        -
                                        -
                                        -

                                        2. Concurrent Minibatch Loading

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/dist_refactor/distributed_architecture.html b/develop/doc_cn/design/dist_refactor/distributed_architecture.html deleted file mode 100644 index 99c2c2ab6280d40e162d68beb4c4f2a56df97893..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/dist_refactor/distributed_architecture.html +++ /dev/null @@ -1,432 +0,0 @@ - - - - - - - - - - - - - Design Doc: Distributed Training Architecture — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Distributed Training Architecture

                                        -
                                        -

                                        Abstract

                                        -

                                        PaddlePaddle version 0.10.0 uses the “trainer-parameter server” architecture. We run multiple instances of trainers (where each trainer runs the same model) and parameter servers for distributed training. This architecture serves well, but has a few limitations:

                                        -
                                          -
                                        1. Special code is needed for tasks that should run on only a single trainer, e.g., initializing the model or saving the model.
                                        2. Model parallelism is hard: it requires if-else branches conditioned on the trainer ID to partition the model across the trainers, plus hand-written inter-model-shard communication code between the trainers.
                                        3. The user cannot directly specify the parameter update rule: doing so requires modifying the parameter server code and compiling a new binary. This complicates matters for researchers, since a lot of extra effort is required to make it work. Besides, the training job submission program may not allow running arbitrary binaries.
                                        -

                                        This design doc discusses PaddlePaddle’s new distributed training architecture, which addresses the limitations listed above.

                                        -
                                        -
                                        -

                                        Analysis

                                        -

                                        The assumption is that the user writes the trainer program in either Python or C++.

                                        -
                                        -

                                        Limitation 1

                                        -

                                        There are two basic functionalities in the trainer program:

                                        -
                                          -
                                        1. The training logic, such as loading / saving the model and printing out the logs.
                                        2. The neural network definition, such as the definition of the data layer, the fully connected layer, the cost function and the optimizer.
                                        -

                                        When we train using PaddlePaddle v0.10.0 in a distributed fashion, multiple instances of the same Python code are run on different nodes, hence both the training logic and the neural network computation logic are replicated.

                                        -

                                        The tasks that only need to be run once belong to the training logic. Hence, if we only replicate the neural network computation part and do not replicate the training logic, the limitation mentioned above can be avoided.

                                        -
                                        -
                                        -

                                        Limitation 2

                                        -

                                        Model parallelism means that a single model is partitioned into different components and each node runs one of the components separately. This comes at the extra cost of managing the inter-model-shard communication between nodes.

                                        -

                                        PaddlePaddle should ideally be able to modify the neural network computation and figure out the support for model parallelism automatically. However, the computation is only specified in Python code, which sits outside of PaddlePaddle, hence PaddlePaddle cannot support the feature in this setup.

                                        -

                                        Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. The compiler optimizes the IR as follows:

                                        -

                                        -

                                        PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component:

                                        -

                                        -

                                        The IR for PaddlePaddle after refactoring is called a Block; it specifies the computation dependency graph and the variables used in the computation.
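To make the idea of an IR that records both the variables and the computation dependency graph concrete, here is a minimal Python sketch. The `Op` and `Block` classes and their fields are hypothetical stand-ins for illustration; they are not the actual protobuf definitions used by PaddlePaddle.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    op_type: str
    inputs: list
    outputs: list


@dataclass
class Block:
    variables: set = field(default_factory=set)
    ops: list = field(default_factory=list)

    def append_op(self, op_type, inputs, outputs):
        # Record the variables the op touches, then the op itself.
        self.variables.update(inputs, outputs)
        self.ops.append(Op(op_type, list(inputs), list(outputs)))

    def dependencies(self):
        """Map each op index to the indices of earlier ops whose outputs it reads."""
        producer, deps = {}, {}
        for i, op in enumerate(self.ops):
            deps[i] = sorted({producer[v] for v in op.inputs if v in producer})
            for v in op.outputs:
                producer[v] = i
        return deps


block = Block()
block.append_op("mul", ["x", "W"], ["xw"])
block.append_op("add", ["xw", "b"], ["y"])
block.append_op("mean", ["y"], ["cost"])
print(block.dependencies())  # {0: [], 1: [0], 2: [1]}
```

Because the graph is explicit data rather than Python control flow, a framework can inspect and rewrite it, which is what the partitioning described in this document relies on.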

                                        -
                                        -
                                        -

                                        Limitation 3

                                        -

                                        The user cannot directly specify the parameter update rule for the parameter server in the Python module, since the parameter server does not use the same computation definition as the trainer. Instead, the update rule is baked into the parameter server.

                                        -

                                        This could be fixed by making the parameter server also run an IR, which can be different from the one on the trainer side. For a detailed explanation, refer to this document: Design Doc: Parameter Server

                                        -
                                        -
                                        -
                                        -

                                        Distributed Training Architecture

                                        -

                                        The revamped distributed training architecture can address the above discussed limitations. Below is the illustration of how it does so:

                                        -

                                        -

                                        The major components are: Python API, Distributed Transpiler and Remote Executor.

                                        -
                                        -

                                        Python API

                                        -

                                        The Python API is the Python library that the user’s Python code invokes to read the data, build the neural network topology, start training, etc.

                                        -
                                        images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32')
                                        -label = fluid.layers.data(name='label', shape=[1], dtype='int64')
                                        -...
                                        -predict = fluid.layers.fc(input=conv_pool_2, size=10, act="softmax")
                                        -cost = fluid.layers.cross_entropy(input=predict, label=label)
                                        -avg_cost = fluid.layers.mean(x=cost)
                                        -optimizer = fluid.optimizer.Adam(learning_rate=0.01)
                                        -optimizer.minimize(avg_cost)
                                        -
                                        -train_reader = paddle.batch(
                                        -    paddle.reader.shuffle(
                                        -        paddle.dataset.mnist.train(), buf_size=500),
                                        -    batch_size=BATCH_SIZE)
                                        -
                                        -place = fluid.CPUPlace()
                                        -exe = fluid.Executor(place)
                                        -
                                        -for pass_id in range(10):
                                        -    for data in train_reader():
                                        -        loss, = exe.run(trainer_prog,
                                        -                        feed=feeder.feed(data),
                                        -                        fetch_list=[avg_cost])
                                        -
                                        -
                                        -

                                        The code above is a typical local training program. The “Training Program” is built using helper functions such as fluid.layers.fc. Training is done by calling Executor.run iteratively.

                                        -

                                        In more detail, the IR is implemented as a Program, and ProgramDesc is its protobuf type.

                                        -

                                        Executor simply runs the ProgramDesc. For local training, you generally use Executor to run the program locally. For any kind of distributed training, you can use RemoteExecutor to specify the desired distributed training method with some optional arguments.

                                        -
                                        -
                                        -

                                        Distributed Transpiler

                                        -

                                        The Distributed Transpiler automatically converts the IR (in protobuf format) to partitioned IRs. Then the Remote Executor dispatches the new IRs to Remote Executors across the cluster. The steps are as follows:

                                        -
                                          -
                                        1. To turn a local program into a distributed one, the user only needs to replace Executor with RemoteExecutor.
                                        2. RemoteExecutor calls the Distributed Transpiler to “transpile” the user’s program into several IRs that together represent a distributed training program:
                                          1. Parse the configuration from RemoteExecutor.
                                          2. Determine the type of distributed program: DataParallelism, ModelParallelism or Streaming.
                                          3. Partition the ProgramDesc according to that type and add a send / recv OP pair on each boundary. For the DataParallelism type, for example, the transpiler removes the optimization operators from the “trainer” role and adds a send OP in their place, then adds the optimization operators to the parameter server role within the recv OP.
                                        3. Dispatch the partitioned graphs to the RemoteExecutors in the cluster.
                                        4. The RemoteExecutor on each node runs the received ProgramDesc until the end.
                                        -
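The DataParallelism partitioning step described in the list above can be sketched as follows. This is a simplified illustration, not the transpiler's real code: ops are plain dictionaries, the `OPTIMIZER_OPS` set, the `sub_ops` field and the `@GRAD` suffix used to recognize gradient variables are all assumptions made for the example.

```python
OPTIMIZER_OPS = {"sgd", "adam"}  # assumed set of optimization operator types


def transpile_data_parallel(ops):
    """Split a flat op list into (trainer_ops, pserver_ops).

    Each op is a dict: {"type": str, "inputs": [...], "outputs": [...]}.
    """
    trainer_ops, optimize_ops, grads = [], [], []
    for op in ops:
        if op["type"] in OPTIMIZER_OPS:
            # Optimization ops leave the trainer; remember which gradients they need.
            optimize_ops.append(op)
            grads.extend(v for v in op["inputs"] if v.endswith("@GRAD"))
        else:
            trainer_ops.append(op)
    # Boundary: the trainer sends the gradients, and the parameter server
    # receives them and runs the optimization ops inside the recv op.
    trainer_ops.append({"type": "send", "inputs": grads, "outputs": []})
    pserver_ops = [{"type": "recv", "inputs": [], "outputs": grads,
                    "sub_ops": optimize_ops}]
    return trainer_ops, pserver_ops


program = [
    {"type": "mul", "inputs": ["x", "W"], "outputs": ["y"]},
    {"type": "mul_grad", "inputs": ["y@GRAD", "x"], "outputs": ["W@GRAD"]},
    {"type": "sgd", "inputs": ["W", "W@GRAD"], "outputs": ["W"]},
]
trainer, pserver = transpile_data_parallel(program)
print([op["type"] for op in trainer])    # ['mul', 'mul_grad', 'send']
print(pserver[0]["sub_ops"][0]["type"])  # sgd
```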
                                        -
                                        -

                                        RemoteExecutor

                                        -

                                        As shown in the graph, RemoteExecutor.run sends the IR to the cluster for execution. You can also use the fetch_list parameter to interactively fetch variables back to the local machine for log printing.

                                        -

                                        The Python RemoteExecutor is derived from the Executor class.

                                        -
                                        exe = RemoteExecutor(
                                        -    feed=feeder.feed(data),
                                        -    fetch_list=[avg_cost],
                                        -    job_desc=JobDesc(
                                        -      jobname,
                                        -      num_trainer,
                                        -      num_pserver,
                                        -      cpu_per_trainer,
                                        -      gpu_per_trainer,
                                        -      mem_per_trainer,
                                        -      cpu_per_pserver,
                                        -      mem_per_pserver
                                        -    ))
                                        -for data in train_reader():
                                        -    loss, = exe.run(trainer_prog,
                                        -                    feed=feeder.feed(data),
                                        -                    fetch_list=[avg_cost])
                                        -
                                        -
                                        -

                                        The JobDesc object describes the resource specification of the distributed job to run in the cluster environment.
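One way to picture such a resource specification is a small immutable record. The sketch below reuses the field names from the example above, but the class itself, the field types, the memory units and the `total_cpu` helper are assumptions made for illustration, not the actual JobDesc definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDesc:
    jobname: str
    num_trainer: int
    num_pserver: int
    cpu_per_trainer: int
    gpu_per_trainer: int
    mem_per_trainer: str  # assumed to be a Kubernetes-style quantity such as "4Gi"
    cpu_per_pserver: int
    mem_per_pserver: str

    def total_cpu(self):
        """CPU cores requested by all trainers and parameter servers together."""
        return (self.num_trainer * self.cpu_per_trainer
                + self.num_pserver * self.cpu_per_pserver)


job = JobDesc("mnist", 4, 2, 2, 0, "4Gi", 1, "2Gi")
print(job.total_cpu())  # 4 * 2 + 2 * 1 = 10
```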

                                        -

                                        -

                                        RemoteExecutor.run sends the ProgramDesc and TrainingJob to a server in the cluster which executes RemoteExecutor.listen. This server is responsible for starting the final Kubernetes Jobs that run the different roles of the ProgramDesc from a ConfigMap.

                                        -
                                        -
                                        -

                                        Placement Algorithm

                                        -

                                        Our first implementation will only support “trainer-parameter server” placement: the parameters, initializers, and optimizers are all placed on the PaddlePaddle runtimes with the parameter server role. Everything else will be placed on the PaddlePaddle runtimes with the trainer role. This has the same functionality as the “trainer-parameter server” architecture of PaddlePaddle v0.10.0, but is more generic and flexible.
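The “trainer-parameter server” placement rule above amounts to a lookup on the kind of each node. A minimal sketch, assuming each node of the IR is tagged with a hypothetical `kind` string (the tagging scheme and the `place` helper are not part of the original design):

```python
# Kinds of nodes that the rule above assigns to the parameter server role.
PSERVER_KINDS = {"parameter", "initializer", "optimizer"}


def place(nodes):
    """Map each node name to the role of the runtime it is placed on.

    `nodes` maps a node name to its kind, e.g. {"W": "parameter"}.
    """
    return {
        name: "pserver" if kind in PSERVER_KINDS else "trainer"
        for name, kind in nodes.items()
    }


placement = place({
    "W": "parameter",
    "W_init": "initializer",
    "sgd": "optimizer",
    "fc": "layer",
    "cross_entropy": "cost",
})
print(placement)
```

A more general placement algorithm would replace the fixed `PSERVER_KINDS` lookup with a cost model over the IR.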

                                        -

                                        In the future, a more general placement algorithm should be implemented, which makes placements according to the input IR, and a model of device computation time and device communication time. Model parallelism requires the generic placement algorithm.

                                        -
                                        -
                                        -

                                        Local Training Architecture

                                        -

                                        The local training architecture will be the same as the distributed training architecture. The difference is that everything runs locally, and there is just one PaddlePaddle runtime:

                                        -

                                        -
                                        -
                                        -

                                        Training Data

                                        -

                                        In PaddlePaddle v0.10.0, training data is typically read with a data reader from Python. This approach is no longer efficient for distributed training: since the Python process no longer runs on the same node as the trainer processes, the Python reader would need to read from the distributed filesystem (assuming it has access) and send the data to the trainers, doubling the network traffic.

                                        -

                                        When doing distributed training, the user can still use a Python data reader: the training data are sent with Executor.run. However, this should be used for debugging purposes only. Users are encouraged to use the read data OPs.

                                        -
                                        -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/dist_refactor/multi_cpu.html b/develop/doc_cn/design/dist_refactor/multi_cpu.html deleted file mode 100644 index 3573981c48a362418fc0760522d4187df6734830..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/dist_refactor/multi_cpu.html +++ /dev/null @@ -1,314 +0,0 @@ - - - - - - - - - - - - - Design Doc: Execute the Program with Multi CPU — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Execute the Program with Multi CPU

                                        -
                                        -

                                        Abstract

                                        -

                                        This design doc proposes an approach to run the user-defined Op graph on multiple CPUs: an auto transpiler converts the user-defined Op graph to a multi-CPU Op graph, and the ParallelDo Op runs the graph.

                                        -
                                        -
                                        -

                                        Transpiler

                                        -

                                        -

                                        After conversion:

                                        -

                                        -
                                        -
                                        -

                                        Implementation

                                        -
                                          -
                                        • The Multi-CPU Transpiler converts the graph to a multi-CPU graph, which is executed with multiple threads.

                                          -
                                        • -
                                        • BlockingCounter initializes and decrements an atomic counter, and blocks until the counter becomes 0:

                                          -
                                          BlockingCounter bc(thread_count);
                                          -for (int i = 0; i < thread_count; ++i) {
                                          -  thread_pool->Start([&bc] { bc.DecrementCount(); });
                                          -}
                                          -bc.Wait();
                                          -
                                          -
                                          -
                                        • -
                                        • ParallelDo Operator

                                          -
                                            -
                                          • Initialize a thread pool which is a Singleton.
                                          • -
                                          • Take a block id as the input, and run the specified Block in independent scopes with multiple threads.
                                          • -
                                          • Initialize a BlockingCounter instance and wait until all threads are done.
                                          • -
                                          -
                                        • -
                                        • Split Operator will split the Input Tensor into a TensorArray.

                                          -
                                        • -
                                        • The Merge operator merges all the gradients calculated in different threads with a mean/sum/max/min... method, and then runs the Optimizer Op to optimize W.

                                          -
                                        • -
                                        -
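The pieces listed above (BlockingCounter, ParallelDo, Split and Merge) can be sketched together in Python. This is an illustration under stated assumptions rather than the operators' real implementation: the class and function names are hypothetical, a `ThreadPoolExecutor` stands in for the singleton thread pool, the "block" is an ordinary Python function, and the merge method is fixed to mean.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockingCounter:
    """Counts down from `count`; wait() blocks until the count reaches 0."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement_count(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._count == 0)


def split(batch, parts):
    """Split a list into `parts` nearly equal chunks (stand-in for the Split operator)."""
    size, extra = divmod(len(batch), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


def parallel_do(block, batch, thread_count, pool):
    """Run `block` on each chunk of `batch` in its own thread and collect the results."""
    chunks = split(batch, thread_count)
    results = [None] * thread_count
    bc = BlockingCounter(thread_count)

    def run(i):
        try:
            results[i] = block(chunks[i])  # each thread works on its own chunk
        finally:
            bc.decrement_count()

    for i in range(thread_count):
        pool.submit(run, i)
    bc.wait()  # block until every thread has finished
    return results


def merge_mean(values):
    """Stand-in for the Merge operator with the mean method."""
    return sum(values) / len(values)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Toy "block": the per-chunk result is just the mean of the chunk.
    partial = parallel_do(lambda chunk: sum(chunk) / len(chunk),
                          [1, 2, 3, 4, 5, 6, 7, 8], 4, pool)

merged = merge_mean(partial)
print(partial, merged)
```

Writing each result into its own slot of `results` avoids any locking between worker threads; the only synchronization point is the counter.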
                                        -
                                        -

                                        TODO

                                        -
                                          -
                                        • Improve the optimizer stage with multi-threading, since we could assign the parameters to different threads and execute the optimizer in parallel.
                                        • -
                                        -
                                        -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/dist_refactor/parameter_server.html b/develop/doc_cn/design/dist_refactor/parameter_server.html deleted file mode 100644 index c3583de4ad8799d6c1855b301a7b6cf65444e7ce..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/dist_refactor/parameter_server.html +++ /dev/null @@ -1,362 +0,0 @@ - - - - - - - - - - - - - Design Doc: Parameter Server — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Parameter Server

                                        -
                                        -

                                        Abstract

                                        -

                                        We propose an approach to implement the parameter server. In this approach, there is no fundamental difference between the trainer and the parameter server: they both run subgraphs, but subgraphs with different purposes.

                                        -
                                        -
                                        -

                                        Background

                                        -

                                        The previous implementations of the parameter server do not run a fluid sub-program. Parameter initialization, optimizer computation, network communication and checkpointing are implemented twice: once on the trainer and once on the parameter server.

                                        -

                                        It would be great if we could write the code once and use it on both the trainer and the parameter server, since this reduces code duplication and improves extensibility. Given that, after the current refactoring, we represent everything as a computation graph on the trainer, representing everything as a computation graph on the parameter server becomes a natural extension.

                                        -
                                        -
                                        -

                                        Design

                                        -
                                        -

                                        Distributed Transpiler

                                        -

                                        The Distributed Transpiler converts the user-defined fluid program into sub-programs to be scheduled on different nodes with the following steps:

                                        -
                                          -
                                        1. OP placement: the OPs are placed on different nodes according to a heuristic that minimizes the estimated total computation time. Currently we use a simple heuristic that puts parameter variables on parameter server workers and everything else on trainer workers.
                                        2. Add communication OPs to enable communication between nodes.
                                        -
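The simple placement heuristic from step 1 can be sketched as follows. This is only an illustration: `place_variables`, the `(name, is_parameter)` input format, the `"pserver:N"` labels, and the round-robin spreading across parameter servers are assumptions made for this sketch, not part of the design above.

```python
def place_variables(variables, num_pservers):
    """Assign each variable to a node (hypothetical helper).

    `variables` is a list of (name, is_parameter) pairs. Parameters go to
    parameter servers (round-robin, an assumption of this sketch) and
    everything else stays on the trainer.
    """
    placement = {}
    next_pserver = 0
    for name, is_parameter in variables:
        if is_parameter:
            placement[name] = "pserver:%d" % (next_pserver % num_pservers)
            next_pserver += 1
        else:
            placement[name] = "trainer"
    return placement


placement = place_variables(
    [("W", True), ("b", True), ("x", False), ("loss", False)],
    num_pservers=2,
)
```

A real transpiler would also need to place the optimizer OPs next to the parameters they update and insert the communication OPs at every edge that crosses a node boundary.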

                                        We will need these OPs: Send, Recv, Enqueue, Dequeue.

                                        -

                                        Below is an example of converting the user-defined graph to the subgraphs for the trainer and the parameter server:

                                        -

                                        -

                                        After converting:

                                        -

                                        -
                                          -
                                        1. The parameter variable W and its optimizer program are placed on the parameter server.
                                        2. Operators are added to the program.
                                          • Send sends data to the connected Recv operator. The scheduler on the receiving node only schedules the Recv operator to run after the Send operator has run (the Send OP marks the Recv OP runnable automatically).
                                          • Enqueue enqueues the input variable; it can block until space becomes available in the queue.
                                          • Dequeue outputs a configurable number of tensors from the queue. It blocks until the queue has the required number of tensors.
                                        -
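The blocking behavior of Enqueue and Dequeue described above can be sketched with Python's standard `queue` and `threading` modules. `TensorQueue`, `parameter_server`, the plain floats standing in for tensors, and the summed-gradient SGD update are all assumptions made for illustration; they are not the Fluid operators themselves.

```python
import queue
import threading


class TensorQueue:
    """Hypothetical bounded queue mimicking the Enqueue/Dequeue semantics."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, tensor):
        # Blocks until space becomes available in the queue.
        self._q.put(tensor)

    def dequeue(self, min_count):
        # Blocks until `min_count` tensors have been taken from the queue.
        return [self._q.get() for _ in range(min_count)]


def parameter_server(grad_queue, w, lr, num_trainers, result):
    # Wait for one gradient per trainer, then apply a plain SGD step
    # (the update rule is an assumption of this sketch).
    grads = grad_queue.dequeue(min_count=num_trainers)
    result.append(w - lr * sum(grads))


grad_queue = TensorQueue(capacity=2)
result = []
server = threading.Thread(
    target=parameter_server, args=(grad_queue, 1.0, 0.1, 2, result))
server.start()
for grad in (0.5, 1.5):  # two trainers each send one gradient
    grad_queue.enqueue(grad)
server.join()
new_w = result[0]
```

Because `dequeue` waits for `min_count` items, the parameter server does not update W until every trainer has contributed, which corresponds to the synchronous case.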
                                        -
                                        -

                                        Benefits

                                        -
                                          -
                                        • Model parallelism becomes easier to implement: it is an extension of the trainer/parameter-server approach. We can have several “Transpilers” to achieve different goals.
                                        • A user-defined optimizer is easier to add: the user can now express it as a sub-program.
                                        • No more duplicated logic inside the trainer and the parameter server, as mentioned in the Background section.
                                        -
                                        -
                                        -

                                        Challenges

                                        -
                                          -
                                        • It is important to balance the parameter shards across multiple parameter servers. If a single parameter is very big (for example, some word-embedding, fully connected, or softmax layers), we need to automatically partition it onto different parameter servers when possible (that is, when only element-wise optimizer operations depend on the parameter variable).
                                        • In the “Async SGD” figure, the “W” variable on the parameter server could be read and written concurrently. See here for more details about concurrent programs in Fluid.
                                        -
                                        -
                                        -

                                        Discussion

                                        -
                                          -
                                        • Can the Enqueue OP be implemented under our current tensor design (putting the input tensor into the queue tensor)?
                                        • The Dequeue OP will have a variable number of outputs (depending on the min_count attribute). Does our current design support this? (A similar question applies to the Add OP.)
                                        -
                                        - -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/error_clip.html b/develop/doc_cn/design/error_clip.html
                                        deleted file mode 100644
                                        index f6054af0bec1a8f6ac3ec2719464f27f296d360e..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/error_clip.html
                                        +++ /dev/null
                                        @@ -1,340 +0,0 @@
                                        - Error Clip — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Error Clip

                                        -
                                        -

                                        Overview

                                        -

                                        Error clip is widely used in model training to prevent exploding gradients. It applies specific rules to adjust variables’ gradients and keep them from becoming too large. With it, the values of a gradient are checked before the next grad_op consumes them and are shrunk if necessary.

                                        -
                                        -
                                        -

                                        Usage

                                        -

                                        Users can assign different error clip methods or attributes to different Variables. The error clip is specified as a parameter of Variable's constructor:

                                        -
                                        var = framework.Variable(..., error_clip=myErrorClip, ...)
                                        -
                                        -
                                        -

                                        The default value of error_clip is None, which means no error clip is employed. When it is not None, it should be an instance of a class derived from BaseErrorClipAttr. So far, BaseErrorClipAttr has only one derived class, ErrorClipByValue, whose constructor is:

                                        -
                                        ErrorClipByValue(max, min=None)
                                        -
                                        -
                                        -

                                        max and min are the maximal and minimal clip thresholds respectively. In the backward pass, every value of var's gradient greater than max is clipped to max, and every value less than min is clipped to min. When min is None, the minimal threshold is set to -max automatically.

                                        -

                                        So we can enable the error clip with threshold [-5.0, 5.0] for variable var by:

                                        -
                                        var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
                                        -
                                        -
                                        -
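To make the thresholds concrete, here is a minimal pure-Python sketch of the element-wise effect of clipping with the range [-5.0, 5.0]. `clip_by_value` is a hypothetical helper written for this illustration; it is not the Fluid clip operator, and it works on a plain list rather than a tensor.

```python
def clip_by_value(values, max_value, min_value=None):
    """Clamp each value to [min_value, max_value].

    Mirrors the defaulting rule described above: when min_value is None,
    the lower threshold becomes -max_value.
    """
    max_value = float(max_value)
    min_value = -max_value if min_value is None else float(min_value)
    return [min(max(v, min_value), max_value) for v in values]


clipped = clip_by_value([-7.5, -0.3, 2.0, 9.1], max_value=5.0)
# clipped == [-5.0, -0.3, 2.0, 5.0]
```

Values already inside the range pass through unchanged; only the out-of-range entries are moved to the nearest threshold.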
                                        -
                                        -

                                        Implementation

                                        -

                                        The BaseErrorClipAttr and its derived class ErrorClipByValue are defined in clip.py.

                                        -
                                        class BaseErrorClipAttr(object):
                                        -    def append_clip_op(self, block, grad_name):
                                        -        raise NotImplementedError()
                                        -
                                        -
                                        -class ErrorClipByValue(BaseErrorClipAttr):
                                        -    def __init__(self, max, min=None):
                                        -        max = float(max)
                                        -        if min is None:
                                        -            min = -max
                                        -        else:
                                        -            min = float(min)
                                        -        self.max = max
                                        -        self.min = min
                                        -
                                        -    def append_clip_op(self, block, grad_name):
                                        -        clip_op_desc = block.desc.append_op()
                                        -        clip_op_desc.set_type("clip")
                                        -        clip_op_desc.set_input("X", [grad_name])
                                        -        clip_op_desc.set_output("Out", [grad_name])
                                        -        clip_op_desc.set_attr("min", self.min)
                                        -        clip_op_desc.set_attr("max", self.max)
                                        -
                                        -
                                        -

                                        BaseErrorClipAttr has one main member function: append_clip_op(self, block, grad_name).

                                        -

                                        This function creates a clip_op and appends it to the end of the given block. Because different error clip algorithms require different clip_ops, the function is declared as virtual in the base class (it raises NotImplementedError there). All derived classes must implement their own version of it.

                                        -

                                        These clip_ops should be inserted after grad_ops whose output gradients need to be clipped. It is equivalent to appending some clip_ops to the end of the target block every time a new grad_op is added.

                                        -
                                        for op_desc in grad_op_descs:
                                        -        new_op_desc = target_block.desc.append_op()
                                        -        new_op_desc.copy_from(op_desc)
                                        -        callback(block=target_block, context=grad_to_var)
                                        -
                                        -
                                        -

                                        Here we use a callback function to do this job. In the _append_backward_ops_ function, a callback is invoked each time a grad_op has been added to the target_block. The logic for appending the clip_op can be implemented inside the callback.

                                        -

                                        The callback function for clip_op appending is defined in clip.py:

                                        -
                                        def error_clip_callback(block, context):
                                        -    # the context is a grad_to_var map
                                        -    grad_to_var = context
                                        -    op_desc = block.desc.op(block.desc.op_size() - 1)
                                        -    for grad_n in filter(lambda n: n in grad_to_var,
                                        -                         op_desc.output_arg_names()):
                                        -        fwd_var = block.var_recursive(grad_to_var[grad_n])
                                        -        error_clip = getattr(fwd_var, "error_clip", None)
                                        -        if not (error_clip is None or isinstance(error_clip,
                                        -                                                 BaseErrorClipAttr)):
                                        -            raise TypeError(
                                        -                "Variable's error_clip should be an instance of BaseErrorClipAttr or None."
                                        -            )
                                        -        if error_clip is not None:
                                        -            error_clip.append_clip_op(block, grad_n)
                                        -
                                        -
                                        -

                                        This function takes a block and a context (which is actually a grad_to_var map) as inputs. It checks each output of the last OpDesc in the block. Note that the last OpDesc of the block must be a grad_op, and its outputs must be gradients of forward variables. If the forward variable corresponding to an output gradient has an error_clip attribute, error_clip_callback calls that error_clip's append_clip_op function to append the required clip_op to the block.
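The callback pattern can be exercised without the framework using small stand-ins. Everything in this sketch (`FakeBlock`, `ClipByValue`, the dict-based op records, and the variable names) is hypothetical and only mimics the control flow described above.

```python
class ClipByValue:
    """Stand-in for ErrorClipByValue; records a clip op instead of building one."""

    def __init__(self, max_value):
        self.max = float(max_value)
        self.min = -self.max

    def append_clip_op(self, block, grad_name):
        block.ops.append({"type": "clip", "outputs": [grad_name],
                          "min": self.min, "max": self.max})


class FakeBlock:
    """Stand-in for a block: a list of ops plus per-variable error clips."""

    def __init__(self, error_clips):
        self.ops = []
        self.error_clips = error_clips  # forward variable name -> clip attr


def clip_callback(block, grad_to_var):
    # Inspect only the op that was just appended (the newest grad op).
    for grad_name in block.ops[-1]["outputs"]:
        if grad_name in grad_to_var:
            clip = block.error_clips.get(grad_to_var[grad_name])
            if clip is not None:
                clip.append_clip_op(block, grad_name)


def append_backward_ops(block, grad_ops, grad_to_var, callback):
    for op in grad_ops:
        block.ops.append(op)
        callback(block, grad_to_var)  # runs right after each grad op


block = FakeBlock({"w": ClipByValue(5.0)})
append_backward_ops(
    block,
    [{"type": "mul_grad", "outputs": ["w@GRAD", "x@GRAD"]},
     {"type": "relu_grad", "outputs": ["h@GRAD"]}],
    {"w@GRAD": "w", "x@GRAD": "x", "h@GRAD": "h"},
    clip_callback,
)
op_types = [op["type"] for op in block.ops]
```

Only `w` carries an error clip here, so a single clip op is recorded, directly after the grad op that produces `w@GRAD`.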

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/evaluator.html b/develop/doc_cn/design/evaluator.html
                                        deleted file mode 100644
                                        index 938b2de241745238e4fb2edec0101e411318c407..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/evaluator.html
                                        +++ /dev/null
                                        @@ -1,320 +0,0 @@
                                        - Evaluator Design — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Evaluator Design

                                        -
                                        -

                                        Problem Statement

                                        -

                                        During training or inference, we provide an evaluation function to measure model performance, for example accuracy or precision. In the operator-based framework design, data passes through the network pipeline batch by batch, so inside an operator we can only calculate the metrics for one mini-batch. We therefore need a mechanism that calculates the metrics over every N passes or batches, as the user requires.

                                        -
                                        -
                                        -

                                        Evaluator Design

                                        -

                                        Currently, every operation is expressed in the graph. We divide the evaluator process into three steps.

                                        -
                                          -
                                        1. Initialize the metric state and add it to the block.
                                        2. Calculate the metrics of interest for every mini-batch. A single evaluator operator is only responsible for calculating the necessary statistics for one mini-batch. For example, each run of the accuracy operator calculates the accuracy for only one mini-batch of data.
                                        3. Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. In distributed or multi-GPU training, aggregate the values from the different devices.
                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        This design is shown in the Python API. Each metric operator needs to calculate the metric statistics and return the batch-aware states. The Python side is responsible for accumulating the states for each pass.

                                        -
                                        class Evaluator(object):
                                        -    """
                                        -    Evaluator Base class.
                                        -    """
                                        -    def __init__(self, name, **kwargs):
                                        -       """
                                        -       Different evaluators may have different metric states. E.g., Accuracy needs two variables: the total and the correct sample counts.
                                        -       Auc needs four variables: `true_positives`,
                                        -         `true_negatives`, `false_positives` and `false_negatives`. So every evaluator should create the variables it needs and append them to main_program.
                                        -
                                        -       The initialization of Evaluator should be responsible for:
                                        -       create metric states and append to the main_program
                                        -       """ 
                                        -       pass
                                        -
                                        -    def _update_ops(self, input, label, **kwargs):
                                        -       """
                                        -       Add the operators that calculate the mini-batch metric to the main_program.
                                        -       Add increment operator to accumulate the metric states.
                                        -       """
                                        -    
                                        -
                                        -    def reset(self, executor, reset_program=None):
                                        -      """
                                        -      Reset metric states at the beginning of each pass or user-specified batch number.
                                        -      Execute the reset_program to reset the states.
                                        -      """
                                        -      
                                        -
                                        -    def eval(self, executor, eval_program=None):
                                        -      """
                                        -      Merge the mini-batch statistics to form the evaluation result for multiple mini-batches.
                                        -      Execute the eval_program and return the result.
                                        -      """
                                        -      return eval_result
                                        -
                                        -
                                        -
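The three steps (initialize the state, update it per mini-batch, merge on eval) can be illustrated with a framework-free accuracy evaluator. This is a sketch under simplifying assumptions: the states are plain Python integers rather than variables in a program, and `AccuracyEvaluator` and its `update` method are names invented for this example.

```python
class AccuracyEvaluator:
    """Toy evaluator that accumulates accuracy statistics across mini-batches."""

    def __init__(self):
        # Step 1: create the metric states.
        self.reset()

    def reset(self):
        """Clear the states at the beginning of a pass."""
        self.total = 0
        self.correct = 0

    def update(self, predictions, labels):
        """Step 2: compute one mini-batch's statistics and accumulate them."""
        batch_correct = sum(int(p == l) for p, l in zip(predictions, labels))
        self.correct += batch_correct
        self.total += len(labels)
        return batch_correct / len(labels)  # accuracy of this mini-batch only

    def eval(self):
        """Step 3: merge the accumulated statistics into one result."""
        return self.correct / self.total if self.total else 0.0


evaluator = AccuracyEvaluator()
batch1_acc = evaluator.update([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct
batch2_acc = evaluator.update([0, 0], [0, 1])              # 1 of 2 correct
pass_acc = evaluator.eval()                                # 4 of 6 correct
```

Note that the pass-level result is computed from the accumulated counts, not by averaging the per-batch accuracies, so mini-batches of different sizes are weighted correctly.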
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/executor.html b/develop/doc_cn/design/executor.html
                                        deleted file mode 100644
                                        index c452460306ce4ae1aa60aa5abeb6ad1ae0bad503..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/executor.html
                                        +++ /dev/null
                                        @@ -1,292 +0,0 @@
                                        - Executor Design Doc — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Executor Design Doc

                                        -
                                        -

                                        Motivation

                                        -

                                        In fluid, we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it first creates a protobuf message, ProgramDesc, that describes the process and is conceptually like an abstract syntax tree.

                                        -

                                        The executor runs the ProgramDesc like an interpreter. The ProgramDesc contains the intrinsics (operators in this case) and the variables that will be used; the executor explicitly executes the stored precompiled code.

                                        -
                                        -
                                        -

                                        Overview

                                        -

                                        An executor takes a ProgramDesc, a block_id and a Scope. The ProgramDesc is a list of blocks, and each block contains the protobuf definition of all the parameters and operators in the block. The block_id specifies the entrance block. The Scope is the container of all the variable instances and persists across different runs.

                                        -
                                        -
                                        -

                                        Executor

                                        -

                                        The Executor explicitly executes all the intrinsics (operators here) in the block_id-th block of a ProgramDesc. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence, one by one. This is very similar to pushing a stack frame when entering a block; when a mini-batch is finished, the Executor cleans up all the temporary variables. It does not, however, have a stack-frame pop process.

                                        -
                                        -

                                        The interface

                                        -
                                          Executor(places);
                                        -
                                        -
                                        -

                                        An executor does not own any computing resources; a user can only construct an executor using the specified places.

                                        -
                                        -
                                        -

                                        Running an Executor

                                        -
                                          void Run(ProgramDesc, Scope, block_id, create_local_scope);
                                        -
                                        -
                                        -

                                        An Executor only provides a unified way to execute a ProgramDesc. The ProgramDesc is the target to be executed, the Scope specifies the variable container, the block_id indicates the entrance block, and create_local_scope is a boolean that states whether the temporary variables will be destroyed after the execution is finished.
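The interpreter-like behavior can be sketched in a few lines of Python. This toy is not the C++ Executor: the `Scope` class, the dict-based op records, and the `persistable` flag used to decide which outputs survive the run are all assumptions made for illustration.

```python
class Scope:
    """Toy variable container with an optional parent scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def find(self, name):
        # Look the name up locally first, then in enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)


def run(program, scope, block_id, create_local_scope=True):
    """Run every op of block `block_id` in order, like an interpreter."""
    local = Scope(parent=scope) if create_local_scope else scope
    for op in program[block_id]:
        inputs = [local.find(name) for name in op["inputs"]]
        # Persistable outputs go to the long-lived scope (sketch assumption);
        # everything else is a temporary in the local scope.
        target = scope if op.get("persistable") else local
        target.vars[op["output"]] = op["fn"](*inputs)
    # The local scope is dropped on return, discarding the temporaries.


program = [[
    {"inputs": ["x", "w"], "output": "tmp", "fn": lambda x, w: x * w},
    {"inputs": ["tmp"], "output": "y", "fn": lambda t: t + 1.0,
     "persistable": True},
]]
scope = Scope()
scope.vars.update(x=2.0, w=3.0)
run(program, scope, block_id=0)
```

After the run, `scope` holds `y` but not the temporary `tmp`, which lived only in the local scope created for this execution.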

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/file_manager/README.html b/develop/doc_cn/design/file_manager/README.html
                                        deleted file mode 100644
                                        index d0345f86ab0b4e486b30988e026fcc96a25d8234..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/file_manager/README.html
                                        +++ /dev/null
                                        @@ -1,373 +0,0 @@
                                        - FileManager Design Doc — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        FileManager Design Doc

                                        -
                                        -

                                        Goals

                                        -

                                        This document describes the design of a system named FileManager, which lets users upload their own training data for distributed training.

                                        -

                                        Its main features are:

                                        -
                                          -
                                        • Common command-line commands for managing files and directories.
                                        • Resumable upload and download of large files.
                                        -
                                        -
                                        -

                                        Terminology

                                        -
                                          -
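                                        • PFS: short for Paddlepaddle cloud File System, an abstraction of the user's file storage space, as opposed to the local filesystem. It is currently built on CephFS.
                                        • CephFS: a POSIX-compatible file system.
                                        • Chunk: the logical unit into which a file is divided.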
                                        -
                                        -
                                        -

                                        Modules

                                        -
                                        -

                                        Architecture Diagram

                                        -

                                        -
                                        -
                                        -

                                        PFSClient

                                        -
                                          -
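                                        • Features (detailed design: link):
                                          • Provides commands for users to manage files.
                                          • Must be able to run across platforms.
                                        • Mutual authentication: PFSClient and Ingress authenticate each other with mutual TLS, so a user must first register at cloud.paddlepaddle.org, apply for a user space, and download the system-generated CA (certificate authority), Key, and CRT (CA signed certificate) to the local machine before using PFSClient.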
                                        -
                                        -
                                        -

                                        Ingress

                                        -
                                          -
                                        • Function: provides a layer-7 reverse proxy and load balancing based on sticky sessions.
                                        • -
                                        • Passing the user identity through: Ingress needs to forward the identity information of PFSClient to PFSServer; see link for how to configure this.
                                        • -
                                        -
                                        -
                                        -

                                        PFSServer

                                        -

                                        PFSServer provides a RESTful API. It receives and handles file management requests from PFSClient and returns the results to PFSClient.

                                        -

                                        RESTful API

                                        -
                                          -
                                        • /api/v1/files
                                            -
                                          • GET /api/v1/files: Get metadata of files or directories.
                                          • -
                                          • POST /api/v1/files: Create files or directories.
                                          • -
                                          • PATCH /api/v1/files: Update files or directories.
                                          • -
                                          • DELETE /api/v1/files: Delete files or directories.
                                          • -
                                          -
                                        • -
                                        • /api/v1/file/chunks
                                            -
                                          • GET /api/v1/file/chunks: Get the metadata of a file's chunks.
                                          • -
                                          -
                                        • -
                                        • /api/v1/storage/files
                                            -
                                          • GET /api/v1/storage/files: Download files or directories.
                                          • -
                                          • POST /api/v1/storage/files: Upload files or directories.
                                          • -
                                          -
                                        • -
                                        • /api/v1/storage/file/chunks
                                            -
                                          • GET /api/v1/storage/file/chunks: Download the data of chunks.
                                          • -
                                          • POST /api/v1/storage/file/chunks: Upload the data of chunks.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        File transfer optimization

                                        -
                                        -

                                        Chunked file transfer

                                        -

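                                        User files can be large, so uploading them to the Cloud or downloading them to the local machine can take a long time, and the network may become unstable during the transfer. To address these problems, we introduce the concept of a Chunk, which consists of its offset within the file, the data, the data length, and a checksum. File uploads and downloads are both implemented as operations on Chunks. Because a Chunk is small (256K by default), a single transfer completes quickly and is less likely to fail. After the last Chunk has been transferred, PFSClient must check that the MD5 of the destination file matches that of the source file.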

                                        -

                                        A typical Chunk looks like this:

                                        -
                                        type Chunk struct {
                                        -    fileOffset int64
                                        -    checksum uint32
                                        -    len     uint32
                                        -    data    []byte
                                        -}
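To make the splitting step concrete, here is a minimal C++ sketch of how a file buffer could be divided into chunks carrying the offset, length, and checksum fields of the Chunk struct above. The names `ChunkMeta`, `adler32`, and `split_into_chunks` are hypothetical, and Adler-32 is an assumption: the design only states that a chunk has a 32-bit checksum, not which algorithm computes it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Metadata for one chunk, mirroring the fields of the Chunk struct.
// The payload is left in the source buffer instead of being copied.
struct ChunkMeta {
    std::int64_t file_offset;  // byte offset of the chunk within the file
    std::uint32_t checksum;    // checksum of the chunk's bytes
    std::uint32_t len;         // number of bytes in the chunk
};

// Adler-32 checksum. ASSUMPTION: the design does not name the algorithm.
std::uint32_t adler32(const std::uint8_t* data, std::size_t n) {
    const std::uint32_t kMod = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a = (a + data[i]) % kMod;
        b = (b + a) % kMod;
    }
    return (b << 16) | a;
}

// Splits `file` into consecutive chunks of at most `chunk_size` bytes
// (256K by default, as in the design). Only the last chunk may be shorter.
std::vector<ChunkMeta> split_into_chunks(const std::vector<std::uint8_t>& file,
                                         std::uint32_t chunk_size = 256 * 1024) {
    std::vector<ChunkMeta> chunks;
    if (chunk_size == 0) return chunks;  // nothing sensible to do
    for (std::size_t off = 0; off < file.size(); off += chunk_size) {
        const std::size_t n =
            std::min<std::size_t>(chunk_size, file.size() - off);
        chunks.push_back({static_cast<std::int64_t>(off),
                          adler32(file.data() + off, n),
                          static_cast<std::uint32_t>(n)});
    }
    return chunks;
}
```

Keeping only metadata in `ChunkMeta` lets both sides exchange checksums cheaply before any chunk data is moved.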
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Creating a sparse file

                                        -

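                                        When the destination file does not exist, or its size differs from that of the source file, Fallocate can be used to create a sparse file, after which multiple Chunks can be written concurrently.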

                                        -
                                        -
                                        -

                                        Overwriting the inconsistent parts

                                        -

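                                        The key to file transfer is that PFSClient compares the checksums of the Chunks of the source and destination files. Where they differ, PFSClient downloads or uploads the Chunk. This way, the parts that have already been transferred successfully do not need to be transferred again.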
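The comparison step can be sketched as follows, assuming each side has already produced one checksum per chunk in file order. The function name `chunks_to_transfer` is hypothetical and this is not the PFSClient implementation, only an illustration of the idea in C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of the chunks that still have to be transferred:
// those whose checksum differs between source and destination, plus any
// chunk the destination does not have yet.
std::vector<std::size_t> chunks_to_transfer(
    const std::vector<std::uint32_t>& src_checksums,
    const std::vector<std::uint32_t>& dst_checksums) {
    std::vector<std::size_t> pending;
    for (std::size_t i = 0; i < src_checksums.size(); ++i) {
        const bool missing = i >= dst_checksums.size();
        if (missing || src_checksums[i] != dst_checksums[i]) {
            pending.push_back(i);
        }
    }
    return pending;
}
```

After an interrupted transfer, only the returned indices need to be sent again, which is what makes resuming cheap.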

                                        -
                                        -
                                        -
                                        -

                                        User workflow

                                        -

                                        See link

                                        -
                                        -
                                        -

                                        Framework generation

                                        -

                                        Swagger is used to generate the skeleton of PFSClient and PFSServer, so that we can spend more effort on the logic itself.

                                        -
                                        - -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/file_manager/pfs/pfsclient.html b/develop/doc_cn/design/file_manager/pfs/pfsclient.html
                                        deleted file mode 100644
                                        index 61fd0d04c040aa4465797ad5edaf58276d8bfd11..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/file_manager/pfs/pfsclient.html
                                        +++ /dev/null
                                        @@ -1,399 +0,0 @@
                                        - PFSClient — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        PFSClient

                                        -
                                        -

                                        Description

                                        -

                                        The pfs command is a command-line interface for managing your files on PaddlePaddle Cloud.

                                        -
                                        -
                                        -

                                        Synopsis

                                        -
                                        paddle [options] pfs <subcommand> [parameters]
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Options

                                        -
                                        --profile (string)
                                        -    Use a specific profile from your credential file.
                                        -
                                        ---help (string)
                                        -    Display more information about the command
                                        -
                                        ---version
                                        -    Output version information and exit
                                        -
                                        ---debug
                                        -    Show detailed debugging log 
                                        -    
                                        ---only-show-errors (boolean) 
                                        -    Only errors and warnings are displayed. All other output is suppressed.
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Path Arguments

                                        -

                                        When using a command, we need to specify path arguments. There are two types of path argument: localpath and pfspath.

                                        -

                                        A pfspath begins with /pfs, e.g. /pfs/$DATACENTER/home/$USER/folder.
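A client has to tell the two path types apart before it can decide whether an argument refers to PFS or to the local filesystem. The C++ sketch below shows one way to apply the /pfs prefix rule. The helper name `is_pfs_path` is hypothetical, and treating a path such as /pfsdata as local is an assumption, since the text only says that a pfspath begins with /pfs.

```cpp
#include <string>

// True when `path` is "/pfs" itself or lies underneath it.
// ASSUMPTION: "/pfsdata" is treated as a local path, because the prefix
// must be followed by '/' or by the end of the string.
bool is_pfs_path(const std::string& path) {
    const std::string prefix = "/pfs";
    if (path.compare(0, prefix.size(), prefix) != 0) return false;
    return path.size() == prefix.size() || path[prefix.size()] == '/';
}
```

With this rule, /pfs/$DATACENTER/home/$USER/folder is a pfspath while ./file is a localpath.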

                                        -

                                        Here is how to configure datacenters.

                                        -
                                        -
                                        -

                                        Order of Path Arguments

                                        -

                                        Commonly, if there are two path arguments, the first is the source, and the second is the destination.

                                        -
                                        -
                                        -

                                        Subcommands

                                        -
                                          -
                                        • rm - remove files or directories
                                        • -
                                        -
                                        Synopsis:
                                        -    rm [-r] [-v] <PFSPath> ...
                                        -
                                        -Options:
                                        -    -r 
                                        -        Remove directories and their contents recursively 
                                        -    -v      
                                        -        Cause rm to be verbose, showing files after they are removed.
                                        -    
                                        -Examples:
                                        -    paddle pfs rm /pfs/$DATACENTER/home/$USER/file
                                        -    paddle pfs rm -r /pfs/$DATACENTER/home/$USER/folder
                                        -
                                        -
                                        -
                                          -
                                        • mv - move (rename) files
                                        • -
                                        -
                                        Synopsis:
                                        -    mv [-f | -n] [-v] <LocalPath> <PFSPath>
                                        -    mv [-f | -n] [-v] <LocalPath> ... <PFSPath>
                                        -    mv [-f | -n] [-v] <PFSPath> <LocalPath> 
                                        -    mv [-f | -n] [-v] <PFSPath> ... <LocalPath> 
                                        -    mv [-f | -n] [-v] <PFSPath> <PFSPath> 
                                        -    mv [-f | -n] [-v] <PFSPath> ... <PFSPath> 
                                        -    
                                        -Options:
                                        -    -f      
                                        -        Do not prompt for confirmation before overwriting the destination path.  (The -f option overrides previous -n options.)
                                        -    -n      
                                        -        Do not overwrite an existing file.  (The -n option overrides previous -f options.)
                                        -    -v      
                                        -        Cause mv to be verbose, showing files after they are moved.
                                        -        
                                        -Examples:
                                        -    paddle pfs mv ./text1.txt /pfs/$DATACENTER/home/$USER/text1.txt
                                        -
                                        -
                                        -
                                          -
                                        • cp - copy files or directories
                                        • -
                                        -
                                        Synopsis:
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <LocalPath> <PFSPath>
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <LocalPath> ... <PFSPath>
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> <LocalPath> 
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> ... <LocalPath>
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> <PFSPath> 
                                        -    cp [-r] [-f | -n] [-v] [--preserve--links] <PFSPath> ... <PFSPath>
                                        -
                                        -Options:
                                        -    -r
                                        -        Copy directories recursively
                                        -    -f      
                                        -        Do not prompt for confirmation before overwriting the destination path.  (The -f option overrides previous -n options.)
                                        -    -n      
                                        -        Do not overwrite an existing file.  (The -n option overrides previous -f options.)
                                        -    -v      
                                        -        Cause cp to be verbose, showing files after they are copied.
                                        -    --preserve--links
                                        -       Preserve links when copying links
                                        -       
                                        -Examples:
                                        -    paddle pfs cp ./file /pfs/$DATACENTER/home/$USER/file
                                        -    paddle pfs cp /pfs/$DATACENTER/home/$USER/file ./file
                                        -
                                        -
                                        -
                                          -
                                        • ls - list files
                                        • -
                                        -
                                        Synopsis:
                                        -    ls [-R] <PFSPath> ...
                                        -    
                                        -Options:
                                        -    -R
                                        -        List directory(ies) recursively
                                        -
                                        -Examples:
                                        -    paddle pfs ls  /pfs/$DATACENTER/home/$USER/file
                                        -    paddle pfs ls  /pfs/$DATACENTER/home/$USER/folder
                                        -
                                        -
                                        -
                                          -
                                        • mkdir - make directory(ies). Creates intermediate directory(ies) as required.
                                        • -
                                        -
                                        Synopsis:
                                        -    mkdir <PFSPath> ...
                                        -
                                        -Examples:
                                        -    paddle pfs mkdir  /pfs/$DATACENTER/home/$USER/folder
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file
                                        diff --git a/develop/doc_cn/design/float16.html b/develop/doc_cn/design/float16.html
                                        deleted file mode 100644
                                        index f65440a8a15ef6ea58f00905db22d691779b04f0..0000000000000000000000000000000000000000
                                        --- a/develop/doc_cn/design/float16.html
                                        +++ /dev/null
                                        @@ -1,378 +0,0 @@
                                        - Design Doc: float16 — PaddlePaddle Documentation
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: float16

                                        -
                                        -

                                        Why float16

                                        -

                                        Half precision (float16) is a binary floating-point format that occupies 16 bits in memory. float16 is half the size of traditional 32-bit single precision format (float) and has lower precision and smaller range.

                                        -

                                        When high precision computation is not required, using float16 data type could potentially

                                        -
                                          -
                                        • reduce storage space, memory bandwidth, and power usage;
                                        • -
                                        • increase the chance of data fitting into a smaller cache of lower latency;
                                        • -
                                        • provide arithmetic speed up if supported by hardware.
                                        • -
                                        -
                                        -
                                        -

                                        Survey of current float16 support

                                        -

                                        A brief survey of float16 support across compilers, hardware, and libraries can be found below. Interested readers can refer to link1 and link2 for more info.

                                        -

                                        The goal of float16 is to serve as a key for the executor to find and run the correct version of the compute method specialized for float16 in an operator kernel. It should be compatible with the various natively supported float16 implementations, including __half for CUDA, float16_t for ARM, and Eigen::half for Eigen, to make writing customized float16 kernels easier.

                                        -
                                        -

                                        Compiler

                                        -
                                          -
                                        • nvcc supports __half data type after CUDA 7.5.
                                        • -
                                        • __fp16 or float16_t is supported as storage type for gcc >= 6.1 and clang >= 3.4.
                                        • -
                                        • __fp16 or float16_t is supported as arithmetic type for gcc >= 7.1 and clang >= 3.9.
                                        • -
                                        -
                                        -
                                        -

                                        Hardware

                                        -
                                          -
                                        • __half is supported on GPU with compute capability >= 5.3.
                                        • -
                                        • __fp16 is supported as storage type for ARMv7-A, ARMv8-A, and above.
                                        • -
                                        • __fp16 is supported as arithmetic type after ARMv8.2-A (currently, the only microarchitecture implementing ARMv8.2-A is ARM Cortex-A75, which was announced in May 2017. There seem to be no application processors currently available on the market that adopt this architecture. It is reported that Qualcomm Snapdragon 845 uses the Cortex-A75 design and will be available in mobile devices in early 2018).
                                        • -
                                        -
                                        -
                                        -

                                        Libraries

                                        -
                                          -
                                        • Eigen >= 3.3 supports float16 calculation on both GPU and CPU using the Eigen::half class. It is mostly useful for Nvidia GPUs because its overloaded arithmetic operators use CUDA intrinsics. On CPU it falls back to software emulation, with no special treatment for ARM processors.
                                        • -
                                        • ARM compute library >= 17.02.01 supports NEON FP16 kernels (requires ARMv8.2-A CPU).
                                        • -
                                        -
                                        -
                                        -

                                        CUDA version issue

                                        -

                                        There are currently three versions of CUDA that support the __half data type, namely CUDA 7.5, 8.0, and 9.0. CUDA 7.5 and 8.0 define __half as a simple struct that holds a 16-bit unsigned value (see cuda_fp16.h), as follows:

                                        -
                                        typedef struct __align__(2) {
                                        -   unsigned short x;
                                        -} __half;
                                        -
                                        -typedef __half half;
                                        -
                                        -
                                        -

                                        This struct does not define any overloaded arithmetic operators. So you have to directly use __hadd instead of + to correctly add two half types:

                                        -
                                        __global__ void Add() {
                                        -  half a, b, c;
                                        -  c = __hadd(a, b); // correct
                                        -  c = a + b; // compiler error: no operator "+" matches these operands
                                        -}
                                        -
                                        -
                                        -

                                        CUDA 9.0 provides a major update to the half data type. The related code can be found in the updated cuda_fp16.h and the newly added cuda_fp16.hpp.

                                        -

                                        Essentially, CUDA 9.0 renames the original __half type in 7.5 and 8.0 to __half_raw, and defines a new __half class type that has constructors and conversion operators and also provides overloaded arithmetic operators, as follows:

                                        -
                                        typedef struct __CUDA_ALIGN__(2) {
                                        -    unsigned short x;
                                        -} __half_raw;
                                        -
                                        -
                                        -struct __CUDA_ALIGN__(2) __half {
                                        -protected:
                                        -    unsigned short __x;
                                        -public:
                                        -    // constructors and conversion operators from/to 
                                        -    // __half_raw and other built-in data types
                                        -};
                                        -
                                        -typedef __half half;
                                        -
                                        -__device__ __forceinline__ 
                                        -__half operator+(const __half &lh, const __half &rh) { 
                                        -    return __hadd(lh, rh); 
                                        -}
                                        -
                                        -// Other overloaded operators
                                        -
                                        -
                                        -

                                        This new design makes c = a + b work correctly for CUDA half data type.

                                        -
                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        The float16 class holds a 16-bit uint16_t value internally.

                                        -
                                        struct float16 {
                                        -  uint16_t x;
                                        -};
                                        -
                                        -
                                        -

                                        float16 supports the following features:

                                        -
                                          -
                                        • constructors / assignment operators that take input from primitive data types including bool, integers of various length, float, and double.
                                        • -
                                        • constructors / assignment operators that take input from __half on CUDA, float16_t on ARM, and Eigen::half on Eigen.
                                        • -
                                        • conversion operators to primitive data types and half precision data types on CUDA, ARM, and Eigen.
                                        • -
                                        • overloaded arithmetic operators for CUDA, ARM, and non-ARM CPU, respectively. These operators will take advantage of the CUDA and ARM intrinsics on the corresponding hardware.
                                        • -
                                        -

                                        To support the above features, two fundamental conversion functions are provided:

                                        -
                                        float16 float_to_half_rn(float f);  // convert to half precision in round-to-nearest-even mode
                                        -float half_to_float(float16 h);
                                        -
                                        -
                                        -

                                        which provide conversion between float32 and float16. These two functions use different conversion routines depending on the current hardware. CUDA/ARM intrinsics are used when the corresponding hardware is available. If neither the hardware nor the compiler supports float32 to float16 conversion, the conversion is done by software emulation.
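For the software-emulation fallback, a minimal C++ sketch of the two conversion functions is given below. The names `float16`, `float_to_half_rn`, and `half_to_float` come from the text above. The bit manipulation follows the IEEE 754 binary16 and binary32 layouts and is an illustration only, not the PaddlePaddle implementation; it assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

struct float16 {
    std::uint16_t x;
};

// float32 -> float16, round-to-nearest-even, software emulation.
float16 float_to_half_rn(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t mag = bits & 0x7FFFFFFFu;  // magnitude without sign
    std::uint32_t out;
    if (mag > 0x7F800000u) {            // NaN maps to a quiet NaN
        out = 0x7E00u;
    } else if (mag >= 0x477FF000u) {    // >= 65520 rounds to infinity
        out = 0x7C00u;
    } else if (mag >= 0x38800000u) {    // normal half range (>= 2^-14)
        const std::uint32_t v = mag - 0x38000000u;  // rebias exponent 127 -> 15
        // Add just under half an ulp, plus one more when the kept LSB is odd.
        out = (v + 0x0FFFu + ((v >> 13) & 1u)) >> 13;
    } else if (mag >= 0x33000000u) {    // subnormal half range (>= 2^-25)
        const std::uint32_t shift = 126u - (mag >> 23);           // 14..24
        const std::uint32_t mant = (mag & 0x007FFFFFu) | 0x00800000u;
        const std::uint32_t rem = mant & ((1u << shift) - 1u);    // dropped bits
        const std::uint32_t half = 1u << (shift - 1u);
        out = mant >> shift;
        if (rem > half || (rem == half && (out & 1u))) ++out;
    } else {                            // too small: underflow to zero
        out = 0;
    }
    float16 h;
    h.x = static_cast<std::uint16_t>(sign | out);
    return h;
}

// float16 -> float32. Always exact, since every half value fits in a float.
float half_to_float(float16 h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h.x & 0x8000u) << 16;
    std::uint32_t exp = (h.x >> 10) & 0x1Fu;
    std::uint32_t mant = h.x & 0x03FFu;
    std::uint32_t bits;
    if (exp == 0x1Fu) {                 // infinity or NaN
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {              // normal number: rebias 15 -> 127
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant != 0) {             // subnormal: shift until normalized
        exp = 113u;
        while ((mant & 0x0400u) == 0) {
            mant <<= 1;
            --exp;
        }
        bits = sign | (exp << 23) | ((mant & 0x03FFu) << 13);
    } else {                            // signed zero
        bits = sign;
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Converting through `std::memcpy` rather than a pointer cast avoids undefined behavior from type punning. Because half to float is exact, a round trip through `half_to_float` and back returns the original 16-bit pattern for every non-NaN half value.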

                                        -
                                        -
                                        -

                                        To do

                                        -

                                        Once the float16 class is available, future work items include:

                                        -
                                          -
                                        • Update pybind/tensor_py.h to bind C++ float16 with numpy float16.
                                        • -
                                        • Modify GetKernelType() method in framework/operator.h to make it compatible with float16.
                                        • -
                                        • Create a type-casting operator that can convert the data type in tensor between float16 and other types.
                                        • -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/fluid.html b/develop/doc_cn/design/fluid.html deleted file mode 100644 index 1b426d656242cb9b506bcad48b5b94ae091ce90d..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/fluid.html +++ /dev/null @@ -1,358 +0,0 @@ - - - - - - - - - - - - - Design Doc: PaddlePaddle Fluid — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: PaddlePaddle Fluid

                                        -
                                        -

                                        Why Fluid

                                        -

                                        When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system at the time was Caffe. However, when PaddlePaddle was open-sourced in 2016, many other choices were available. There was a challenge – what is the need for open sourcing yet another deep learning framework?

                                        -

                                        Fluid is the answer. Fluid is similar to PyTorch and TensorFlow Eager Execution in that it describes the “process” of training or inference rather than using the concept of a model. In fact, in PyTorch, TensorFlow Eager Execution, and Fluid, there is no concept of a model at all. The details are covered in the sections below. Fluid currently takes this idea further than PyTorch and Eager Execution, and we are trying to push Fluid in the direction of a compiler and a new programming language for deep learning.

                                        -
                                        -
                                        -

                                        The Evolution of Deep Learning Systems

                                        -

                                        Deep learning infrastructure is one of the fastest evolving technologies. Within four years, there have already been three generations of technologies invented.

                                        -

                                        | Existed since | model as sequence of layers | model as graph of operators | No model |
                                        -|---|---|---|---|
                                        -| 2013 | Caffe, Theano, Torch, PaddlePaddle | | |
                                        -| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | |
                                        -| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid |

                                        -

                                        From the above table, we see that deep learning technology is evolving towards getting rid of the concept of a model. To understand the reasons behind this direction, it helps to compare the programming paradigms, that is, the ways to program deep learning applications using these systems. The following section goes over these.

                                        -
                                        -
                                        -

                                        Deep Learning Programming Paradigms

                                        -

                                        With the systems listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:

                                        -
                                        x = layer.data("image")
                                        -l = layer.data("label")
                                        -f = layer.fc(x, W)
                                        -s = layer.softmax(f)
                                        -c = layer.mse(l, s)
                                        -
                                        -for i in range(1000):  # train for 1000 iterations
                                        -    m = read_minibatch()
                                        -    forward(input=x, data=m, minimize=c)
                                        -    backward(...)
                                        -
                                        -print(W)  # print the trained model parameters.
                                        -
                                        -
                                        -

                                        The above program includes two parts:

                                        -
                                          -
                                        1. The first part describes the model, and
                                        2. -
                                        3. The second part describes the training process (or inference process) for the model.
                                        4. -
                                        -

                                        This paradigm has a well-known problem that limits the productivity of programmers. If the programmer made a mistake in configuring the model, the error messages wouldn’t show up until the second part is executed and forward and backward propagations are performed. This makes it difficult for the programmer to debug and locate a mistake that is located blocks away from the actual error prompt.

                                        -

                                        This difficulty in debugging and iterating quickly on a program is the primary reason that programmers, in general, prefer PyTorch over the older systems. Using PyTorch, we would write the above program as follows:

                                        -
                                        W = tensor(...)
                                        -
                                        -for i in range(1000):  # train for 1000 iterations
                                        -    m = read_minibatch()
                                        -    x = m["image"]
                                        -    l = m["label"]
                                        -    f = layer.fc(x, W)
                                        -    s = layer.softmax(f)
                                        -    c = layer.mse(l, s)
                                        -    backward()
                                        -
                                        -print(W)  # print the trained model parameters.
                                        -
                                        -
                                        -

                                        We can see that the main difference is that the model configuration part (the first step) moves into the training loop. This change allows mistakes in the model configuration to be reported where they actually appear in the program. It also represents the model, or rather its forward pass, more faithfully by keeping the configuration process in the training loop.
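The difference in where a configuration mistake surfaces can be sketched in plain Python. The `fc` function and the `DeferredModel` class below are hypothetical stand-ins written for this illustration only; they are not the API of Caffe, TensorFlow, PyTorch, or Fluid.

```python
def fc(x, w):
    """Multiply a row vector x by a matrix w given as a list of rows."""
    if len(x) != len(w):
        raise ValueError(f"fc: input size {len(x)} != weight rows {len(w)}")
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


class DeferredModel:
    """Define-then-run: layers are only recorded at configuration time."""

    def __init__(self):
        self.layers = []

    def add_fc(self, w):
        self.layers.append(w)  # no shape check happens here
        return self

    def forward(self, x):
        for w in self.layers:
            x = fc(x, w)  # a bad weight shape is only detected here
        return x


W_GOOD = [[1.0, 0.0], [0.0, 1.0]]  # 2x2
W_BAD = [[1.0], [2.0], [3.0]]      # 3x1, does not fit a 2-element input

# Part 1: configuration succeeds even though W_BAD is wrong.
model = DeferredModel().add_fc(W_GOOD).add_fc(W_BAD)

# Part 2: the mistake surfaces only when the model is run.
try:
    model.forward([1.0, 2.0])
    deferred_error = None
except ValueError as err:
    deferred_error = str(err)

# Define-by-run: the same mistake fails on the very line that contains it.
h = fc([1.0, 2.0], W_GOOD)
try:
    fc(h, W_BAD)
    eager_error = None
except ValueError as err:
    eager_error = str(err)

print(deferred_error)  # fc: input size 2 != weight rows 3
print(eager_error)     # fc: input size 2 != weight rows 3
```

The error message is the same in both styles; what changes is the distance between the line that introduced the mistake and the line that raises.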

                                        -
                                        -
                                        -

                                        Describe Arbitrary Models for the Future

                                        -

                                        Describing the process instead of the model also gives Fluid the flexibility to define non-standard models that haven’t been invented yet.

                                        -

                                        As we write out the program for the process, we can write an RNN as a loop, instead of an RNN as a layer or as an operator. A PyTorch example would look like the following:

                                        -
                                        for i in range(1000):
                                        -    m = read_minibatch()
                                        -    x = m["sentence"]
                                        -    for t in range(len(x)):
                                        -        h[t] = the_step(x[t])
                                        -
                                        -
                                        -

                                        With Fluid, the training loop and the RNN in the above program are not really Python loops, but a “loop structure” provided by Fluid and implemented in C++, as follows:

                                        -
                                        train_loop = layers.While(cond)
                                        -with train_loop.block():
                                        -  m = read_minibatch()
                                        -  x = m["sentence"]
                                        -  rnn = layers.While(...)
                                        -  with rnn.block():
                                        -    h[t] = the_step(input[t])
                                        -
                                        -
                                        -

                                        An actual Fluid example is described here.

                                        -

                                        From the example, Fluid programs look very similar to their PyTorch equivalents, except that Fluid’s loop structure, wrapped with Python’s with statement, could run much faster than a plain Python loop.

                                        -

                                        We have more examples of the if-then-else structure of Fluid.

                                        -
                                        -
                                        -

                                        Turing Completeness

                                        -

                                        In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine. For a programming language, if it provides if-then-else and loop, it is Turing complete. From the above examples, Fluid seems to be Turing complete; however, it is worth noting that there is a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches and splits the input mini-batch into two, one for the True condition and another for the False condition. Whether this is equivalent to the if-then-else that makes programming languages Turing complete has not been researched in depth. Based on a conversation with Yuang Yu, it seems to be the case, but this needs to be looked into in depth.
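The batch-splitting behaviour described above can be sketched as follows. This is a toy model in plain Python, not Fluid's implementation: the function name `if_else` and the choice to merge the two partial results back in the original order are assumptions made for this illustration.

```python
def if_else(batch, cond, true_fn, false_fn):
    """Toy batch-splitting if-then-else: both branches run, each on its own slice."""
    mask = [cond(x) for x in batch]
    true_part = [x for x, m in zip(batch, mask) if m]
    false_part = [x for x, m in zip(batch, mask) if not m]

    # Both branches are executed, unlike an ordinary if-then-else.
    true_out = iter(true_fn(true_part))
    false_out = iter(false_fn(false_part))

    # Merge the two partial results back into the original element order.
    return [next(true_out) if m else next(false_out) for m in mask]


batch = [3, -1, 4, -5]
result = if_else(
    batch,
    cond=lambda x: x >= 0,
    true_fn=lambda xs: [x * 10 for x in xs],
    false_fn=lambda xs: [0 for _ in xs],
)
print(result)  # [30, 0, 40, 0]
```

Per element, the outcome matches an ordinary conditional; the difference is that the condition is evaluated over a whole mini-batch and each branch sees only the examples routed to it.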

                                        -
                                        -
                                        -

                                        The Execution of a Fluid Program

                                        -

                                        There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message ProgramDesc that describes the process and is conceptually like an abstract syntax tree.

                                        -

                                        There is a C++ class Executor, which runs a ProgramDesc, similar to how an interpreter runs a Python program.
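To make the interpreter analogy concrete, here is a minimal sketch of an executor that walks a list of operator descriptions and stores results in a scope. The dictionary layout of `PROGRAM`, the `run_program` function, and the three toy operators are hypothetical; the real `ProgramDesc` is a protobuf message and the real `Executor` is a C++ class.

```python
# A hypothetical, dictionary-based stand-in for a ProgramDesc with one block.
PROGRAM = {
    "vars": ["X", "W", "Y"],
    "ops": [
        {"type": "read", "inputs": [], "output": "X"},
        {"type": "assign", "inputs": [], "output": "W", "value": 3.0},
        {"type": "mult", "inputs": ["X", "W"], "output": "Y"},
    ],
}


def run_program(program, feed):
    """Interpret the ops one by one, keeping variables in a scope dict."""
    kernels = {
        "read": lambda op, args: feed[op["output"]],
        "assign": lambda op, args: op["value"],
        "mult": lambda op, args: args[0] * args[1],
    }
    scope = {}
    for op in program["ops"]:
        args = [scope[name] for name in op["inputs"]]
        scope[op["output"]] = kernels[op["type"]](op, args)
    return scope


scope = run_program(PROGRAM, feed={"X": 2.0})
print(scope["Y"])  # 6.0
```

The program description is pure data, so it can be inspected, rewritten, or exported before anything is executed.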

                                        -

                                        Fluid is moving in the direction of a compiler, which is explained in fluid.

                                        -
                                        -
                                        -

                                        Backward Compatibility of Fluid

                                        -

                                        Despite all the advantages of removing the concept of a model, hardware manufacturers might still prefer that the concept exist, because it makes it easier for them to support multiple frameworks at once and to run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as n-graph. Similarly, Movidius is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known ONNX is also a file format for graphs of operators.

                                        -

                                        For Fluid, we can write a converter that extracts the parts in the ProgramDesc protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/fluid_compiler.html b/develop/doc_cn/design/fluid_compiler.html deleted file mode 100644 index 7b1a0f8a85d4b7dfdab9d265e6afb749908c7a88..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/fluid_compiler.html +++ /dev/null @@ -1,360 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle Fluid: Towards a Compiled Programming Language — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        -
                                        -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        PaddlePaddle Fluid: Towards a Compiled Programming Language

                                        -

                                        As described in fluid.md, when a Fluid application program runs, it generates a ProgramDesc protobuf message as an intermediate representation of itself. The C++ class Executor can run this protobuf message as an interpreter. This article describes the Fluid compiler.

                                        -

                                        -
                                        -

                                        ProgramDesc

                                        -

                                        Before we go deeper into the idea of a compiled language, let us take a look at a simple example Fluid application.

                                        -
                                        import "fluid"
                                        -
                                        -func paddlepaddle() {
                                        -  X = fluid.read(...)
                                        -  W = fluid.Tensor(...)
                                        -  Y = fluid.mult(X, W)
                                        -}
                                        -
                                        -
                                        -

                                        This program consists of a block of three operators: read, assign, and mult. Its ProgramDesc message looks like the following:

                                        -
                                        message ProgramDesc {
                                        -  block[0] = Block {
                                        -    vars = [X, W, Y],
                                        -    ops = [
                                        -      read(output = X)
                                        -      assign(input = ..., output = W)
                                        -      mult(input = {X, W}, output = Y)
                                        -    ],
                                        -  }
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Transpilers

                                        -

                                        We can write a transpiler program that takes a ProgramDesc, e.g., the above one, and outputs another ProgramDesc. Let us take some examples:

                                        -
                                          -
                                        1. Memory optimization transpiler: We can write a transpiler that inserts some FreeMemoryOps into the above example ProgramDesc so as to free memory early, before the end of an iteration, and thus keep a small memory footprint.
                                        2. -
                                        3. Distributed training transpiler: We can write a transpiler that converts a ProgramDesc into its distributed version of two ProgramDescs: one to be run by the trainer processes and the other by the parameter server.
                                        4. -
                                        -
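A memory optimization transpiler of the kind listed above can be sketched as a function from one operator list to another. Everything here is an assumption made for illustration: the dictionary layout of an op, the `free` op type standing in for FreeMemoryOp, the extra `relu` op added so that there is an intermediate worth freeing, and the `keep` set of variables that must outlive the iteration.

```python
def insert_free_ops(ops, keep):
    """Return a new op list with a 'free' op right after each variable's last use."""
    # Record the index of the last op that reads or writes each variable.
    last_use = {}
    for index, op in enumerate(ops):
        for name in op["inputs"] + [op["output"]]:
            last_use[name] = index

    result = []
    for index, op in enumerate(ops):
        result.append(op)
        dead = sorted(name for name, last in last_use.items() if last == index)
        for name in dead:
            if name not in keep:
                result.append({"type": "free", "inputs": [name], "output": None})
    return result


OPS = [
    {"type": "read", "inputs": [], "output": "X"},
    {"type": "assign", "inputs": [], "output": "W"},
    {"type": "mult", "inputs": ["X", "W"], "output": "Y"},
    {"type": "relu", "inputs": ["Y"], "output": "Z"},
]

# W is a parameter and Z is the result, so neither may be freed.
optimized = insert_free_ops(OPS, keep={"W", "Z"})
print([op["type"] for op in optimized])
# ['read', 'assign', 'mult', 'free', 'relu', 'free']
```

The input list is left untouched and a new list is returned, which mirrors the idea that a transpiler takes one ProgramDesc and outputs another.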

                                        In the rest of this article, we talk about a special kind of transpiler, the native code generator, which takes a ProgramDesc and generates a .cu (or .cc) file, which could be built by C++ compilers (gcc, nvcc, icc) into binaries.

                                        -
                                        -
                                        -

                                        Native Code Generator

                                        -

                                        For the above example, the native code generator transpiler, say, the CUDA code generator, should generate a main function:

                                        -
                                        int main() {
                                        -  auto X = fluid_cuda_read(...);
                                        -  auto W = fluid_cuda_create_tensor(...);
                                        -  auto Y = fluid_cuda_mult(X, W);
                                        -}
                                        -
                                        -
                                        -

                                        and the definitions of functions fluid_cuda_read, fluid_cuda_create_tensor, and fluid_cuda_mult. Please be aware that each function could just define a C++ instance of an operator and run it. For example:

                                        -
                                        paddle::Tensor fluid_cuda_read(...) {
                                        -  paddle::Tensor t;
                                        -  paddle::operators::Read r(&t, ...);  // `operator` is a C++ keyword, so the namespace is spelled `operators`
                                        -  r.Run();
                                        -  return t;
                                        -}
                                        -
                                        -
                                        -

                                        For computational operators that have multiple kernels, each for a specific hardware platform, for example, the mult operator, the generated code should call its CUDA kernel:

                                        -
                                        paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a, 
                                        -                               const paddle::Tensor& b) {
                                        -  paddle::Tensor t;
                                        -  paddle::operators::Mult m(a, b, ...);
                                        -  m.Run(cuda_context);  // run on the instance m, not on the class name
                                        -  return t;
                                        -}
                                        -
                                        -
                                        -

                                        where cuda_context could be a global variable of type paddle::CUDADeviceContext.
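The code generation step itself can be sketched as a small text-emitting function. The op dictionary layout and the `generate_main` function are hypothetical; only the `fluid_cuda_*` naming pattern is taken from the example above, and the `...` placeholder for elided arguments is kept as-is.

```python
def generate_main(ops, target="cuda"):
    """Emit the text of a C++ main function that calls one generated function per op."""
    lines = ["int main() {"]
    for op in ops:
        # Ops without named inputs keep the elided "..." argument list.
        args = ", ".join(op["inputs"]) or "..."
        lines.append(f"  auto {op['output']} = fluid_{target}_{op['func']}({args});")
    lines.append("}")
    return "\n".join(lines)


OPS = [
    {"func": "read", "inputs": [], "output": "X"},
    {"func": "create_tensor", "inputs": [], "output": "W"},
    {"func": "mult", "inputs": ["X", "W"], "output": "Y"},
]

source = generate_main(OPS)
print(source)
# int main() {
#   auto X = fluid_cuda_read(...);
#   auto W = fluid_cuda_create_tensor(...);
#   auto Y = fluid_cuda_mult(X, W);
# }
```

Switching the `target` argument changes only the function-name prefix, which reflects the idea that the same ProgramDesc could be lowered to different hardware back ends.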

                                        -
                                        -
                                        -

                                        Multi-Block Code Generation

                                        -

                                        Most Fluid application programs may have more than one block. To execute them, we need to trace scopes.

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/functions_operators_layers.html b/develop/doc_cn/design/functions_operators_layers.html deleted file mode 100644 index 0bf43baabca91c3d4926c1ef101bda277d511c16..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/functions_operators_layers.html +++ /dev/null @@ -1,339 +0,0 @@ - - - - - - - - - - - - - Design Doc: Functions, Operators, and Layers — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Functions, Operators, and Layers

                                        -

                                        In a deep learning (DL) system, we can compose one or more fine-grained operators into a coarse-grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.

                                        -

                                        Historically, some fine-grained operations are known as operators, and some coarse-grained ones are known as layers. But we need a well-defined separation.

                                        -

                                        In general, operators are the very fine-grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:

                                        -
                                        template <typename T> T add(T x, T y) { return x + y; }
                                        -template <typename T> T mul(T x, T y) { return x * y; }
                                        -
                                        -
                                        -

                                        Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation

                                        -
                                        MAKE_FUNCTION_OPERATOR(mul);
                                        -
                                        -
                                        -

                                        generates

                                        -
                                        template <typename T> class mulOp : public OperatorBase {...};
                                        -REGISTER_OP(mulOp<float32>, "mul");
                                        -
                                        -
                                        -

                                        so that in Python we can create operator mul by:

                                        -
                                        X1 = Var()
                                        -X2 = Var()
                                        -Y = Var()
                                        -paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
                                        -
                                        -
                                        -

                                        Also, at the same time, we can compose a coarse level C++ operator class by composing functions mul and add:

                                        -
                                        template <typename T>
                                        -class FCOp : public OperatorBase {
                                        - public:
                                        -  void Run(...) {
                                        -    add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b"));
                                        -  }
                                        -};
                                        -REGISTER_OP(FCOp, "fc");
                                        -
                                        -
                                        -

                                        We need to support such composition in Python as well. To do so, we need a higher level Python wrapping of operator creation than paddle.cpp.create_operator. This higher level operator API should be compatible with the layer API.

                                        -

                                        Let’s explain using an example. Suppose that we are going to compose the FC using mul and add in Python, we’d like to have Python functions mul and add defined in module operator:

                                        -
                                        def operator.mul(X1, X2):
                                        -    O = Var()
                                        -    paddle.cpp.create_operator("mul", input=[X1, X2], output=O)
                                        -    return O
                                        -
                                        -def operator.add(X1, X2):
                                        -    O = Var()
                                        -    paddle.cpp.create_operator("add", input=[X1, X2], output=O)
                                        -    return O
                                        -
                                        -
                                        -

                                        The above code snippets are automatically generated. Given them, users can define

                                        -
                                        def layer.fc(X):
                                        -    W = Var()
                                        -    b = Var()
                                        -    return operator.add(operator.mul(X, W), b)
                                        -
                                        -
                                        -

                                        If we don’t have operator.mul and operator.add, the definition of layer.fc would be complicated:

                                        -
                                        def layer.fc(X):
                                        -    W = Var()
                                        -    b = Var()
                                        -    O1 = Var()
                                        -    paddle.cpp.create_operator("mul", input=[X, W], output=O1)
                                        -    O2 = Var()
                                        -    paddle.cpp.create_operator("add", input=[O1, b], output=O2)
                                        -    return O2
                                        -
                                        -
                                        -

                                        We’d like to have Python bindings to operators in package paddle.operator, and Python compositions of operators in package paddle.layer. So we have the following concepts in the above illustrative example:

                                        -

                                        | C++ functions/functors | mul | add | | |
                                        -|---|---|---|---|---|
                                        -| C++ operator class | mulOp | addOp | FCOp | |
                                        -| Python binding | operator.mul | operator.add | operator.fc | |
                                        -| Python function | | | | layer.fc |

                                        -

                                        This is how we differentiate layers and operators in PaddlePaddle:

                                        -
                                          -
                                        • those that are defined in C++ and have a lightweight Python wrapper in module operators are operators; whereas
                                        • -
                                        • those that have no C++ implementation but have a Python implementation that composes C++ operators are known as layers.
                                        • -
                                        -
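The separation between operators and layers can be sketched in runnable Python. This is a simplified stand-in, not the PaddlePaddle API: `register_op`, `create_operator`, `op_mul`, `op_add`, and `layer_fc` are hypothetical names, the "C++ side" is simulated with plain Python functions, and operators are executed immediately instead of being added to a graph.

```python
OPERATORS = {}  # name -> callable, simulating operators registered from C++


def register_op(name):
    """Register a function under a name, loosely mirroring REGISTER_OP."""
    def decorator(fn):
        OPERATORS[name] = fn
        return fn
    return decorator


@register_op("mul")
def _mul(x, y):
    return x * y


@register_op("add")
def _add(x, y):
    return x + y


def create_operator(name, inputs):
    """Look an operator up by name and run it (stand-in for paddle.cpp.create_operator)."""
    return OPERATORS[name](*inputs)


# Operators: thin Python wrappers around registered implementations.
def op_mul(x1, x2):
    return create_operator("mul", [x1, x2])


def op_add(x1, x2):
    return create_operator("add", [x1, x2])


# Layer: implemented only in Python, as a composition of operators.
def layer_fc(x, w, b):
    return op_add(op_mul(x, w), b)


print(layer_fc(2.0, 3.0, 1.0))  # 7.0
```

With the wrappers in place, the layer is a one-line composition; without them it would have to call `create_operator` by name for every step.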
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/gan_api.html b/develop/doc_cn/design/gan_api.html deleted file mode 100644 index 2be8c4c38c2e0ddcf3bffa3581697dd18a77b27c..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/gan_api.html +++ /dev/null @@ -1,526 +0,0 @@ - - - - - - - - - - - - - Design for GAN — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design for GAN

                                        -

                                        GAN (Generative Adversarial Net [https://arxiv.org/abs/1406.2661]) is an important model for unsupervised learning and is widely used in many areas.

                                        -

                                        It applies several important concepts in machine learning system design, including building and running subgraphs, dependency tracing, different optimizers in one executor and so forth.

                                        -

                                        In our GAN design, we wrap it as a user-friendly, easily customized Python API for designing different models. We take the conditional DC-GAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [https://arxiv.org/abs/1511.06434]) as an example because of its good performance on image generation.

                                        -

                                        -
                                        -Figure 1. The overall running logic of GAN. The black solid arrows indicate the forward pass; the green dashed arrows indicate the backward pass of generator training; the red dashed arrows indicate the backward pass of the discriminator training. The BP pass of the green (red) arrow should only update the parameters in the green (red) boxes. The diamonds indicate the data providers. d_loss and g_loss marked in red and green are the two targets we would like to run. -

                                        The operators, layers and functions that are required or optional for building a GAN demo are summarized in https://github.com/PaddlePaddle/Paddle/issues/4563.

                                        -

                                        -
                                        -Figure 2. Photo borrowed from the original DC-GAN paper. -

                                        -

                                        The Conditional-GAN might be a class.

                                        -

                                        In this design we adopt the popular open source designs in https://github.com/carpedm20/DCGAN-tensorflow and https://github.com/rajathkmp/DCGAN. It contains the following data structure:

                                        -
                                          -
                                        • DCGAN(object): contains everything required to build a GAN model. It provides the following member functions as its API:
                                        • -
                                        • __init__(...): Initialize hyper-parameters (like conv dimension and so forth), and declare the model parameters of the discriminator and the generator.
                                        • -
                                        • generator(z, y=None): Generate a fake image from input noise z. If the label y is provided, the conditional GAN model will be chosen. Returns a generated image.
                                        • -
                                        • discriminator(image): Given an image, decide if it is from a real source or a fake one. Returns a 0/1 binary label.
                                        • -
                                        • build_model(self): Build the whole GAN model and define the training losses for both the generator and the discriminator.
                                        • -
                                        -
                                        -
                                        -

                                        Discussion on Engine Functions required to build GAN

                                        -
                                          -
                                        • Trace the tensor and variable dependency in the engine executor. (Very critical, otherwise GAN cannot be trained correctly.)
                                        • -
                                        • Different optimizers are responsible for optimizing different losses.
                                        • -
                                        -

                                        In more detail, our design of DCGAN is as follows:

                                        -
                                        -

                                        Class member Function: Initializer

                                        -
                                          -
                                        • Set up hyper-parameters, including conditional dimension, noise dimension, batch size and so forth.
                                        • -
                                        • Declare and define all the model variables. All the discriminator parameters are included in the list self.theta_D and all the generator parameters are included in the list self.theta_G.
                                        • -
                                        -
                                        class DCGAN(object):
                                        -  def __init__(self, y_dim=None, z_dim=100):
                                        -
                                        -    # hyper parameters
                                        -    self.y_dim = y_dim # conditional gan or not
                                        -    self.batch_size = 100
                                        -    self.z_dim = z_dim # input noise dimension (the default of 100 is an assumed placeholder)
                                        -    self.im_size = 28 # image side used by build_model (assumed: MNIST, 28 * 28 = 784)
                                        -
                                        -    # define parameters of discriminators
                                        -    self.D_W0 = pd.Variable(shape=[3,3, 1, 128], data=pd.gaussian_normal_randomizer())
                                        -    self.D_b0 = pd.Variable(np.zeros(128)) # variables also support initialization from numpy data
                                        -    self.D_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
                                        -    self.D_b1 = pd.Variable(np.zeros(128)) # variables also support initialization from numpy data
                                        -    self.D_W2 = pd.Variable(np.random.rand(128, 1))
                                        -    self.D_b2 = pd.Variable(np.zeros(128))
                                        -    self.theta_D = [self.D_W0, self.D_b0, self.D_W1, self.D_b1, self.D_W2, self.D_b2]
                                        -
                                        -    # define parameters of generators
                                        -    self.G_W0 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
                                        -    self.G_b0 = pd.Variable(np.zeros(128)) # variables also support initialization from numpy data
                                        -    self.G_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
                                        -    self.G_b1 = pd.Variable(np.zeros(128)) # variables also support initialization from numpy data
                                        -    self.G_W2 = pd.Variable(np.random.rand(128, 1))
                                        -    self.G_b2 = pd.Variable(np.zeros(128))
                                        -    self.theta_G = [self.G_W0, self.G_b0, self.G_W1, self.G_b1, self.G_W2, self.G_b2]
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Class member Function: Generator

                                        -
                                          -
                                        • Given a noise input z, returns a fake image.
                                        • -
                                        • Concatenation, batch-norm, FC operations required;
                                        • -
                                        • Deconv layer required, which is missing now...
                                        • -
                                        -
                                        class DCGAN(object):
                                        -  def generator(self, z, y = None):
                                        -    # input z: the random noise
                                        -    # input y: input data label (optional)
                                        -    # output G_im: generated fake images
                                        -
                                        -    if self.y_dim:
                                        -      # conditional GAN: concatenate the label to the noise
                                        -      z = pd.layer.concat(1, [z, y])
                                        -
                                        -    G_h0 = pd.layer.fc(z, self.G_W0, self.G_b0)
                                        -    G_h0_bn = pd.layer.batch_norm(G_h0)
                                        -    G_h0_relu = pd.layer.relu(G_h0_bn)
                                        -
                                        -    G_h1 = pd.layer.deconv(G_h0_relu, self.G_W1, self.G_b1)
                                        -    G_h1_bn = pd.layer.batch_norm(G_h1)
                                        -    G_h1_relu = pd.layer.relu(G_h1_bn)
                                        -
                                        -    G_h2 = pd.layer.deconv(G_h1_relu, self.G_W2, self.G_b2)
                                        -    G_im = pd.layer.tanh(G_h2)
                                        -    return G_im
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Class member function: Discriminator

                                        -
                                          -
                                        • Given an image (real or generated), returns a logit indicating whether it is real or fake.
                                        • -
                                        • Concatenation, Convolution, batch-norm, FC, Leaky-ReLU operations required;
                                        • -
                                        -
                                        class DCGAN(object):
                                        -  def discriminator(self, image, y=None):
                                        -    # input image: either generated images or real ones
                                        -    # input y: data label (optional); conditioning on y is not shown in this sketch
                                        -    # output D_h2: binary logit of the label
                                        -
                                        -    D_h0 = pd.layer.conv2d(image, w=self.D_W0, b=self.D_b0)
                                        -    D_h0_bn = pd.layer.batch_norm(D_h0)
                                        -    D_h0_relu = pd.layer.lrelu(D_h0_bn)
                                        -
                                        -    D_h1 = pd.layer.conv2d(D_h0_relu, w=self.D_W1, b=self.D_b1)
                                        -    D_h1_bn = pd.layer.batch_norm(D_h1)
                                        -    D_h1_relu = pd.layer.lrelu(D_h1_bn)
                                        -
                                        -    D_h2 = pd.layer.fc(D_h1_relu, w=self.D_W2, b=self.D_b2)
                                        -    return D_h2
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Class member function: Build the model

                                        -
                                          -
                                        • Define data readers as placeholders to hold the data;
                                        • -
                                        • Build generator and discriminators;
                                        • -
                                        • Define two training losses, one for the discriminator and one for the generator. If we have an execution dependency engine to back-trace all tensors, the module building our GAN model will look like this:
                                        • -
                                        -
                                        class DCGAN(object):
                                        -  def build_model(self):
                                        -    if self.y_dim:
                                        -        self.y = pd.data(pd.float32, [self.batch_size, self.y_dim])
                                        -    self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
                                        -    self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
                                        -    self.z = pd.data(pd.float32, [None, self.z_dim])
                                        -
                                        -    # step 1: generate images by generator, classify real/fake images with discriminator
                                        -    # note: self.sampler is not defined in this document
                                        -    if self.y_dim: # if conditional GAN, includes label
                                        -        self.G = self.generator(self.z, self.y)
                                        -        self.D_t = self.discriminator(self.images)
                                        -        # generated fake images
                                        -        self.sampled = self.sampler(self.z, self.y)
                                        -        self.D_f = self.discriminator(self.G)
                                        -    else: # original version of GAN
                                        -        self.G = self.generator(self.z)
                                        -        self.D_t = self.discriminator(self.images)
                                        -        # generate fake images
                                        -        self.sampled = self.sampler(self.z)
                                        -        self.D_f = self.discriminator(self.G)
                                        -
                                        -    # step 2: define the two losses
                                        -    self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)))
                                        -    self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)))
                                        -    self.d_loss = self.d_loss_real + self.d_loss_fake
                                        -
                                        -    self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_f, np.ones(self.batch_size)))
                                        -
                                        -
                                        -
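The two losses in step 2 can be checked numerically with a small NumPy sketch. It assumes that the discriminator returns raw logits (as the comment "binary logit of the label" suggests) and that `pd.cross_entropy` behaves like sigmoid cross-entropy on logits; the helper names below are hypothetical and not part of any PaddlePaddle API.

```python
import numpy as np


def sigmoid_cross_entropy(logits, labels):
    # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logits).
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


def gan_losses(d_real_logits, d_fake_logits):
    # Discriminator: real images should be classified as 1, generated images as 0.
    d_loss_real = sigmoid_cross_entropy(d_real_logits, np.ones_like(d_real_logits)).mean()
    d_loss_fake = sigmoid_cross_entropy(d_fake_logits, np.zeros_like(d_fake_logits)).mean()
    # Generator: wants the discriminator to classify generated images as 1.
    g_loss = sigmoid_cross_entropy(d_fake_logits, np.ones_like(d_fake_logits)).mean()
    return d_loss_real + d_loss_fake, g_loss


# An undecided discriminator (all logits zero) gives log(2) per term.
d_loss, g_loss = gan_losses(np.zeros(4), np.zeros(4))
```

Note that `d_loss` and `g_loss` both depend on the discriminator's output for generated images but push it in opposite directions, which is why each loss must update only its own parameter list.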

                                        If we do not have a dependency engine but have blocks, the module building our GAN model will look like this:

                                        -
                                        class DCGAN(object):
                                        -  def build_model(self, default_block):
                                        -    # input data in the default block
                                        -    if self.y_dim:
                                        -        self.y = pd.data(pd.float32, [self.batch_size, self.y_dim])
                                        -    self.images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
                                        -    # self.faked_images = pd.data(pd.float32, [self.batch_size, self.im_size, self.im_size])
                                        -    self.z = pd.data(pd.float32, [None, self.z_dim])
                                        -
                                        -    # step 1: generate images by generator, classify real/fake images with discriminator
                                        -    with pd.default_block().g_block():
                                        -      if self.y_dim: # if conditional GAN, includes label
                                        -        self.G = self.generator(self.z, self.y)
                                        -        self.D_g = self.discriminator(self.G, self.y)
                                        -      else: # original version of GAN
                                        -        self.G = self.generator(self.z)
                                        -        self.D_g = self.discriminator(self.G)
                                        -      self.g_loss = pd.reduce_mean(pd.cross_entropy(self.D_g, np.ones(self.batch_size)))
                                        -
                                        -    with pd.default_block().d_block():
                                        -      if self.y_dim: # if conditional GAN, includes label
                                        -        self.D_t = self.discriminator(self.images, self.y)
                                        -        self.D_f = self.discriminator(self.G, self.y)
                                        -      else: # original version of GAN
                                        -        self.D_t = self.discriminator(self.images)
                                        -        self.D_f = self.discriminator(self.G)
                                        -
                                        -      # step 2: define the two losses
                                        -      self.d_loss_real = pd.reduce_mean(pd.cross_entropy(self.D_t, np.ones(self.batch_size)))
                                        -      self.d_loss_fake = pd.reduce_mean(pd.cross_entropy(self.D_f, np.zeros(self.batch_size)))
                                        -      self.d_loss = self.d_loss_real + self.d_loss_fake
                                        -
                                        -
                                        -

                                        Some points of confusion and open problems with this design:

                                        -
                                          -
                                        • D_g and D_f are actually the same thing, but have to be written twice; i.e., if we conceptually want to run two sub-graphs, any code shared by both has to be written once in each.
                                        • -
                                        • Requires the ability to create a block anywhere, rather than only in if-else or rnn;
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Main function for the demo:

                                        -

                                        Generally, a user of GAN just needs to do the following things:

                                        -
                                          -
                                        • Create an instance of the DCGAN class;
                                        • -
                                        • Build the DCGAN model;
                                        • -
                                        • Specify two optimizers for two different losses with respect to different parameters.
                                        • -
                                        -
                                        # pd for short, should be more concise.
                                        -import paddle.v2 as pd
                                        -import numpy as np
                                        -import logging
                                        -
                                        -if __name__ == "__main__":
                                        -    # dcgan class in the default graph/block
                                        -    # if we use a dependency engine as tensorflow does,
                                        -    # the code will be slightly different, like:
                                        -    # dcgan = DCGAN()
                                        -    # dcgan.build_model()
                                        -    with pd.block() as def_block:
                                        -      dcgan = DCGAN()
                                        -      dcgan.build_model(def_block)
                                        -
                                        -    # load mnist data (load_mnist is a helper that is not defined in this document)
                                        -    data_X, data_y = load_mnist()
                                        -    N = len(data_X)
                                        -    batch_size, z_dim = dcgan.batch_size, dcgan.z_dim
                                        -
                                        -    # Two subgraphs required!!!
                                        -    with pd.block().d_block():
                                        -      d_optim = pd.train.Adam(lr = .001, beta= .1)
                                        -      d_step = d_optim.minimize(dcgan.d_loss, dcgan.theta_D)
                                        -    with pd.block().g_block():
                                        -      g_optim = pd.train.Adam(lr = .001, beta= .1)
                                        -      g_step = g_optim.minimize(dcgan.g_loss, dcgan.theta_G)
                                        -
                                        -    # executor
                                        -    sess = pd.executor()
                                        -
                                        -    # training
                                        -    for epoch in range(10000):
                                        -      for batch_id in range(N // batch_size):
                                        -        idx = ...
                                        -        # sample a batch
                                        -        batch_im, batch_label = data_X[idx:idx+batch_size], data_y[idx:idx+batch_size]
                                        -        # sample z
                                        -        batch_z = np.random.uniform(-1., 1., [batch_size, z_dim])
                                        -
                                        -        # alternate: even batches train the discriminator, odd batches the generator
                                        -        if batch_id % 2 == 0:
                                        -          sess.run(d_step,
                                        -                   feed_dict = {dcgan.images: batch_im,
                                        -                                dcgan.y: batch_label,
                                        -                                dcgan.z: batch_z})
                                        -        else:
                                        -          sess.run(g_step,
                                        -                   feed_dict = {dcgan.z: batch_z})
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -
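The alternating schedule in the training loop, where each optimizer step touches only its own parameter list, can be illustrated with a toy example in plain Python. The scalar parameters, the loss `w**2` and the helper names are assumptions made for illustration; they do not model a real GAN or the `pd` API.

```python
def sgd_step(params, grads, names, lr):
    """Update only the parameters listed in `names`; all others stay untouched."""
    for name in names:
        params[name] -= lr * grads[name]


def alternate(num_batches, lr=0.1):
    # Toy stand-ins: one scalar parameter per network, each with loss w**2.
    params = {"D_w": 1.0, "G_w": -1.0}
    theta_D, theta_G = ["D_w"], ["G_w"]
    for batch_id in range(num_batches):
        grads = {name: 2.0 * value for name, value in params.items()}
        if batch_id % 2 == 0:
            sgd_step(params, grads, theta_D, lr)  # discriminator step
        else:
            sgd_step(params, grads, theta_G, lr)  # generator step
    return params


after_one = alternate(1)   # only D_w has moved
after_four = alternate(4)  # both parameters have been updated twice
```

After a single batch only `D_w` has changed, which mirrors the requirement that the discriminator step must not update generator parameters, and vice versa.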

                                        More thoughts about the dependency engine vs. block design:

                                        -
                                          -
                                        • What if we just want to run an intermediate result? Do we need to run the whole block/graph?
                                        • -
                                        • Should we call eval() to get the fake images in the first stage? And then train the discriminator in the second stage?
                                        • -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/graph.html b/develop/doc_cn/design/graph.html deleted file mode 100644 index 73ea1280ca273fc1e3b38e414b6750f2738ae528..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/graph.html +++ /dev/null @@ -1,320 +0,0 @@ - - - - - - - - - - - - - Design Doc: Computations as a Graph — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Computations as a Graph

                                        -

                                        A primary goal of the refactoring of PaddlePaddle is a more flexible representation of deep learning computation: a graph of operators and variables, instead of sequences of layers as before.

                                        -

                                        This document explains the construction of a graph in three steps:

                                        -
                                          -
                                        • construct the forward part
                                        • -
                                        • construct the backward part
                                        • -
                                        • construct the optimization part
                                        • -
                                        -
                                        -

                                        The Construction of a Graph

                                        -

                                        Let us take the problem of image classification as a simple example. The application program that trains the model looks like:

                                        -
                                        x = layer.data("images")
                                        -l = layer.data("label")
                                        -y = layer.fc(x)
                                        -cost = layer.mse(y, l)
                                        -optimize(cost)
                                        -train(cost, reader=mnist.train())
                                        -
                                        -
                                        -
                                        -

                                        Forward Part

                                        -

                                        The first four lines of the above program build the forward part of the graph.

                                        -

                                        -

                                        In particular, the first line x = layer.data("images") creates variable x and a Feed operator that copies a column from the minibatch to x. y = layer.fc(x) creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators.

                                        -

                                        Initialization operators are a kind of “run-once” operator: the Run method increments a class data member counter so that the operator runs at most once. This way, a parameter is not re-initialized repeatedly, say, in every minibatch.
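A minimal sketch of such a run-once operator is given below. The class name `InitOp`, the dictionary used as a variable scope and the Gaussian initializer are assumptions for illustration, not the actual C++ implementation.

```python
import numpy as np


class InitOp:
    """Hypothetical run-once initializer guarded by a run counter."""

    def __init__(self, shape, seed=0):
        self.shape = shape
        self.seed = seed
        self.run_count = 0  # counts how often the body has actually executed

    def run(self, scope, name):
        if self.run_count > 0:
            return  # already ran once; keep the current (possibly trained) value
        rng = np.random.default_rng(self.seed)
        scope[name] = rng.normal(size=self.shape)
        self.run_count += 1


scope = {}
init_w = InitOp(shape=(3, 2))
init_w.run(scope, "W")     # first minibatch: W is initialized
trained = scope["W"] + 1.0
scope["W"] = trained       # pretend an optimizer updated W
init_w.run(scope, "W")     # later minibatches: no effect
```

Because the second call returns early, the updated value of `W` survives, even though the initialization operator stays in the block and is visited in every minibatch.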

                                        -

                                        In this example, all operators are created as OpDesc protobuf messages, and all variables are VarDesc. These protobuf messages are saved in a BlockDesc protobuf message.

                                        -
                                        -
                                        -

                                        Backward Part

                                        -

                                        The fifth line optimize(cost) calls two functions, ConstructBackwardGraph and ConstructOptimizationGraph.

                                        -

                                        ConstructBackwardGraph traverses the forward graph in the BlockDesc protobuf message and builds the backward part.

                                        -

                                        -

                                        According to the chain rule of gradient computation, ConstructBackwardGraph would

                                        -
                                          -
                                        1. create a gradient operator G for each operator F,
                                        2. make all inputs, outputs, and outputs’ gradients of F the inputs of G,
                                        3. create gradients for all inputs of F, except for those that don’t have gradients, like x and l, and
                                        4. make all these gradients the outputs of G.
                                        -
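The four steps above can be sketched in a few lines of Python. Operators are represented as plain dictionaries rather than OpDesc protobuf messages, and the `_grad` and `@GRAD` naming is an assumption of this sketch.

```python
def grad_name(var):
    # Assumed naming convention for the gradient of a variable.
    return var + "@GRAD"


def construct_backward(forward_ops, no_grad):
    backward_ops = []
    # Gradients flow from the cost back to the inputs, so walk the ops in reverse.
    for op in reversed(forward_ops):
        backward_ops.append({
            # step 1: one gradient operator G per forward operator F
            "type": op["type"] + "_grad",
            # step 2: inputs, outputs and output gradients of F feed G
            "inputs": op["inputs"] + op["outputs"] + [grad_name(o) for o in op["outputs"]],
            # steps 3 and 4: G produces gradients for the inputs of F that need one
            "outputs": [grad_name(i) for i in op["inputs"] if i not in no_grad],
        })
    return backward_ops


forward = [
    {"type": "fc", "inputs": ["x", "W", "b"], "outputs": ["y"]},
    {"type": "mse", "inputs": ["y", "l"], "outputs": ["cost"]},
]
backward = construct_backward(forward, no_grad={"x", "l"})
```

For the example program this yields an `mse_grad` operator producing `y@GRAD`, followed by an `fc_grad` operator producing `W@GRAD` and `b@GRAD`; the data variables `x` and `l` get no gradient.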
                                        -
                                        -

                                        Optimization Part

                                        -

                                        For each parameter, like W and b created by layer.fc and marked as double circles in the above graphs, ConstructOptimizationGraph creates an optimization operator to apply its gradient. This results in the complete graph:

                                        -

                                        -
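As a rough sketch of the optimization part, one operator can be appended per parameter. The dictionary representation, the `sgd` operator type, the `@GRAD` suffix and the `learning_rate` variable are assumptions for illustration only.

```python
def construct_optimization(parameters, lr_var="learning_rate"):
    """Create one (hypothetical) SGD operator per parameter.

    Each operator reads the parameter, its gradient and the learning rate,
    and writes the updated value back to the same parameter variable.
    """
    return [
        {"type": "sgd", "inputs": [p, p + "@GRAD", lr_var], "outputs": [p]}
        for p in parameters
    ]


optimize_ops = construct_optimization(["W", "b"])
```

Appending `optimize_ops` after the forward and backward operators gives the operator order of the complete graph: forward, then gradient, then optimization.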
                                        -
                                        -
                                        -

                                        Block and Graph

                                        -

                                        The words block and graph are interchangeable in the design of PaddlePaddle. A Block is a metaphor for the code and local variables inside a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block.

                                        -

                                        A Block keeps operators in an array BlockDesc::ops

                                        -
                                        message BlockDesc {
                                        -  repeated OpDesc ops = 1;
                                        -  repeated VarDesc vars = 2;
                                        -}
                                        -
                                        -
                                        -

                                        in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that in ops, we have some forward operators, followed by some gradient operators, and then some optimization operators.
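As a rough illustration of this ordering, the following Python sketch mirrors the `BlockDesc` message with plain dataclasses. The operator type names (`fc`, `mse`, `sgd`, and so on) and the `append_op` helper are hypothetical and only serve the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpDesc:
    type: str


@dataclass
class VarDesc:
    name: str


@dataclass
class BlockDesc:
    # Mirrors the two repeated fields of the BlockDesc message.
    ops: List[OpDesc] = field(default_factory=list)
    vars: List[VarDesc] = field(default_factory=list)

    def append_op(self, op_type: str) -> None:
        # Operators are kept in the order in which they are appended.
        self.ops.append(OpDesc(op_type))


block = BlockDesc()
for op_type in ["fc", "mse"]:            # forward operators
    block.append_op(op_type)
for op_type in ["mse_grad", "fc_grad"]:  # gradient operators
    block.append_op(op_type)
for op_type in ["sgd"]:                  # optimization operators
    block.append_op(op_type)

op_types = [op.type for op in block.ops]
print(op_types)
```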

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/graph_survey.html b/develop/doc_cn/design/graph_survey.html deleted file mode 100644 index 560d8df6a4cbea88167b8116cba2f4c0ae6ce258..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/graph_survey.html +++ /dev/null @@ -1,453 +0,0 @@ - - - - - - - - - - - - - Survey on Graph — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Survey on Graph

                                        -

                                        Neural network frameworks often provide a symbolic API so that users can write a network topology conveniently. This doc mainly focuses on the symbolic APIs of the most popular neural network frameworks, and tries to find out how to parse a symbolic configuration into a portable file format, such as protobuf or JSON.

                                        -
                                        -

                                        Mxnet

                                        -

                                        The core concept of the symbolic API is Symbol. Mxnet implements the Symbol class in C++ and exports it to Python through a C API. Please refer to the comments in Mxnet:

                                        -

                                        Symbol is help class used to represent the operator node in Graph. Symbol acts as an interface for building graphs from different components like Variable, Functor and Group. Symbol is also exported to python front-end (while Graph is not) to enable quick test and deployment. Conceptually, symbol is the final operation of a graph and thus including all the information required (the graph) to evaluate its output value.

                                        -

                                        A simple network topology written with Symbol is as follows:

                                        -
                                        def get_symbol(num_classes=10, **kwargs):
                                        -    data = mx.symbol.Variable('data')
                                        -    data = mx.symbol.Flatten(data=data)
                                        -    fc1  = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
                                        -    act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
                                        -    fc2  = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
                                        -    act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
                                        -    fc3  = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes)
                                        -    mlp  = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax')
                                        -    return mlp
                                        -
                                        -
                                        -

                                        Variable here is actually a Symbol. Every basic Symbol corresponds to one Node, and every Node has its own NodeAttr. The NodeAttr class has an op field; when a Symbol represents a Variable (often input data), the op field is null.

                                        -

                                        Symbol contains a data member, std::vector<NodeEntry> outputs, and each NodeEntry contains a pointer to a Node. We can follow the Node pointers to reach the whole Graph.
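A small Python sketch may make the pointer-following idea concrete. It is not Mxnet code: the `Node`, `NodeEntry`, and `Symbol` classes below are simplified stand-ins (for instance, `op` is stored directly on the node rather than in a separate `NodeAttr`), and `collect_nodes` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    op: Optional[str] = None  # None when the node is a variable
    inputs: List["NodeEntry"] = field(default_factory=list)


@dataclass
class NodeEntry:
    node: Node      # reference to the producing node
    index: int = 0  # which output of that node is used


@dataclass
class Symbol:
    outputs: List[NodeEntry]


def collect_nodes(symbol: Symbol) -> List[str]:
    """Follow node references from the outputs and list every reachable node,
    inputs before the nodes that consume them."""
    seen, order = set(), []

    def visit(node: Node) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for entry in node.inputs:
            visit(entry.node)
        order.append(node.name)

    for entry in symbol.outputs:
        visit(entry.node)
    return order


data = Node("data")
flatten = Node("flatten0", "Flatten", [NodeEntry(data)])
weight, bias = Node("fc1_weight"), Node("fc1_bias")
fc1 = Node("fc1", "FullyConnected",
           [NodeEntry(flatten), NodeEntry(weight), NodeEntry(bias)])

names = collect_nodes(Symbol([NodeEntry(fc1)]))
print(names)
```

Starting from the single output entry, the traversal reaches the variables as well as the operators, which is why a Symbol alone is enough to recover the graph.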

                                        -

                                        A Symbol can also be saved to a JSON file.

                                        -

                                        Here is a detailed example:

                                        -
                                        >>> import mxnet as mx
                                        ->>> data = mx.symbol.Variable('data')
                                        ->>> print(data.debug_str())
                                        -Variable:data
                                        -
                                        ->>> data = mx.symbol.Flatten(data=data)
                                        ->>> print(data.debug_str())
                                        -Symbol Outputs:
                                        -    output[0]=flatten0(0)
                                        -Variable:data
                                        ---------------------
                                        -Op:Flatten, Name=flatten0
                                        -Inputs:
                                        -    arg[0]=data(0) version=0
                                        -
                                        ->>> fc1  = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
                                        ->>> print(fc1.debug_str())
                                        -Symbol Outputs:
                                        -    output[0]=fc1(0)
                                        -Variable:data
                                        ---------------------
                                        -Op:Flatten, Name=flatten0
                                        -Inputs:
                                        -    arg[0]=data(0) version=0
                                        -Variable:fc1_weight
                                        -Variable:fc1_bias
                                        ---------------------
                                        -Op:FullyConnected, Name=fc1
                                        -Inputs:
                                        -    arg[0]=flatten0(0)
                                        -    arg[1]=fc1_weight(0) version=0
                                        -    arg[2]=fc1_bias(0) version=0
                                        -Attrs:
                                        -    num_hidden=128
                                        -
                                        -
                                        -
                                        -
                                        -

                                        TensorFlow

                                        -

                                        The core concept of the symbolic API is Tensor. TensorFlow defines Tensor in Python. Please refer to the comments in TensorFlow:

                                        -

                                        A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation’s output, but instead provides a means of computing those values in a TensorFlow Session.

                                        -

                                        A simple example is as follows:

                                        -
                                          # Build a dataflow graph.
                                        -  c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
                                        -  d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
                                        -  e = tf.matmul(c, d)
                                        -
                                        -  # Construct a `Session` to execute the graph.
                                        -  sess = tf.Session()
                                        -
                                        -  # Execute the graph and store the value that `e` represents in `result`.
                                        -  result = sess.run(e)
                                        -
                                        -
                                        -

                                        The main properties of Tensor are as follows:

                                        -
                                        @property
                                        -def op(self):
                                        -  """The `Operation` that produces this tensor as an output."""
                                        -  return self._op
                                        -
                                        -@property
                                        -def dtype(self):
                                        -  """The `DType` of elements in this tensor."""
                                        -  return self._dtype
                                        -
                                        -@property
                                        -def graph(self):
                                        -  """The `Graph` that contains this tensor."""
                                        -  return self._op.graph
                                        -
                                        -@property
                                        -def name(self):
                                        -  """The string name of this tensor."""
                                        -  if not self._op.name:
                                        -    raise ValueError("Operation was not named: %s" % self._op)
                                        -  return "%s:%d" % (self._op.name, self._value_index)
                                        -
                                        -@property
                                        -def device(self):
                                        -  """The name of the device on which this tensor will be produced, or None."""
                                        -  return self._op.device
                                        -
                                        -
                                        -

                                        A Tensor can be passed to a session as the target to run. A Tensor gives access to all the information of its Graph and tracks data dependencies.

                                        -

                                        Here is a detailed example:

                                        -
                                        >>> import tensorflow as tf
                                        ->>> c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
                                        ->>> print(c.graph)
                                        -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
                                        ->>> d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
                                        ->>> print(d.graph)
                                        -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
                                        ->>> e = tf.matmul(c, d)
                                        ->>> print(e.graph)
                                        -<tensorflow.python.framework.ops.Graph object at 0x10f256d50>
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Dynet

                                        -

                                        The core concept of the symbolic API is Expression; Dynet defines the Expression class in C++.

                                        -

                                        A simple example is as follows:

                                        -
                                        ComputationGraph cg;
                                        -Expression W = parameter(cg, pW);
                                        -
                                        -Expression in = input(cg, xs[i]);
                                        -Expression label = input(cg, ys[i]);
                                        -Expression pred = W * in;
                                        -Expression loss = square(pred - label);
                                        -
                                        -
                                        -

                                        The input data and parameters are also represented by Expression. Every basic Expression corresponds to a Node, and input data is also a Node.

                                        -

                                        Expression has a data member ComputationGraph, and the ComputationGraph is modified as the user configures the network. An Expression can be a run target, because it contains all of its dependencies.

                                        -

                                        Here is a detailed example:

                                        -

                                        write topology in C++

                                        -
                                        ComputationGraph cg;
                                        -Expression W = parameter(cg, pW);
                                        -cg.print_graphviz();
                                        -
                                        -Expression pred = W * xs[i];
                                        -cg.print_graphviz();
                                        -
                                        -Expression loss = square(pred - ys[i]);
                                        -cg.print_graphviz();
                                        -
                                        -
                                        -

                                        compile and print

                                        -
                                        # first print
                                        -digraph G {
                                        -  rankdir=LR;
                                        -  nodesep=.05;
                                        -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
                                        -}
                                        -# second print
                                        -digraph G {
                                        -  rankdir=LR;
                                        -  nodesep=.05;
                                        -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
                                        -  N1 [label="v1 = v0 * -0.98"];
                                        -  N0 -> N1;
                                        -}
                                        -# third print
                                        -digraph G {
                                        -  rankdir=LR;
                                        -  nodesep=.05;
                                        -  N0 [label="v0 = parameters({1}) @ 0x7ffe4de00110"];
                                        -  N1 [label="v1 = v0 * -0.98"];
                                        -  N0 -> N1;
                                        -  N2 [label="v2 = -1.88387 - v1"];
                                        -  N1 -> N2;
                                        -  N3 [label="v3 = -v2"];
                                        -  N2 -> N3;
                                        -  N4 [label="v4 = square(v3)"];
                                        -  N3 -> N4;
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Conclusion

                                        -

                                        Symbol, Tensor, and Expression in Mxnet, TensorFlow, and Dynet are concepts at the same level. We use the unified name Expression here; a concept at this level has the following features:

                                        -
                                          -
                                        • Users write the topology with the symbolic API, and every return value is an Expression, including input data and parameters.
                                        • An Expression corresponds to a global Graph, and Expressions can also be composed.
                                        • An Expression tracks all of its dependencies and can be taken as a run target.
                                        -
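The three features above can be illustrated with a toy Python sketch. It is not taken from any of the surveyed frameworks: the `Graph` and `Expression` classes, the `add` and `run` methods, and the `square` helper are all hypothetical, and the graph is passed around explicitly instead of being truly global.

```python
class Graph:
    """Toy graph: a list of (op, input_ids, value) records."""

    def __init__(self):
        self.nodes = []

    def add(self, op, inputs=(), value=None):
        self.nodes.append((op, tuple(inputs), value))
        # Every call to the symbolic API returns an Expression.
        return Expression(self, len(self.nodes) - 1)


class Expression:
    """Handle to one node of a graph; composing handles extends the graph."""

    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id

    def __mul__(self, other):
        return self.graph.add("mul", (self.node_id, other.node_id))

    def __sub__(self, other):
        return self.graph.add("sub", (self.node_id, other.node_id))

    def run(self):
        # Evaluate this node by first evaluating everything it depends on.
        op, inputs, value = self.graph.nodes[self.node_id]
        if op == "input":
            return value
        args = [Expression(self.graph, i).run() for i in inputs]
        if op == "mul":
            return args[0] * args[1]
        if op == "sub":
            return args[0] - args[1]
        if op == "square":
            return args[0] ** 2
        raise ValueError("unknown op: %s" % op)


def square(expr):
    return expr.graph.add("square", (expr.node_id,))


g = Graph()
W = g.add("input", value=2.0)      # parameter
x = g.add("input", value=3.0)      # input data
label = g.add("input", value=5.0)  # input data
pred = W * x
loss = square(pred - label)
result = loss.run()
print(result)
```

Input data, parameters, and intermediate results are all `Expression` objects, and `loss` alone is enough to evaluate the whole computation.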
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/if_else_op.html b/develop/doc_cn/design/if_else_op.html deleted file mode 100644 index c7c101cd58b355e87559cdf17ef6f8ed2d1c2fd8..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/if_else_op.html +++ /dev/null @@ -1,309 +0,0 @@ - - - - - - - - - - - - - The IfElse Operator — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        The IfElse Operator

                                        -

                                        PaddlePaddle’s IfElse operator differs from TensorFlow’s:

                                        -
                                          -
                                        • the TensorFlow version takes a scalar boolean value as the condition, so that the whole mini-batch goes to either the true or the false branch, whereas
                                        • the PaddlePaddle version takes a vector of boolean values as the condition; instances corresponding to true values go to the true branch, and those corresponding to false values go to the false branch.
                                        -
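The per-instance routing of the PaddlePaddle version can be sketched in plain Python. This is a conceptual sketch only: `if_else`, `true_fn`, and `false_fn` are hypothetical names, and ordinary lists stand in for mini-batch tensors.

```python
def if_else(cond, batch, true_fn, false_fn):
    """Route each instance to a branch by its own condition, then merge the
    branch outputs back into the original instance order."""
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]

    # Each branch only sees the instances routed to it.
    true_out = true_fn([batch[i] for i in true_idx])
    false_out = false_fn([batch[i] for i in false_idx])

    merged = [None] * len(batch)
    for i, value in zip(true_idx, true_out):
        merged[i] = value
    for i, value in zip(false_idx, false_out):
        merged[i] = value
    return merged


x = [10, 20, 30]
cond = [v > 15 for v in x]  # one boolean per instance
out = if_else(cond, x,
              true_fn=lambda xs: [v + 1 for v in xs],
              false_fn=lambda xs: [v * 2 for v in xs])
print(cond, out)
```

With a scalar condition, as in the TensorFlow version, the split and merge steps disappear because the whole mini-batch takes a single branch.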
                                        -

                                        Example

                                        -

                                        The following PaddlePaddle program shows the usage of the IfElse operator:

                                        -
                                        import paddle as pd
                                        -
                                        -x = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -y = var(1) # shape=[1], value=1
                                        -z = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -cond = larger_than(x, 15) # [false, true, true]
                                        -
                                        -ie = pd.ifelse()
                                        -with ie.true_block():
                                        -    d = pd.layer.add(x, y)
                                        -    ie.output(d, pd.layer.softmax(d))
                                        -with ie.false_block():
                                        -    d = pd.layer.fc(z)
                                        -    ie.output(d, d+1)
                                        -o1, o2 = ie(cond)
                                        -
                                        -
                                        -

                                        One challenge in implementing the IfElse operator is inferring which variables need to be split, that is, identifying the mini-batch variables and the variables derived from them.

                                        -

                                        An equivalent C++ program is as follows:

                                        -
                                        namespace pd = paddle;
                                        -
                                        -int x = 10;
                                        -int y = 1;
                                        -int z = 10;
                                        -bool cond = false;
                                        -int o1, o2;
                                        -if (cond) {
                                        -  int d = x + y;
                                        -  o1 = d;
                                        -  o2 = pd::layer::softmax(d);
                                        -} else {
                                        -  int d = pd::layer::fc(z);
                                        -  o1 = d;
                                        -  o2 = d+1;
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/infer_var_type.html b/develop/doc_cn/design/infer_var_type.html deleted file mode 100644 index 4b303634d3f61df8f5cc8658d1ba6050e7ca656a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/infer_var_type.html +++ /dev/null @@ -1,326 +0,0 @@ - - - - - - - - - - - - - Design Doc: InferVarType — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: InferVarType

                                        -
                                        -

                                        The Problem Posed

                                        -

                                        A variable in our design can hold different types, such as LoDTensor and SelectedRows. An operator should be able to infer the variable types of its outputs.

                                        -

                                        For example, a lookup table operator takes two LoDTensors: one is a float tensor holding the embedding table, the other is an int tensor holding word IDs. The gradient operator of lookup table generates a SelectedRows as its output. A sum operator can take both LoDTensor and SelectedRows as its inputs; it generates a LoDTensor if any of its inputs is a LoDTensor, and otherwise generates a SelectedRows.
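The rule for the sum operator can be written down as a tiny Python sketch. The `VarType` enum and the `infer_sum_output_type` function are hypothetical names used only for illustration; they are not part of PaddlePaddle's API.

```python
from enum import Enum


class VarType(Enum):
    LOD_TENSOR = "LoDTensor"
    SELECTED_ROWS = "SelectedRows"


def infer_sum_output_type(input_types):
    """Output is LoDTensor if any input is LoDTensor, else SelectedRows."""
    if any(t is VarType.LOD_TENSOR for t in input_types):
        return VarType.LOD_TENSOR
    return VarType.SELECTED_ROWS


mixed = infer_sum_output_type([VarType.SELECTED_ROWS, VarType.LOD_TENSOR])
sparse = infer_sum_output_type([VarType.SELECTED_ROWS, VarType.SELECTED_ROWS])
print(mixed.value, sparse.value)
```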

                                        -

                                        The variable type is constant at runtime. Every variable’s type is either set by the user (for input data and parameters) or inferred by the operator at compile time.

                                        -
                                        -
                                        -

                                        Proposed Solution

                                        -

                                        InferVarType is a compile-time function that is registered for each operator. The interface of that function is:

                                        -
                                        using InferVarTypeFN = std::function<
                                        -    void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>;
                                        -
                                        -
                                        -

                                        It takes an operator description as its input, infers the output variable types, and stores them in the block description.

                                        -

                                        The InferVarTypeFN will be registered in OpInfo as the infer_var_type_ field. The OpInfo should be:

                                        -
                                        struct OpInfo {
                                        -  InferVarTypeFN infer_var_type_;
                                        -  ...
                                        -};
                                        -
                                        -
                                        -

                                        The default InferVarType sets the output type to LoDTensor. This fallback can be provided by GetInferVarType().

                                        -
                                        void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) {
                                        -  // set the output type of variable as `LoDTensor`.
                                        -  // ...
                                        -}
                                        -
                                        -struct OpInfo {
                                        -  InferVarTypeFN infer_var_type_;
                                        -  InferVarTypeFN GetInferVarType() const {
                                        -    if (infer_var_type_) {
                                        -      return infer_var_type_;
                                        -    } else {
                                        -      return DefaultInferVarType;
                                        -    }
                                        -  }
                                        -};
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Register InferVarType

                                        -

                                        We provide a thin base class for registering an InferVarTypeFN. Using a base class eases the implementation of the registry, since we can detect whether a registry entry is an InferVarTypeFN or not.

                                        -
                                        class VarTypeInferer {
                                        -public:
                                        -  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0;
                                        -};
                                        -
                                        -
                                        -

                                        Operator developers can write a specialized VarTypeInferer as follows.

                                        -
                                        class SpecialVarTypeInferer : public VarTypeInferer {
                                        -public:
                                        -  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const {
                                        -    // .. own logic
                                        -  }
                                        -};
                                        -
                                        -
                                        -

                                        Users can then register the InferVarType just like GradOpDescMaker and OpInfoMaker.

                                        -
                                        REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...);
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        \ No newline at end of file diff --git a/develop/doc_cn/design/kernel_hint_design.html b/develop/doc_cn/design/kernel_hint_design.html deleted file mode 100644 index e19fd72eff0b42495c0066a963fe52ccd68b1641..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/kernel_hint_design.html +++ /dev/null @@ -1,314 +0,0 @@
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Problem

                                        -

                                        In PaddlePaddle's design, one operator may have multiple kernels. Users may prefer a certain type of kernel for an operator, for example force_cpu to choose a CPU kernel or use_cudnn to choose a CUDNN kernel. We need to provide a way for users to express this preference.

                                        -

                                        In the current design, we use KernelType to describe one kernel.

                                        -
                                        struct KernelType {
                                        -  Place place_;
                                        -  DataType data_type_;
                                        -  LayoutType layout_;
                                        -};
                                        -
                                        -
                                        -

                                        place_, data_type_ and layout_ can be obtained from the input tensors of the operator. GetActualKernelType(inputs) uses the inputs to infer the kernel key that fits the incoming data, but users cannot configure it directly.

                                        -

                                        The design also provides a virtual method, GetExpectedKernelType, that users can override to choose the KernelType they want to use.

                                        -

                                        So we should pass the information that the user defined in the proto to GetExpectedKernelType so that it can choose a kernel.

                                        -

                                        The problem is, how should we define and send the information for GetExpectedKernelType to use?

                                        -
                                        -
                                        -

                                        Solution

                                        -
                                        -

                                        Potential choice

                                        -
                                          -
                                        1. Do nothing, and let users add the information they want to the operator's attributes and read it inside GetExpectedKernelType. This works, but users may define many different hints for the same purpose, such as force_cpu, use_cpu, cpu_kernel to choose a CPU kernel, and use_cudnn, force_cudnn, cudnn_kernel to choose a CUDNN kernel.
                                        -
                                        2. Pre-define all the needed options and expose a single attribute key, such as kernel_hint, to the user. This is not flexible enough if the user wants to define other kinds of hints.
                                        -
                                        -
                                        -
                                        -

                                        Final choice

                                        -

                                        To provide enough flexibility while avoiding confusing definitions, we can define global constants for these attribute names, such as force_cpu, use_cudnn and use_mkldnn, for users to choose from.

                                        -

                                        In C++

                                        -
                                        const std::string kForceCPU = "force_cpu";
                                        -const std::string kUseCUDNN = "use_cudnn";
                                        -const std::string kUseMKLDNN = "use_mkldnn";
                                        -
                                        -KernelType GetExpectedKernelType() {
                                        -  if (Attr<bool>(kForceCPU)) {
                                        -    return KernelType(CPUPlace, ...);
                                        -  } else {
                                        -    ...
                                        -  }
                                        -}
                                        -
                                        -
                                        -

                                        In Python code

                                        -
                                        FORCE_CPU = core.kForceCPU()
                                        -
                                        -def xx_layer(..., force_cpu=False):
                                        -  layer_helper = LayerHelper(...)
                                        -  layer_helper.append_op(
                                        -    type="xx",
                                        -    attr={FORCE_CPU: force_cpu})
                                        -
                                        -
                                        -
                                        -
                                        \ No newline at end of file diff --git a/develop/doc_cn/design/kernel_selection.html b/develop/doc_cn/design/kernel_selection.html deleted file mode 100644 index 8915a43af011f4db52ec773b45bfd808b58147f8..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/kernel_selection.html +++ /dev/null @@ -1,345 +0,0 @@
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Background

                                        -

                                        Every operator has many kernels because Fluid supports multiple data types, places, data layouts and library types. We use OpKernelType to describe the kernel types that an operator can hold.

                                        -

                                        The OpKernelType is as follows:

                                        -
                                        struct OpKernelType {
                                        -  Place place_;
                                        -  DataType data_type_;
                                        -  DataLayout data_layout_;
                                        -  LibraryType library_type_;
                                        -};
                                        -
                                        -
                                        -
                                          -
                                        • The place_ is a descriptor of the device, e.g., CPUPlace, CUDAPlace.
                                        -
                                        • The data_type_ is the data type that this kernel operates on, e.g., FP32, INT64. Note that one kernel may have inputs with different data types; in that case, data_type_ is the kernel's major data type. For example, cross_entropy takes int64 as its label, and double/float as its input logit and output cost. The major data_type of cross_entropy is float or double.
                                        -
                                        • The data_layout_ is useful for some computational libraries. One example is that MKLDNN uses many kinds of layout, such as nChw8c. Each kind of layout invokes a different kernel.
                                        -
                                        • The library_type_ describes the computational library, e.g., MKLDNN, CUDNN.
                                        -
                                        -
                                        -
                                        -

                                        Problem

                                        -

                                        Ideally, we would register a kernel for every operator and every kernel type. However, this is impractical in the following situations.

                                        -
                                          -
                                        1. Some operators, like CRF, are complicated and inefficient to implement on GPU. The CRF operator will only have a CPU kernel.
                                        -
                                        2. Some operators take too much memory. It is better to force them onto the CPU. However, the rest of the operators in the neural network will run on GPU, i.e., this is a model-parallel problem.
                                        -
                                        3. Some layouts and places are particular to one library. One example is that MKLDNN uses nChw8c, and no other library uses nChw8c.
                                        -
                                        -

                                        Take one situation as a detailed example. Suppose we have two operators, OP1 and OP2. OP1 has one output, op1_2_op2, and op1_2_op2 is the input of OP2.

                                        -

                                        If OP1 and OP2 run on the same place (for example CPUPlace), then op1_2_op2 can be used directly by OP2.

                                        -
                                        OP1(CPUPlace)
                                        -     |
                                        - op1_2_op2
                                        -     |
                                        -OP2(CPUPlace)
                                        -
                                        -
                                        -

                                        If OP1 and OP2 run on different places, then OP2 cannot use op1_2_op2 directly.

                                        -

                                        The problems in these situations are similar. We can formalize them as follows.

                                        -

                                        We register kernels with types $KT = \{kt_1, kt_2, kt_3, \ldots\}$ for one operator. The inputs of this operator may come with a kernel type $kt_{?}$ where $kt_{?} \notin KT$. The question is how to cast the inputs of this operator from $kt_{?}$ to one of the kernel types in $KT$.

                                        -
                                        -
                                        -

                                        Solution: data transform

                                        -

                                        Transforming the inputs of an operator to suit another kernel type does not depend on the particular operator. So we should register these transformation methods as global methods.
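One way to picture "global transformation methods" is a registry keyed by the source and destination kernel type, shared by all operators. The sketch below is only an illustration under that assumption: `register_transform`, `data_transform` and the place strings are hypothetical names, not PaddlePaddle APIs, and the "copy" stands in for a real device transfer.

```python
# Hypothetical sketch: a global registry of data-transform functions keyed by
# (source place, destination place). None of these names are PaddlePaddle APIs.
from typing import Callable, Dict, List, Tuple

TransformFn = Callable[[List[float]], List[float]]
_TRANSFORMS: Dict[Tuple[str, str], TransformFn] = {}


def register_transform(src: str, dst: str):
    """Decorator that registers fn as the global src -> dst transform."""
    def decorator(fn: TransformFn) -> TransformFn:
        _TRANSFORMS[(src, dst)] = fn
        return fn
    return decorator


@register_transform("CPUPlace", "CUDAPlace")
def cpu_to_gpu(data: List[float]) -> List[float]:
    # Stand-in for a host-to-device copy; here it only copies the list.
    return list(data)


def data_transform(actual: str, expected: str, data: List[float]) -> List[float]:
    """Return data usable by a kernel of the expected type."""
    if actual == expected:
        return data  # same kernel type: use the input directly
    fn = _TRANSFORMS.get((actual, expected))
    if fn is None:
        raise LookupError(f"no transform registered from {actual} to {expected}")
    return fn(data)
```

Because the registry is keyed only by kernel types, an operator never needs its own transform code; it looks up the pair it needs at run time.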

                                        -

                                        We can infer a kernel type for each input of an operator. We call this the actual kernel type for var, meaning the kernel type that can process this input variable.

                                        -

                                        We can get a kernel type from 1) the configuration in the operator description (users may want to force the use of MKL for the conv operator) and 2) the place of the current executor (for example, the executor is running on GPU). This is the kernel type we expect the operator to run on. We call it the expected kernel type.

                                        -

                                        We transform the input data from the actual kernel type to the expected kernel type if the two differ.

                                        -

                                        The algorithm is described as follows:

                                        -
                                        void OperatorWithKernel::Run(
                                        -        const Scope& scope,
                                        -        const platform::Place& place) const {
                                        -  ExecutionContext ctx(...);
                                        -  auto expected_kernel_key = this->GetExpectedKernelType(ctx);
                                        -
                                        -  Scope& new_scope = scope.NewScope();
                                        -
                                        -  for (auto& var_name : this->Inputs()) {
                                        -    auto* tensor_in = GetTensor(var_name);
                                        -    auto kernel_type_for_var = this->GetKernelTypeForVar(...);
                                        -    if (kernel_type_for_var.place_ != expected_kernel_key.place_) {
                                        -      auto* trans_var = new_scope.Var(var_name);
                                        -      auto* out = DataTransform(expected_kernel_key,
                                        -                                kernel_type_for_var,
                                        -                                *tensor_in);
                                        -      CopyVariableWithTensor(...);
                                        -    }
                                        -  }
                                        -
                                        -  auto kernel = kernels.find(expected_kernel_key);
                                        -  kernel->Compute(ExecutionContext(...));
                                        -}
                                        -
                                        -
                                        -

                                        The actual process for the multi-device case above is then:

                                        -
                                        OP1(CPUPlace)
                                        -     |
                                        -op1_2_op2(on CPU)
                                        -     |
                                        -[transform](from CPU to GPU)
                                        -     |
                                        -op1_2_op2(on GPU)
                                        -     |
                                        -OP2(CUDAPlace)
                                        -
                                        -
                                        -
                                        \ No newline at end of file diff --git a/develop/doc_cn/design/memory_optimization.html b/develop/doc_cn/design/memory_optimization.html deleted file mode 100644 index c7d87940ee112967e6a0149fc0381f00326f4ca1..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/memory_optimization.html +++ /dev/null @@ -1,454 +0,0 @@
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Memory Optimization

                                        -
                                        -

                                        Problem

                                        -

                                        In a lecture, Andrew Ng attributes the recent success of AI to a combination of these:

                                        -
                                          -
                                        • Availability of Big Data
                                        • -
                                        • Supercomputing power to process this Big Data over very large neural networks
                                        • -
                                        • Modern algorithms
                                        • -
                                        -

                                        The following graph shows the details:

                                        -

                                        -

                                        Larger models usually bring better performance. However, GPU memory is limited. For example, the memory size of a GTX TITAN X is only 12GB. To train complex and large models, we have to take care of memory usage. Memory optimization is also necessary for both online and mobile inference.

                                        -
                                        -
                                        -

                                        Solution

                                        -
                                        -

                                        Basic Strategy

                                        -

                                        There are some basic strategies to improve memory usage, including in-place operations and memory sharing.

                                        -
                                        -

                                        In-place Operation

                                        -

                                        In a relu activation operator:

                                        -

                                        $y = \max(x, 0)$

                                        -

                                        If the variable x is not used in any other operator, we can make the operation in-place. In other words, variable y and variable x share the same memory block. For this operator, that halves the memory needed, because one buffer holds both the input and the output.
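As a minimal sketch of the difference, the following compares an out-of-place relu with an in-place one, using plain Python lists as stand-ins for tensors. The function names are illustrative only and are not part of PaddlePaddle.

```python
def relu(x):
    """Out-of-place relu: allocates a new list for the output y."""
    return [max(v, 0.0) for v in x]


def relu_inplace(x):
    """In-place relu: overwrites x and returns the very same object."""
    for i, v in enumerate(x):
        x[i] = max(v, 0.0)
    return x


x = [-1.0, 2.0, -3.0]
y = relu(x)          # y lives in a second buffer; x is unchanged
z = relu_inplace(x)  # z and x are the same buffer
```

The in-place version is only safe when no later operator still needs the original values of x, which is exactly the condition stated above.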

                                        -
                                        -
                                        -

                                        Memory Sharing

                                        -

                                        Not all operators support in-place operations. Memory sharing is a more general strategy.

                                        -

                                        The following is an example:

                                        -
                                        a = op1(b, c)
                                        -d = op2(a)
                                        -e = op3(d, f)
                                        -
                                        -
                                        -

                                        In this case, variable a is no longer used after op2, and op2 does not support in-place operation. After op2 finishes, we can return the memory of variable a to a memory pool. Then variable e can reuse the memory of variable a from the pool.
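The pool idea can be sketched as a free list of released buffers that later allocations search first. This is a toy model under stated assumptions: `MemoryPool`, integer buffer ids and the 1024-byte sizes are all hypothetical, and a real allocator would also handle alignment and fragmentation.

```python
class MemoryPool:
    """Toy pool: released buffers are reused before new ones are allocated."""

    def __init__(self):
        self._free = []            # (size, buffer_id) pairs available for reuse
        self._next_id = 0
        self.allocated_bytes = 0   # total bytes ever freshly allocated

    def acquire(self, size):
        # First fit: reuse any released buffer that is large enough.
        for i, (free_size, buf) in enumerate(self._free):
            if free_size >= size:
                del self._free[i]
                return buf
        buf = self._next_id
        self._next_id += 1
        self.allocated_bytes += size
        return buf

    def release(self, buf, size):
        self._free.append((size, buf))


pool = MemoryPool()
buf_a = pool.acquire(1024)   # a = op1(b, c)
buf_d = pool.acquire(1024)   # d = op2(a): not in-place, needs its own buffer
pool.release(buf_a, 1024)    # a is dead once op2 has finished
buf_e = pool.acquire(1024)   # e = op3(d, f): reuses the buffer of a
```

Three variables are produced, but only two buffers are ever allocated.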

                                        -
                                        -
                                        -
                                        -

                                        Live Variable Analysis

                                        -

                                        It’s not enough to only have some basic strategies. The pre-requisite of memory optimization is to know if a variable is still “live” after an operation.

                                        -

                                        In our design, the neural network topology is defined as a program. Luckily, live variable analysis is a classic problem in compilers which can be used in many stages, such as register allocation.

                                        -

                                        In compilers, the front end of the compiler translates programs into an intermediate language with an unbounded number of temporary variables. This program must run on a machine with a bounded number of registers. Two temporary variables a and b can fit into the same register if a and b are never "in use" at the same time. Thus, many temporary variables can fit in few registers; if they don't all fit, the excess temporary variables can be kept in memory.

                                        -

                                        Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporary variables are in use at the same time. We say a variable is “live” if it holds a value that may be needed in the future, so this analysis is called liveness analysis.

                                        -

                                        We can learn these techniques from compilers. Live variable analysis has two main stages:

                                        -
                                          -
                                        • construct a control flow graph
                                        • -
                                        • solve the dataflow equations
                                        • -
                                        -
                                        -

                                        Control Flow Graph

                                        -

                                        To perform analysis on a program, it is often useful to make a control flow graph. A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statement x can be followed by statement y, there is an edge from x to y.

                                        -

                                        The following is the flow graph for a simple loop.

                                        -

                                        -
                                        -
                                        -

                                        Dataflow Analysis

                                        -

                                        Liveness of variables "flows" around the edges of the control flow graph; determining the live range of each variable is an example of a dataflow problem. Dataflow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.

                                        -

                                        A simple way to perform data-flow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes.

                                        -
                                          -
                                        • Flow Graph Terminology
                                        • -
                                        -

                                        A flow graph node has out-edges that lead to successor nodes, and in-edges that come from predecessor nodes. The set pred[n] is all the predecessors of node n, and succ[n] is the set of successors. In the control flow graph above, the out-edges of node 5 are 5 -> 6 and 5 -> 2, and succ[5] = {2, 6}. The in-edges of 2 are 5 -> 2 and 1 -> 2, and pred[2] = {1, 5}.
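The succ and pred sets follow mechanically from an edge list. In the sketch below, only the edges 5 -> 6, 5 -> 2 and 1 -> 2 are stated in the text; the remaining edges (2 -> 3, 3 -> 4, 4 -> 5) are an assumption about the simple-loop figure, which is not reproduced here.

```python
from collections import defaultdict

# Edge list of the assumed six-node loop; (src, dst) means src -> dst.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)]

succ = defaultdict(set)  # succ[n]: nodes reachable by one out-edge of n
pred = defaultdict(set)  # pred[n]: nodes with an edge into n
for src, dst in edges:
    succ[src].add(dst)
    pred[dst].add(src)
```

With these edges, `succ[5]` and `pred[2]` match the sets given in the text.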

                                        -
                                          -
                                        • Uses and Defs
                                        • -
                                        -

                                        An assignment to a variable or temporary defines that variable. An occurrence of a variable on the right-hand side of an assignment (or in other expressions) uses the variable. We can define the def of a variable as the set of graph nodes that define it, or the def of a graph node as the set of variables that it defines; the use of a variable or graph node is defined similarly. In the control flow graph above, def(3) = {c}, use(3) = {b, c}.

                                        -
                                          -
                                        • Liveness
                                        • -
                                        -

                                        A variable is live on an edge if there is a directed path from that edge to a use of the variable that does not go through any def. A variable is live-in at a node if it is live on any of the in-edges of that node; it is live-out at a node if it is live on any of the out-edges of the node.

                                        -

                                        The calculation of liveness can be solved by iteration until a fixed point is reached. The following is the recursive formula:
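In their standard textbook form, which the recursive formula is assumed to follow, the liveness equations for a node $n$ are:

$in[n] = use[n] \cup (out[n] - def[n])$

$out[n] = \bigcup_{s \in succ[n]} in[s]$

A minimal sketch of the fixed-point iteration follows. The six-node program is an assumption chosen to agree with the facts stated in the text (succ[5] = {2, 6}, def(3) = {c}, use(3) = {b, c}); the other use and def sets are illustrative.

```python
def liveness(nodes, succ, use, defs):
    """Iterate the liveness equations until no set changes (a fixed point)."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        # Liveness flows backwards, so visiting nodes in reverse converges faster.
        for n in reversed(nodes):
            new_out = set()
            for s in succ.get(n, ()):
                new_out |= live_in[s]               # out[n] = union of in[s]
            new_in = use[n] | (new_out - defs[n])   # in[n] = use + (out - def)
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# Assumed program: 1: a = 0; 2: b = a + 1; 3: c = c + b;
#                  4: a = b * 2; 5: if a < N goto 2; 6: return c
nodes = [1, 2, 3, 4, 5, 6]
succ = {1: {2}, 2: {3}, 3: {4}, 4: {5}, 5: {2, 6}}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}

live_in, live_out = liveness(nodes, succ, use, defs)
```

In this example a is live-in at node 2 but not live-out, and b is defined there, so the two variables are never live at the same time and could share one memory block.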

                                        -

                                        -
                                        -
                                        -
                                        -

                                        Memory optimization transpiler

                                        -

                                        Finally, we combine the basic strategies with the liveness analysis techniques learned from compilers to implement our memory optimization transpiler.

                                        -
                                        -

                                        add in-place attribute

                                        -

                                        In-place is a built-in attribute of an operator. Since we treat in-place and other operators differently, we have to add an in-place attribute for every operator.

                                        -
                                        -
                                        -

                                        construct control flow graph

                                        -

                                        The following is the ProgramDesc protobuf of the machine translation example.

                                        -
                                          -
                                        • Block0:
                                        • -
                                        -
                                        lookup_table
                                        -mul
                                        -...
                                        -while(sub-block idx 1)
                                        -...
                                        -array_to_lod_tensor
                                        -cross_entropy
                                        -...
                                        -while_grad(sub-block idx 2)
                                        -read_from_array
                                        -array_to_lod_tensor
                                        -...
                                        -
                                        -
                                        -
                                          -
                                        • Block1
                                        • -
                                        -
                                        read_from_array
                                        -read_from_array
                                        -...
                                        -write_to_array
                                        -increment
                                        -write_to_array
                                        -less_than
                                        -
                                        -
                                        -
                                          -
                                        • Block2
                                        • -
                                        -
                                        read_from_array
                                        -increment
                                        -...
                                        -write_to_array
                                        -write_to_array
                                        -
                                        -
                                        -

                                        We can traverse all the operators and variables in ProgramDesc to build a control flow graph.

                                        -
                                        from collections import defaultdict
                                        -
                                        -
                                        -class ControlFlowGraph(object):
                                        -    def __init__(self, Program):
                                        -        self._successors = defaultdict(set)
                                        -        self._predecessors = defaultdict(set)
                                        -        self._uses = defaultdict(set)
                                        -        self._defs = defaultdict(set)
                                        -        self._live_in = defaultdict(set)
                                        -        self._live_out = defaultdict(set)
                                        -        self._program = Program
                                        -    
                                        -    def build(self):
                                        -        pass
                                        -    
                                        -    def dataflow_analysis(self):
                                        -        pass
                                        -        
                                        -    def memory_optimization(self):
                                        -        pass
                                        -        
                                        -    def get_program(self):
                                        -        return self._program
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Make dataflow analysis

                                        -

                                        Following standard compiler practice, we solve the dataflow equations to obtain the liveness of every variable. A variable that is in the live-in set of an operator node but not in its live-out set is dead after that operator, so its memory can be shared.

                                        -

                                        For example:

                                        -
                                        a = op1(b, c)
                                        d = op2(a)
                                        e = op3(d, f)
                                        -
                                        -
                                        -

                                        The dataflow analysis result is:

                                        -
                                        live_in(op1) = {b, c, f}
                                        live_out(op1) = {a, f}

                                        live_in(op2) = {a, f}
                                        live_out(op2) = {d, f}

                                        live_in(op3) = {d, f}
                                        live_out(op3) = {}
                                        -
                                        -
                                        -

                                        After op1, the memory of variables b and c can be reclaimed; after op2, that of variable a; after op3, that of variables d and f.
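The liveness sets in this example can be reproduced with a short backward pass. The following is a minimal sketch, assuming a straight-line program in which each operator's only successor is the next operator; the `Op` record and the `liveness` function are hypothetical names for illustration, not part of PaddlePaddle.

```python
from collections import namedtuple

# Hypothetical op record: the variables an operator reads (uses) and writes (defs).
Op = namedtuple("Op", ["name", "uses", "defs"])


def liveness(ops):
    """Backward liveness analysis for a straight-line list of operators.

    live_out(op) = live_in of the next op (empty for the last op)
    live_in(op)  = uses(op) | (live_out(op) - defs(op))
    """
    live_in, live_out = {}, {}
    next_live_in = set()
    for op in reversed(ops):
        live_out[op.name] = set(next_live_in)
        live_in[op.name] = set(op.uses) | (live_out[op.name] - set(op.defs))
        next_live_in = live_in[op.name]
    return live_in, live_out


ops = [
    Op("op1", uses={"b", "c"}, defs={"a"}),
    Op("op2", uses={"a"}, defs={"d"}),
    Op("op3", uses={"d", "f"}, defs={"e"}),
]
live_in, live_out = liveness(ops)

# Variables that are live before an op but dead after it can give up their memory.
dead_after = {op.name: live_in[op.name] - live_out[op.name] for op in ops}
```

A program with loops or branches would need the usual fixed-point iteration over successors instead of this single pass.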

                                        -
                                        -
                                        -

                                        Memory sharing policy

                                        -

                                        A memory pool is maintained during the memory optimization stage. Each operator node is scanned to determine whether memory optimization can be applied. If an operator satisfies the requirement, the following policy is used to handle its input/output variables.

                                        -
                                        if op.support_inplace():
                                            i --> pool    # release inputs first, so outputs may reuse them
                                            pool --> o
                                        else:
                                            pool --> o    # take outputs from the pool first, then release inputs
                                            i --> pool
                                        -
                                        -
                                        -
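The policy can be sketched as a small simulation. This is only an illustration under stated assumptions: the pool is a plain stack of variable names, only inputs that are dead after the operator are released, and shape or dtype compatibility is ignored. The function `share_memory` and the dictionary layout are hypothetical, not PaddlePaddle's implementation.

```python
def share_memory(ops, dead_after, inplace_ops):
    """Simulate the pool policy and return {output_var: var_whose_memory_it_reuses}."""
    pool = []   # names of variables whose buffers are free for reuse
    reuse = {}
    for op in ops:
        released = sorted(dead_after[op["name"]])
        if op["name"] in inplace_ops:
            pool.extend(released)            # i --> pool
            for out in op["defs"]:           # pool --> o
                if pool:
                    reuse[out] = pool.pop()
        else:
            for out in op["defs"]:           # pool --> o
                if pool:
                    reuse[out] = pool.pop()
            pool.extend(released)            # i --> pool
    return reuse


ops = [
    {"name": "op1", "defs": ["a"]},
    {"name": "op2", "defs": ["d"]},
    {"name": "op3", "defs": ["e"]},
]
# Inputs that die after each operator, taken from the liveness example.
dead_after = {"op1": {"b", "c"}, "op2": {"a"}, "op3": {"d", "f"}}

with_inplace = share_memory(ops, dead_after, inplace_ops={"op2"})
without_inplace = share_memory(ops, dead_after, inplace_ops=set())
```

When op2 supports in-place execution, its output d takes over the buffer of its own input a; otherwise d must take a buffer released by an earlier operator.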
                                        -
                                        -
                                        - -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/mkl/mkl_packed.html b/develop/doc_cn/design/mkl/mkl_packed.html deleted file mode 100644 index db2edaa0555e0f1abc580b2ed9bcfe8bf530fbf6..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/mkl/mkl_packed.html +++ /dev/null @@ -1,388 +0,0 @@ - - - - - - - - - - - - - Intel® MKL Packed on PaddlePaddle: Design Doc — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Intel® MKL Packed on PaddlePaddle: Design Doc

                                        - -
                                        -

                                        Overview

                                        -

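                                        We plan to integrate the GEMM Packed APIs [1] introduced in Intel® MKL into PaddlePaddle, to take full advantage of Intel platforms and improve PaddlePaddle's performance on Intel architecture. The current optimization targets the Recurrent Neural Network (RNN) layers (RecurrentLayer, GatedRecurrentLayer, and LstmLayer) and the PaddlePaddle V1 API.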

                                        -
                                        -
                                        -

                                        Key Points

                                        -
                                        -

                                        Background

                                        -

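                                        PaddlePaddle currently uses the cblas_?gemm functions of the Intel® MKL library. Before computing, these functions convert the source data into an internal format better suited to Intel platforms.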

                                        -
                                          -
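                                        1. Packing overhead. This data format conversion (packing) is relatively expensive when the computation itself is small. For example, in the vanilla RNN part of DeepSpeech2 [2], the matrix size is batch_size * 2048.
                                        2. Redundant packing. In some existing cases (such as RNNs), repeated calls to cblas_?gemm use the same source data, so packing that data again on every call is redundant.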
                                        -

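                                        To minimize the packing time spent across repeated cblas_?gemm calls, Intel® MKL introduced the following four APIs: cblas_?gemm_alloc, cblas_?gemm_pack, cblas_?gemm_compute, and cblas_?gemm_free.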

                                        -
                                        - -
                                        -

                                        With these APIs, we can pack the source data once and then pass the packed data to every gemm_compute call that reuses it, which avoids redundant packing.

                                        -
                                        -
                                        -

                                        Solution

                                        -

                                        In an RNN, all time steps within one forward/backward pass share the same weight. For inference only, every forward pass also uses the same weight, so there is no need to repack the weight at every time step of every forward pass.

                                        -

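                                        Using the newly introduced GEMM Packed APIs, we pack the weight when the layer is initialized, reuse the packed weight in the forward and backward passes, and repack the new weight after each weight update for use in the next iteration.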

                                        -
                                          -
                                        • Before optimization, for a model with sequence length T, the number of conversions performed over N iterations is:
                                          • inference: N * T
                                          • training: 2 * N * T
                                        • After optimization, for the same model, the number of conversions drops to:
                                          • inference: 1
                                          • training: 2 * N
                                        -
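The counts above can be checked with a toy simulation. This is a sketch under assumptions: `PackedWeight`, `run_unpacked`, and `run_packed` are hypothetical names, `pack()` merely counts calls instead of calling MKL, and the two packs per training iteration are modeled as one packed copy for the forward pass and one for the backward pass.

```python
class PackedWeight:
    """Hypothetical stand-in for a weight matrix that can be packed."""

    def __init__(self):
        self.pack_calls = 0

    def pack(self):
        # Stands in for the costly conversion of the weight into packed format.
        self.pack_calls += 1


def run_unpacked(n_iters, seq_len, training):
    """Before: every GEMM call at every time step repacks the weight."""
    w = PackedWeight()
    passes = 2 if training else 1      # forward only, or forward + backward
    for _ in range(n_iters):
        for _ in range(passes):
            for _ in range(seq_len):
                w.pack()
    return w.pack_calls


def run_packed(n_iters, seq_len, training):
    """After: pack ahead of time and reuse the packed data at every time step."""
    w = PackedWeight()
    if not training:
        w.pack()                       # once, at layer initialization
    for _ in range(n_iters):
        if training:
            w.pack()                   # copy reused by all forward time steps
            w.pack()                   # copy reused by all backward time steps
        for _ in range(seq_len):
            pass                       # time steps only read the packed data
    return w.pack_calls


N, T = 10, 50
before = (run_unpacked(N, T, training=False), run_unpacked(N, T, training=True))
after = (run_packed(N, T, training=False), run_packed(N, T, training=True))
```

With N = 10 and T = 50, the number of conversions falls from 500 to 1 for inference and from 1000 to 20 for training.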
                                        -
                                        -
                                        -

                                        Actions

                                        -

                                        The added files and directory structure are as follows:

                                        -
                                        PaddlePaddle/Paddle
                                        ├── ...
                                        └── paddle/
                                            ├── ...
                                            └── gserver/
                                                ├── ...
                                                ├── layers/
                                                │   ├── ...
                                                │   ├── MKLPackedRecurrentLayer.*
                                                │   ├── MKLPackedGatedRecurrentLayer.*
                                                │   ├── MKLPackedLstmLayer.*
                                                │   └── MKLPackedGemm.h
                                                └── tests/
                                                    ├── ...
                                                    └── test_MKLPacked.cpp
                                        -
                                        -
                                        -
                                        -

                                        CMake

                                        -

                                        In the corresponding CMakeLists.txt, the MKL Packed features are enabled or disabled depending on whether WITH_MKL is turned on.

                                        -
                                        -
                                        -

                                        Layers

                                        -

                                        All MKLPacked*Layer classes inherit from PaddlePaddle's base class Layer and include the header MKLPackedGemm.h, which wraps the relevant GEMM Packed APIs.

                                        -
                                        -
                                        -

                                        Unit Tests

                                        -

                                        We will add test_MKLPacked.cpp to test the layers optimized with MKL Packed. For each newly added RNN layer, we compare the following two aspects:

                                        -
                                          -
                                        1. For the optimized layer itself, the results in sequence mode (rnn_use_batch=false) against those in batch mode (rnn_use_batch=true).
                                        2. The optimized layer against the corresponding original PaddlePaddle layer, in batch mode.
                                        -
                                        -
                                        -

                                        Python API

                                        -

                                        We plan to add a use_mkl_packed flag to paddle/utils.Flags to select whether to use this feature; it defaults to true when compiled with WITH_MKL=ON.

                                        -

                                        In addition, a use_mkl_packed option is added for the corresponding layers in python/paddle/trainer/config_parser.py, so that users can enable or disable the feature from Python.

                                        -

                                        For example, the implementation could look like:

                                        -
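                                        use_mkl_packed = bool(int(g_command_config_args.get("use_mkl_packed", 0)))
                                        if use_mkl_packed:
                                            # Switch to the matching mkl_packed_-prefixed type (assumed naming: prefix + original type).
                                            self.layer_type = "mkl_packed_" + self.layer_type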
                                        -
                                        -
                                        -

                                        All related layer_type values start with mkl_packed_; this is guaranteed when each MKLPacked*Layer registers its layer, to distinguish them from the original layers.

                                        -
                                        -
                                        -

                                        Benchmarking

                                        -

                                        We will add scripts to test and compare network performance before and after using the MKL Packed recurrent layers.

                                        -
                                        -
                                        - -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/mkl/mkldnn.html b/develop/doc_cn/design/mkl/mkldnn.html deleted file mode 100644 index 2c986c167aa8a332facd48e162a8affee858400e..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/mkl/mkldnn.html +++ /dev/null @@ -1,469 +0,0 @@ - - - - - - - - - - - - - Intel® MKL-DNN on PaddlePaddle: Design Doc — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Intel® MKL-DNN on PaddlePaddle: Design Doc

                                        -

                                        We plan to integrate Intel MKL-DNN (Intel Math Kernel Library for Deep Neural Networks) into PaddlePaddle, to take full advantage of Intel platforms and improve PaddlePaddle's performance on Intel architecture.

                                        -
                                        -
                                        -Figure 1. PaddlePaddle on IA -

                                        Near-term goals

                                        -
                                          -
                                        • Implement the commonly used layers with MKL-DNN.
                                        • Implement the common deep neural networks VGG, GoogLeNet, and ResNet with MKL-DNN.
                                        -

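                                        The current optimization mainly targets PaddlePaddle's code framework before the refactoring and the V1 API. The detailed completion status can be found here.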

                                        - -
                                        -

                                        Overview

                                        -

                                        MKL-DNN is integrated into PaddlePaddle as a third-party library. Like other third-party libraries, it is downloaded and built when PaddlePaddle is compiled.

                                        -

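                                        In addition, to further speed up basic math operations in PaddlePaddle, we also integrate MKLML (the MKL small library [1]) as another third-party library. It contains only prebuilt dynamic libraries and header files.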

                                        -

                                        The relationship among MKL, MKLML, and MKL-DNN is shown in the table below:

                                        -

                                        | Name | Open Source | License | Description |
                                        | :--- | :--- | :--- | :--- |
                                        | MKL | No | Proprietary | Accelerate math processing routines |
                                        | MKLML | No | Proprietary | Small package of MKL, especially for Machine Learning |
                                        | MKL-DNN | Yes | Apache 2.0 | Accelerate primitives processing routines especially for Deep Neural Networks |

                                        -

                                        MKLML can be used together with MKL-DNN to achieve the best performance.

                                        -
                                        -
                                        -Figure 2. PaddlePaddle with MKL Engines -
                                        -
                                        -

                                        Actions

                                        -

                                        The added files and directory structure are as follows:

                                        -
                                        PaddlePaddle/Paddle
                                        ├── ...
                                        ├── cmake/
                                        │   ├── external/
                                        │   │   ├── ...
                                        │   │   ├── mkldnn.cmake
                                        │   │   └── mklml.cmake
                                        └── paddle/
                                            ├── ...
                                            ├── math/
                                            │   ├── ...
                                            │   └── MKLDNNMatrix.*
                                            └── gserver/
                                                ├── ...
                                                ├── layers/
                                                │   ├── ...
                                                │   └── MKLDNN*Layer.*
                                                ├── activations/
                                                │   ├── ...
                                                │   └── MKLDNNActivations.*
                                                └── tests/
                                                    ├── ...
                                                    ├── MKLDNNTester.*
                                                    └── test_MKLDNN.cpp
                                        -
                                        -
                                        -
                                        -

                                        CMake

                                        -

                                        CMakeLists.txt provides a master switch for MKL: WITH_MKL, which decides whether MKLML and MKL-DNN are used at compile time.

                                        -
                                          -
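                                        • WITH_MKLML controls whether the MKLML library is used. When WITH_MKL is on, MKLML is automatically used as PaddlePaddle's CBLAS and LAPACK library, and Intel OpenMP is enabled to improve MKLML's performance. At build time, the headers and libraries are placed under build/third_party/install/mklml/*. The MKLML libraries are currently all dynamic libraries, mainly libiomp5.so and libmklml_intel.so.
                                        • WITH_MKLDNN controls whether MKL-DNN is used. When WITH_MKL is on, whether MKL-DNN is built is decided automatically from the hardware configuration [2]. At build time, the headers and libraries are placed under build/third_party/install/mkldnn/*. MKL-DNN currently provides only the dynamic library libmkldnn.so.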
                                        -
                                        -
                                        -

                                        Matrix

                                        -

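                                        In PaddlePaddle, data is currently stored in NCHW format, but MKL-DNN supports more layouts than this one. We therefore define MKLDNNMatrix to manage the different MKL-DNN data formats and the conversions between them.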

                                        -
                                        -
                                        -Figure 3. MKLDNNMatrix -
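To make the idea of a layout conversion concrete, here is a minimal sketch that reorders a flat NCHW buffer into a channel-blocked layout. It assumes blocks of 8 channels and a channel count divisible by the block size; the function names are hypothetical and this is not the MKLDNNMatrix interface.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in plain NCHW order."""
    return ((n * C + c) * H + h) * W + w


def blocked_offset(n, c, h, w, C, H, W, block=8):
    """Flat offset in the layout [N][C/block][H][W][block].

    Assumes C is a multiple of block.
    """
    cb, ci = divmod(c, block)
    return (((n * (C // block) + cb) * H + h) * W + w) * block + ci


def nchw_to_blocked(data, N, C, H, W, block=8):
    """Copy a flat NCHW list into a new flat list in blocked order."""
    out = [0] * len(data)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = nchw_offset(n, c, h, w, C, H, W)
                    dst = blocked_offset(n, c, h, w, C, H, W, block)
                    out[dst] = data[src]
    return out


# One image, 8 channels, 1 x 2 pixels: value = c * 2 + w in NCHW order.
nchw = list(range(16))
blocked = nchw_to_blocked(nchw, N=1, C=8, H=1, W=2)
```

In the blocked result, the 8 channel values of one pixel sit next to each other, which is the property that makes such layouts friendly to vector instructions.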
                                        -
                                        -

                                        Layers

                                        -

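                                        All MKL-DNN layers inherit from MKLDNNLayer, which in turn inherits from PaddlePaddle's base class Layer. MKLDNNLayer provides the necessary interfaces and functions and implements the basic logic of forward and backward; subclasses only need to implement the specific functions using the predefined interfaces.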

                                        -
                                        -
                                        -Figure 4. MKLDNNLayer -

                                        Each MKLDNNLayer contains a series of MKLDNNMatrix objects for internal and external memory:

                                        -
                                          -
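                                        • Internal memory: inVal_, inGrad_, outVal_, and outGrad_, which hold the input value, input gradient, output value, and output gradient respectively.
                                        • External memory: all prefixed with ext, such as extInVal_ and extInGrad_. They are mainly used to convert memory when the data format does not match PaddlePaddle's default NCHW format. Note that PaddlePaddle's activations use output_.value and output_.grad directly, so extOutVal_ and extOutGrad_ must share memory with output_.value and output_.grad respectively. If no external memory is needed for conversion, the corresponding internal memory shares memory with them as well.
                                        • Conversion functions (resetXXX): resetInValue, resetInGrad, resetOutValue, and resetOutGrad, which convert the input value, input gradient, output value, and output gradient. These functions reset the internal and external memory according to their input arguments; the two may also be identical, which means that no conversion is needed.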
                                        -

                                        Note: each MKLDNNLayer subclass only needs to use the internal memory; all external conversion work is prepared in the reset functions.

                                        -
                                        -
                                        -

                                        Activations

                                        -

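                                        In PaddlePaddle before the refactoring, an activation function is a concept independent of Layer, and its input and output share the same memory. We therefore add a corresponding MKLDNNActivation, implemented in a way similar to MKLDNNLayer.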

                                        -
                                        -
                                        -

                                        Parameters

                                        -

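                                        For layers with parameters, we ensure that the parameters used by MKLDNNLayer share memory with the buffer allocated by PaddlePaddle. If the data layouts differ, we convert the parameters to the format MKL-DNN expects before training starts and convert them back to PaddlePaddle's format when saving at the end of training; no conversion is needed during training itself. This keeps the saved parameter format consistent with PaddlePaddle while avoiding unnecessary conversions.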

                                        -
                                        -
                                        -

                                        Gradients

                                        -

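                                        MKL-DNN operations overwrite their outputs; that is, results are not accumulated onto existing data. The benefit is that memory does not need to be cleared repeatedly, which saves unnecessary work. However, when the network has branches, the gradients coming from different layers must be accumulated during backward. MKLDNNLayer therefore implements a merge method: the input gradient of each branch is first stored temporarily in an MKLDNNMatrix, and the layer at the branch point sums them and writes the result to its own output_.grad. As a result, subclasses do not need to handle branching themselves.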

                                        -
                                        -
                                        -Figure 5. Merge Gradients -
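The merge step can be illustrated with plain Python lists. This is only a sketch of the idea: `overwrite_backward` is a hypothetical toy kernel with overwrite semantics, and `merge_grads` stands in for the summation done at the branch point; neither is the actual MKLDNNLayer code.

```python
def overwrite_backward(out_grad, scale):
    """Toy backward kernel: returns a fresh input gradient and accumulates nothing."""
    return [scale * g for g in out_grad]


def merge_grads(branch_grads):
    """Element-wise sum of the input gradients produced by each branch."""
    merged = [0.0] * len(branch_grads[0])
    for grad in branch_grads:
        for i, g in enumerate(grad):
            merged[i] += g
    return merged


# Two downstream branches consume the same layer output.
grad_from_branch_a = overwrite_backward([1.0, 2.0], scale=2.0)   # kept in a temporary buffer
grad_from_branch_b = overwrite_backward([0.5, 0.5], scale=4.0)   # kept in a temporary buffer

# The layer at the branch point sums the temporaries into its own output gradient.
output_grad = merge_grads([grad_from_branch_a, grad_from_branch_b])
```

Because each branch writes to its own temporary buffer, neither kernel needs to know that the other exists.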
                                        -
                                        -

                                        Unit Tests

                                        -

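                                        We will add test_MKLDNN.cpp and MKLDNNTester.* for testing MKL-DNN. The tests consist of unit tests for each layer (or activation) and end-to-end tests of simple networks. Each test compares the result computed by PaddlePaddle on the CPU with the MKL-DNN result, and passes if the difference is below a small threshold.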

                                        -
                                        -
                                        -

                                        Python API

                                        -

                                        Currently only the v1 API is considered.

                                        -

                                        We plan to add a use_mkldnn option in python/paddle/trainer/config_parser.py so that users can choose to use the MKL-DNN layers.

                                        -

                                        For example, the implementation could look like:

                                        -
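                                        use_mkldnn = bool(int(g_command_config_args.get("use_mkldnn", 0)))
                                        if use_mkldnn:
                                            # Switch to the matching mkldnn_-prefixed type (assumed naming: prefix + original type).
                                            self.layer_type = "mkldnn_" + self.layer_type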
                                        -
                                        -
                                        -

                                        All MKL-DNN layer_type values start with mkldnn_; this is guaranteed when each MKLDNN*Layer registers its layer, to distinguish them from the original layers.

                                        -

                                        In addition, a use_mkldnn flag will be added to paddle/utils.Flags to select whether to use the MKL-DNN features.

                                        -
                                        -
                                        -

                                        Benchmarking

                                        -

                                        We will add scripts here to test and compare CNN performance before and after using MKL-DNN. The performance comparison results are reported in IntelOptimizedPaddle.md.

                                        -
                                        -
                                        -

                                        Others

                                        -
                                          -
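                                        1. When MKL-DNN is used, CPU buffers are aligned to 4096; see memory in MKL-DNN for details.
                                        2. Look deeper into PaddlePaddle for other optimization opportunities. For example, OpenMP might be used to improve the performance of SGD updates.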
                                        -
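The alignment requirement in item 1 comes down to simple arithmetic. The sketch below assumes the value 4096 is in bytes and shows how a size can be rounded up to an alignment boundary and how an address can be checked; the helper names are hypothetical and are not taken from MKL-DNN or PaddlePaddle.

```python
MKLDNN_ALIGNMENT = 4096  # assumed to be bytes


def aligned_size(nbytes, alignment=MKLDNN_ALIGNMENT):
    """Round nbytes up to the next multiple of alignment (must be a power of two)."""
    return (nbytes + alignment - 1) & ~(alignment - 1)


def is_aligned(address, alignment=MKLDNN_ALIGNMENT):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


sizes = [aligned_size(n) for n in (1, 4096, 4097)]
```

An allocator that hands out only addresses satisfying `is_aligned` meets the requirement regardless of the requested size.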
                                        -
                                        -
                                        -

                                        Design Concerns

                                        -

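                                        To better match PaddlePaddle's code style [3] while sacrificing as little MKL-DNN performance as possible [4], we have summarized several points that need particular attention: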

                                        -
                                          -
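                                        1. Use deviceId_. To add as few variables or functions as possible to the parent class Layer, we use the existing deviceId_ variable to distinguish the layer's properties, and define -2 as the device ID specific to MKLDNNLayer.
                                        2. Override the init function of the parent class Layer and set deviceId_ to -2, indicating that the layer runs in the MKL-DNN environment.
                                        3. Create MKLDNNBase, which defines classes and functions other than those related to layers and memory. These include MKLDNNStream and CPUEngine, which MKL-DNN uses, and possibly FPGAEngine in the future.
                                        4. If an MKL-DNN layer is followed by a cpu device, output_.value shares memory with extOutVal_ and the data format is NCHW, so that the next cpu device receives correct data. When ordinary CPU layers are present, the format of extOutVal_ and extOutGrad_ is always NCHW or NC.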
                                        -
                                        -
                                        -

                                        References

                                        -
                                          -
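                                        1. The MKL small library is a subset of Intel MKL. It mainly contains the math primitives and operations related to deep learning, and is generally updated together with each new MKL-DNN release.
                                        2. MKL-DNN System Requirements. Currently PaddlePaddle uses MKL-DNN only on machines that support the AVX2 instruction set or above.
                                        3. The original design would introduce information about nextLayer. However, in PaddlePaddle neither the pre-refactoring layer nor the post-refactoring op wants to know about the next layer/op.
                                        4. MKL-DNN's high-performance formats differ from PaddlePaddle's original NCHW (the cuDNN part of PaddlePaddle also uses NCHW, so it does not have this problem). A conversion method is therefore needed, and converting only when necessary lets MKL-DNN perform at its best.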
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/mkl/mkldnn_fluid.html b/develop/doc_cn/design/mkl/mkldnn_fluid.html deleted file mode 100644 index a0bca9184f6be47d2e1da8b303ab0c3312ba3e44..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/mkl/mkldnn_fluid.html +++ /dev/null @@ -1,413 +0,0 @@ - - - - - - - - - - - - - Design Doc: Add MKLDNN Kernel in Fluid Operator — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Add MKLDNN Kernel in Fluid Operator

                                        -
                                        -

                                        Principles

                                        -

                                        First of all, we should follow some basic principles:

                                        -
                                          -
                                        1. How to write a new operator. We are adding a new kind of kernel to operators, so we should follow this doc.
                                        2. Supporting new Device/Library. Since MKLDNN is a new library to Fluid, we should add MKLDNNDeviceContext and maybe mkldnn_helper.h, just like cudnn_helper.h.
                                        3. Switch Kernel. Another important point is that we should ensure data synchronization between different kernel types, which is the subject of this topic. So we should override GetExpectedKernelType and the trans functions to support switching kernels.
                                        4. The Keys of Operator Kernel Type. Kernel Type is a pivotal concept that records the Place, Library, DataType, and Layout.
                                        -
                                        -
                                        -

                                        Solution

                                        -

                                        In general, there are four steps to run an MKL-DNN primitive.

                                        -
                                          -
                                        • Create a primitive descriptor that describes this operator
                                        • -
                                        • Create the primitive itself from the primitive descriptor and the engine
                                        • -
                                        • Create all the memory buffers that the primitive needs
                                        • -
                                        • Launch a stream to execute the created primitive. More details can be found here.
                                        • -
                                        -

                                        It’s better to avoid reinitializing the primitives and memory handles of the first three stages in every iteration. So we plan to create a map that records all the primitives and memory objects, which should not take too much memory, as discussed here.
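The caching plan above can be sketched as a keyed map with a find-or-create helper. This is only an illustration under stated assumptions: `CachedObject`, `ObjectCache` and `GetOrCreate` are hypothetical names that do not come from the design, and `CachedObject` stands in for a real MKL-DNN primitive or memory object so that the example compiles without the library.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an MKL-DNN primitive or memory object.
struct CachedObject {
  int input_size;  // the input size this object was built for
};

// Keyed cache: one entry per operator instance and role, e.g. "conv1_fwd".
class ObjectCache {
 public:
  // Returns the cached object, or nullptr when the key is unknown.
  std::shared_ptr<CachedObject> Find(const std::string& key) const {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : it->second;
  }

  // Inserts or overwrites the entry stored under key.
  void Add(const std::string& key, std::shared_ptr<CachedObject> obj) {
    map_[key] = std::move(obj);
  }

  // Reuses the cached object unless it is missing or was built for a
  // different input size; only then is the expensive factory invoked.
  std::shared_ptr<CachedObject> GetOrCreate(
      const std::string& key, int input_size,
      const std::function<std::shared_ptr<CachedObject>(int)>& factory) {
    auto obj = Find(key);
    if (obj == nullptr || obj->input_size != input_size) {
      obj = factory(input_size);
      Add(key, obj);
    }
    return obj;
  }

  std::size_t Size() const { return map_.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<CachedObject>> map_;
};
```

Calling `GetOrCreate` twice with the same key and input size invokes the factory once, which is the per-iteration saving the design is after. A changed input size rebuilds the entry in place, so the map holds one entry per key.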

                                        -

                                        It is assumed that the following three conditions are satisfied.

                                        -
                                          -
                                        1. There is a unique key for each operator instance, possibly the actual name of the Output Tensor.
                                        2. -
                                        3. The Input Tensor inside the Compute function is the one after conversion.
                                        4. -
                                        5. We can get the phase (e.g. is_test) inside the Compute function; otherwise we need to expose this attribute to the user.
                                        6. -
                                        -
                                        -

                                        Compute

                                        -

                                        The algorithm of Compute is described as follows, taking conv as an example.

                                        -
                                          PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), "It must use CPUPlace.");
                                        -  PADDLE_ENFORCE(platform::is_mkldnn_library(ctx.GetLibrary()), "It must use MKLDNN Library.");
                                        -
                                        -  auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>();
                                        -
                                        -  // find primitive by unique key from mkldnn context
                                        -  // the op_key should be a unique name of this op instance
                                        -  auto& p = dev_ctx.findPrimitive(op_key + "_fwd");
                                        -
                                        -  // assuming the input tensor inside this compute function is the one after conversion
                                        -  // this should be guaranteed by another mechanism
                                        -  auto& i = dev_ctx.findMemory(op_key + "_input");
                                        -  
                                        -  if (p == nullptr || i == nullptr || inputSizeChanged(p, i))  {
                                        -    auto fwd_primitive_desc = createPrimitiveDesc(ctx);
                                        -    auto* input = ctx.Input<Tensor>("Input");
                                        -    auto* filter = ctx.Input<Tensor>("Filter");
                                        -    auto* output = ctx.Output<Tensor>("Output");
                                        -    shared_ptr<mkldnn::memory> in(new mkldnn::memory(fwd_primitive_desc->src_primitive_desc(), input->data<T>()));
                                        -    shared_ptr<mkldnn::memory> wgt(new mkldnn::memory(fwd_primitive_desc->weights_primitive_desc(), filter->data<T>()));
                                        -    shared_ptr<mkldnn::memory> out(new mkldnn::memory(fwd_primitive_desc->dst_primitive_desc(), output->mutable_data<T>(ctx.GetPlace())));
                                        -    shared_ptr<mkldnn::conv_fwd> fwd_primitive(new mkldnn::conv_fwd(*fwd_primitive_desc, *in, *wgt, *out));
                                        -
                                        -    dev_ctx.addMemory(op_key+"_input", in);
                                        -    dev_ctx.addMemory(op_key+"_output", out);
                                        -    dev_ctx.addMemory(op_key+"_filter", wgt);
                                        -    dev_ctx.addPrimitive(op_key+"_fwd", fwd_primitive);
                                        -    dev_ctx.addPrimitiveDesc(op_key+"_fwd_PD", fwd_primitive_desc);
                                        -  }
                                        -
                                        -  p = dev_ctx.findPrimitive(op_key + "_fwd");
                                        -
                                        -  PADDLE_ENFORCE(p, "Should have forward Primitive");
                                        -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_input"), "Should have input memory");
                                        -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_output"), "Should have output memory");
                                        -  PADDLE_ENFORCE(dev_ctx.findMemory(op_key+"_filter"), "Should have filter memory");
                                        -  PADDLE_ENFORCE(dev_ctx.findPrimitiveDesc(op_key+"_fwd_PD"), "Should have forward PrimitiveDesc");
                                        -  dev_ctx.submit(p);
                                        -  dev_ctx.execute();  // the convert primitive should already be contained in the stream
                                        -
                                        -
                                        -

                                        createPrimitiveDesc returns the primitive descriptor of this operator and would look like this:

                                        -
                                          auto* input = ctx.Input<Tensor>("Input");
                                        -  auto* filter = ctx.Input<Tensor>("Filter");
                                        -  auto* output = ctx.Output<Tensor>("Output");
                                        -  std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
                                        -  std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");
                                        -  std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
                                        -  int groups = ctx.Attr<int>("groups");
                                        -  algorithm algo = static_cast<algorithm>(ctx.Attr<int>("convolution_algorithm_option"));
                                        -  prop_kind pk = ctx.Attr<bool>("is_test") ? prop_kind::forward_inference : prop_kind::forward_training;
                                        -    
                                        -  auto fwd_desc = mkldnn::conv_fwd::desc(/* all the setting above*/);
                                        -  shared_ptr<mkldnn::conv_fwd::primitive_desc> fwd_primitive_desc(new mkldnn::conv_fwd::primitive_desc(fwd_desc, ctx.getEngine()));
                                        -
                                        -  return fwd_primitive_desc;
                                        -  }
                                        -
                                        -
                                        -
                                        -
                                        -

                                        MKLDNNDeviceContext

                                        -

                                        MKLDNNDeviceContext is straightforward: it should hold the basic state, namely the stream, the engine and the map needed.

                                        -
                                        -
                                        -

                                        mkldnn_helper

                                        -

                                        Some functions would be put in paddle/platform/mkldnn_helper.h.

                                        -
                                          -
                                        • create MKLDNN memories
                                        • -
                                        • create MKLDNN primitives
                                        • -
                                        • error check function
                                        • -
                                        • etc
                                        • -
                                        -
                                        -
                                        -

                                        Kernel Switch

                                        -

                                        We should reorder data whose Layout differs when it comes from, or goes to, another device. GetExpectedKernelType and the trans functions help us implement this.

                                        -

                                        GetExpectedKernelType receives the context, from which the operator can return the best KernelType. trans would look like this:

                                        -
                                        void trans(inputs, ctx) override {
                                        -  if (NoNeedTrans()) {
                                        -    return;
                                        -  }
                                        -  // find reorder primitive by op_key from context
                                        -  auto& dev_ctx = ctx.template device_context<platform::MKLDNNDeviceContext>();
                                        -  auto& p = dev_ctx.findPrimitive(op_key + "_reorder_input");
                                        -  auto& i = dev_ctx.findMemory(op_key + "_src_input");
                                        -
                                        -  if (p == nullptr || i == nullptr || changeSized(i, input)) {
                                        -    auto prim = createPrimitiveDesc(ctx);
                                        -    auto src = createMemory(memoryDesc(input->dims(), actual_layout), input->data);
                                        -    auto newbuffer = paddle::memory::Alloc(ctx.GetPlace(), input->size_in_bytes());
                                        -    auto dst = createMemory(prim->expected_desc(), newbuffer->data);  // use prim: p may be nullptr in this branch
                                        -    auto reorder_primitive(new mkldnn::reorder(src, dst));
                                        -
                                        -    dev_ctx.addMemory(op_key+"_src_input", src);
                                        -    dev_ctx.addMemory(op_key+"_input", dst);
                                        -    dev_ctx.addPrimitive(op_key+"_reorder_input", reorder_primitive);
                                        -  }
                                        -
                                        -  p = dev_ctx.findPrimitive(op_key + "_reorder_input");
                                        -  PADDLE_ENFORCE(p, "Should have Reorder Primitive");
                                        -  dev_ctx.submit(p);
                                        -  if (! this->isMKLDNNKernel()) {
                                        -    // execute immediately only if this is not mkldnn kernel function.
                                        -    // otherwise, it can be executed with the operator primitive in Compute
                                        -    dev_ctx.execute();
                                        -  }
                                        -  // after submit, the input tensor in ExecutionContext should be replaced by the converted one
                                        -  // there should be another mechanism to ensure this
                                        -}
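The early return in trans depends on comparing the kernel type a tensor was produced under with the one the operator expects. A minimal sketch of that comparison follows, assuming the kernel type carries the four fields named earlier (Place, Library, DataType, Layout). The enum values and the helpers `NoNeedTrans` and `OnlyLayoutDiffers` are hypothetical and are not taken from Fluid.

```cpp
// Hypothetical enumerations; the real names in Fluid may differ.
enum class Place { kCPU, kGPU };
enum class Library { kPlain, kMKLDNN, kCUDNN };
enum class DataType { kFP32, kFP64 };
enum class Layout { kNCHW, kNHWC, kMKLDNN };

// The four keys that identify a kernel.
struct KernelType {
  Place place;
  Library library;
  DataType data_type;
  Layout layout;
};

inline bool operator==(const KernelType& a, const KernelType& b) {
  return a.place == b.place && a.library == b.library &&
         a.data_type == b.data_type && a.layout == b.layout;
}

// True when a tensor produced under `actual` can be consumed as is.
inline bool NoNeedTrans(const KernelType& actual, const KernelType& expected) {
  return actual == expected;
}

// True when a layout reorder alone is enough, i.e. the place and the data
// type already match and only the layout differs.
inline bool OnlyLayoutDiffers(const KernelType& actual,
                              const KernelType& expected) {
  return actual.place == expected.place &&
         actual.data_type == expected.data_type &&
         actual.layout != expected.layout;
}
```

Under these assumptions, a plain CPU kernel feeding an MKLDNN kernel differs only in library and layout, so a reorder primitive is enough and no cross-device copy is needed.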
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Unit Test

                                        -

                                        All the functions should have corresponding tests. TBD

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/model_format.html b/develop/doc_cn/design/model_format.html deleted file mode 100644 index a4d812c0691c9549085e4c630dc6c6a517f9a613..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/model_format.html +++ /dev/null @@ -1,293 +0,0 @@ - - - - - - - - - - - - - Design Doc: Model Format — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Model Format

                                        -
                                        -

                                        Motivation

                                        -

                                        A model is an output of the training process. One complete model consists of two parts, the topology and the parameters. In order to support industrial deployment, the model format must be self-complete and must not expose any training source code.

                                        -

                                        As a result, in PaddlePaddle the topology is represented as a ProgramDesc, which describes the model structure. The parameters contain all the trainable weights in the model. We must support large parameters and efficient serialization/deserialization of parameters.

                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        The topology is saved as plain text in a detailed, self-contained protobuf file.

                                        -

                                        The parameters are saved as a binary file. A protobuf message has a size limit of 64M. We ran a benchmark experiment, which shows that protobuf is not a good fit for this task.

                                        -

                                        As a result, we designed a dedicated format for tensor serialization. By default, an arbitrary tensor in Paddle is a LoDTensor and is described by a LoDTensorDesc proto. We save the DescProto as the byte string header. It contains all the necessary information, such as the dims and the LoD information in LoDTensor. A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte string content. The binary format of one tensor is as follows.

                                        -

                                        The table below shows a tensor’s byte view in detail. Note that all the signed values are written in the little-endian format.

                                        -

                                        | field name | type | description |
                                        | --- | --- | --- |
                                        | version | uint32_t | Version of saved file. Always 0 now. |
                                        | tensor desc length | uint32_t | TensorDesc (Protobuf message) length in bytes. |
                                        | tensor desc | void* | TensorDesc protobuf binary message |
                                        | tensor data | void* | Tensor’s data in binary format. The length of tensor_data is decided by TensorDesc.dims() and TensorDesc.data_type() |
                                        | lod_level | uint64_t | Level of LoD |
                                        | length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
                                        | data of lod[0] | uint64_t* | [Optional] lod[0].data() |
                                        | ... | ... | ... |
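The byte layout in the table can be illustrated with a small writer. This is a sketch under stated assumptions: `SerializeTensor` and `AppendLittleEndian` are hypothetical helpers, the TensorDesc message and the tensor data are passed in as already-encoded byte strings, and every integer field is written in little-endian order.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends an unsigned integer to `out` in little-endian byte order.
template <typename T>
void AppendLittleEndian(std::string* out, T value) {
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out->push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
  }
}

// Writes the fields in table order: version, desc length, desc bytes,
// tensor data, lod_level, then (byte length, offsets) for each LoD level.
std::string SerializeTensor(const std::string& desc_bytes,
                            const std::string& data_bytes,
                            const std::vector<std::vector<uint64_t>>& lod) {
  std::string out;
  AppendLittleEndian<uint32_t>(&out, 0);  // version, always 0 for now
  AppendLittleEndian<uint32_t>(&out, static_cast<uint32_t>(desc_bytes.size()));
  out += desc_bytes;  // serialized TensorDesc message
  out += data_bytes;  // raw tensor memory
  AppendLittleEndian<uint64_t>(&out, static_cast<uint64_t>(lod.size()));
  for (const auto& level : lod) {
    // Length of this level in bytes, followed by its offsets.
    AppendLittleEndian<uint64_t>(
        &out, static_cast<uint64_t>(level.size() * sizeof(uint64_t)));
    for (uint64_t offset : level) {
      AppendLittleEndian<uint64_t>(&out, offset);
    }
  }
  return out;
}
```

A reader would walk the same fields in the same order. The tensor data has no explicit length field, so its length has to be derived from the dims and data type in the decoded TensorDesc, as the table notes.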

                                        -
                                        -
                                        -

                                        Summary

                                        -
                                          -
                                        • We introduce a model format.
                                        • -
                                        • The model represented by its forward-pass computation procedure is saved in a ProgramDesc protobuf message.
                                        • -
                                        • The parameters are stored as a set of binary tensors in the format specified above.
                                        • -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -


                                        -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/multi_language_interface/00.why_plain_c.html b/develop/doc_cn/design/multi_language_interface/00.why_plain_c.html deleted file mode 100644 index b9145baccfcac0bd1485e8548f9e91e4e2fe0190..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/multi_language_interface/00.why_plain_c.html +++ /dev/null @@ -1,399 +0,0 @@ - - - - - - - - - - - - - Paddle多语言接口实现 — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Paddle Multi-Language Interface Implementation

                                        -
                                        -

                                        Background

                                        -

                                        Paddle needs a multi-language interface, and this interface must:

                                        -
                                          -
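                                        • Have standard, well-written documentation
                                            -
                                          • For example, Python can generate API documentation with Sphinx, and golang can generate documentation with GoDoc. Both require the interface to be fully commented following the established conventions.
                                          • -
                                          -
                                        • -
                                        • Adapt each language's interface to that language's characteristics
                                            -
                                          • For example, Java and Python handle errors by throwing an Exception, whereas golang should handle errors through return values.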
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -

                                        Basic Requirements

                                        -

                                        The implementation of Paddle's multi-language interface covers the following aspects:

                                        -
                                          -
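                                        • We distribute Paddle as a dynamic library. This library embeds no interpreter for any other language and uses no other dynamic library.
                                        • -
                                        • The dynamic library exports functions through C99-standard header files and does not use or export C++ symbols.
                                        • -
                                        • Paddle's internal structs and classes are not exported; only void* pointers are used as type handles (handler).
                                        • -
                                        • No code generator such as SWIG is used; the multi-language bindings are written by hand.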
                                        • -
                                        -
                                        -
                                        -

                                        Reasons

                                        -
                                        -

                                        Distribute Paddle as a dynamic library

                                        -
                                          -
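                                        • Linking Paddle is fairly complicated
                                            -
                                          • If users want to link Paddle's static library (libpaddle.a) into their own program, they must use the --whole-archive (for GCC) or --force_load (for Clang) option to ensure that all the symbols in libpaddle.a are written into the program's binary. This is because Paddle's source code uses the object factory design pattern.
                                          • -
                                          -
                                        • -
                                        • For compiled languages such as C/C++, using a static library and using a dynamic library are about equally difficult. Interpreted languages such as Python or Java, however, can only call Paddle's dynamic library; otherwise the Paddle static library would have to be linked into the interpreter.
                                            -
                                          • The binary that an interpreted language actually runs is the interpreter itself, so calling a static library means linking it with the interpreter. For Java, that means adding the static library to the JVM, which is not common practice for typical Java developers.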
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -

                                        Do not embed an interpreter for any other language in the dynamic library

                                        -
                                          -
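                                        • Currently, Paddle's process model is that C++ internally drives a Python interpreter to parse the model configuration and read data.
                                        • -
                                        • Our final dynamic library embeds neither Python nor any other language's interpreter. Model configuration parsing and data reading are left to the other languages.
                                        • -
                                        -

                                        One current problem is that if the Python interpreter embedded in Paddle and the Python used externally have different versions, Paddle reports an error and exits immediately.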

                                        -
                                        -
                                        -

                                        The Paddle dynamic library does not reference other dynamic libraries

                                        -
                                          -
                                        • That is, the dynamic library does not depend on any other file and can run on any machine.
                                        • -
                                        -
                                        -
                                        -

                                        The dynamic library exports functions through C99-standard header files and does not use or export C++ symbols

                                        -
                                          -
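                                        • Because C++ compilers have no standard for name mangling, different compiler versions may generate different symbols for the same C++ code. A multi-language interface reads the generated binary (the dynamic library) directly and therefore needs stable exported symbols.
                                        • -
                                        • C has a standard for exported symbols, and on common platforms it follows the standard ABI calling convention.
                                        • -
                                        • Most languages support calling a C API.
                                        • -
                                        • C99 is used rather than C89 because C99 supports fixed-width integer types and a Boolean type.
                                        • -
                                        • C99 is used rather than C11 because C11 has no features that Paddle particularly needs, and C99 is more widely used than C11.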
                                        • -
                                        -
                                        -
                                        -

                                        Do not export Paddle's internal structs and classes; use only void* pointers as type handles (handler)

                                        -
                                          -
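                                        • Paddle's internal classes are written in C++, and exporting them directly to a C interface is difficult.
                                        • -
                                        • In the C-API, void* represents Paddle's internal classes, and each API function checks the type itself.
                                        • -
                                        -

                                        In the C header file paddle_matrix.h: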

                                        -
                                        typedef void* paddle_matrix;
                                        -typedef int paddle_error;
                                        -
                                        -extern "C"
                                        -paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
                                        -                                     uint64_t* width,
                                        -                                     uint64_t* height);
                                        -
                                        -
                                        -

                                        The C interface is implemented in C++, in the file paddle_matrix.cpp:

                                        -
                                        #include "paddle/math/matrix.h"
                                        -extern "C"
                                        -paddle_error paddle_matrix_get_shape(paddle_matrix matrix,
                                        -                                     uint64_t *width,
                                        -                                     uint64_t *height) {
                                        -  // the name matches the declaration in paddle_matrix.h
                                        -  auto m = (paddle::capi::CMatrix*)(matrix);
                                        -  *width = m->mat->getWidth();
                                        -  *height = m->mat->getHeight();
                                        -  return 0;  // assumption: 0 denotes success
                                        -}
                                        -
                                        -
                                        -

                                        The content of paddle/capi/CMatrix.hpp is:

                                        -
                                        namespace paddle {
                                        -namespace capi {
                                        -
                                        -// struct rather than class, so that the C-API functions can access mat
                                        -struct CMatrix {
                                        -  std::shared_ptr<paddle::Matrix> mat;
                                        -};
                                        -
                                        -}  // namespace capi
                                        -}  // namespace paddle
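The opaque-handle pattern above can be shown end to end in a self-contained form. This is a sketch only: the `demo_` prefix, the error code values and `MatrixImpl` are hypothetical and stand in for the Paddle types, which are not reproduced here. It assumes that constructor-style functions return the handle itself, that all other functions return an error code, and that no exception crosses the C boundary.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical error codes; the source only states that the error type is an int.
enum { kDemoOk = 0, kDemoNullPtr = 1 };

typedef void* demo_matrix;  // opaque handle seen by C callers
typedef int demo_error;

namespace {
// Internal C++ type hidden behind the handle.
struct MatrixImpl {
  uint64_t height;
  uint64_t width;
  std::vector<float> data;
};
}  // namespace

extern "C" {

// Constructor-style function: returns the handle, or nullptr on failure.
demo_matrix demo_matrix_create(uint64_t height, uint64_t width) {
  try {
    return new MatrixImpl{height, width, std::vector<float>(height * width)};
  } catch (...) {
    return nullptr;  // never let an exception escape into C code
  }
}

// Every other function reports failure through its return value.
demo_error demo_matrix_get_shape(demo_matrix matrix, uint64_t* height,
                                 uint64_t* width) {
  if (matrix == nullptr || height == nullptr || width == nullptr) {
    return kDemoNullPtr;
  }
  auto* m = static_cast<MatrixImpl*>(matrix);
  *height = m->height;
  *width = m->width;
  return kDemoOk;
}

demo_error demo_matrix_destroy(demo_matrix matrix) {
  if (matrix == nullptr) return kDemoNullPtr;
  delete static_cast<MatrixImpl*>(matrix);
  return kDemoOk;
}

}  // extern "C"
```

A binding for Python or Java would turn a non-zero return value into an exception, while a Go binding would return it as an `error`. Keeping the C layer on plain return codes lets each language apply its own error-handling style.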
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Do not use a code generator such as SWIG; write the multi-language bindings by hand

                                        -
                                          -
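                                        • SWIG is a code generator for multi-language interfaces. Its goal is that you write code in C/C++ and SWIG reads the C/C++ header files directly to generate binding code for various languages.
                                            -
                                          • For a multi-language interface, SWIG requires writing an interface file. This file has its own syntax and a steep learning curve. Adding a new target language also requires adding definitions for that language. The interface file is sometimes very tricky to write, which raises the learning cost for community contributors.
                                          • -
                                          • The interfaces SWIG exposes keep the C++ interface style, so it is hard to guarantee a consistent code style across languages (function naming, error handling).
                                              -
                                            • The function and class names that SWIG exposes in the target language are identical to those in C++, and the C++ naming style does not suit other languages. With SWIG we would need to rename a large number of SomeCppClass names to some_python_class or SomeGoTypes in the interface file.
                                            • -
                                            • Error handling also differs between languages. In Java or Python the most common mechanism is the Exception, whereas in Golang it is the return value. SWIG can only expose the C++ interface as is and cannot adapt to each language's error-handling style.
                                            • -
                                            -
                                          • -
                                          • For most languages, using a C .h file directly is not difficult, for example with Python's cffi or Cython, or golang's cgo.
                                          • -
                                          • The languages and interpreters that SWIG supports are limited. For Python, for example, SWIG supports only the CPython interpreter and not the PyPy interpreter.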
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Summary of Reasons

                                        -

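                                        | Conclusion | Compared with | Reason |
                                        | --- | --- | --- |
                                        | Use a dynamic library | Not a static library | Interpreted languages can only call dynamic libraries, and linking Paddle's static library is complicated |
                                        | Do not embed other language interpreters | Do not embed the Python interpreter | Paddle C++ currently embeds a Python interpreter, which causes bugs when different Python versions run in one process |
                                        | Do not reference other dynamic libraries | | A single Paddle dynamic library can run on any Linux system |
                                        | Use C99 for the interface | Not C++ | C has a standard ABI, C99 is currently the most widely used C standard, and C99 supports the bool type and fixed-width integer types (uint64_t etc.) |
                                        | Use void* as the class handle | Do not spell out what each class contains | Simple to implement, and decouples the interface from implementation details |
                                        | Hand-write the multi-language bindings | Do not use SWIG | Using SWIG requires binding developers to master SWIG configuration, which makes community participation difficult. The code SWIG generates cannot guarantee a consistent code style across languages |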

                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        See Inference implementation.

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -


                                        -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/multi_language_interface/01.inference_implementation.html b/develop/doc_cn/design/multi_language_interface/01.inference_implementation.html deleted file mode 100644 index 27f4d7343eb9007497773ac9cbeda94fbe53d229..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/multi_language_interface/01.inference_implementation.html +++ /dev/null @@ -1,398 +0,0 @@ - - - - - - - - - - - - - C-API 模型推断实现文档 — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        C-API Model Inference Implementation

                                        -

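                                        This document describes the implementation details of the Paddle C-API. The C-API is the foundation of Paddle's multi-language API. Paddle needs to expose many APIs; the model inference API is implemented first and serves as the worked example for discussion. For why a C-API is needed at all, see Why Plain C.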

                                        - -
                                        -

                                        Principles for Exposing Interfaces

                                        -
                                          -
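                                        1. All interfaces are C interfaces, i.e. they are declared with extern "C".
                                        2. Except for functions that construct a type (paddle_matrix_create and the like), every function returns paddle_error. A call must not throw an exception or trigger a runtime error.
                                        3. Every type is named paddle_<type name>, and every function related to a type is named paddle_<type name>_<function name>.
                                        4. If a Paddle Core concept (GradientMachine/Matrix) needs to be exposed to other languages, then:
                                          • Keep the exposed interface as simple as possible: expose the concept's interface, not its implementation. That is, expose GradientMachine or Matrix, but not RecurrentGradientMachine or CpuSparseMatrix.
                                          • Expose only the functions the concept requires, where "required" means the minimal set of functions needed to complete a given task.
                                        5. Do not add much wrapping in the capi interface layer.
                                          • If a Paddle concept must be exposed but is too fiddly, do not wrap it in the capi layer. Instead, change Paddle Core directly so that the concept is no longer fiddly there.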
                                        -
                                        -
                                        -

                                        Directory Structure

                                        -
                                        Paddle
                                        -  `-- paddle
                                        -        `-- capi
                                        -              `-- examples  # The example project for C-API.
                                        -              `-- tests  # unittests for C-API
                                        -              `-- capi.h  # C-API header file.
                                        -              `-- capi_private.h  # The shared header file between implementation sources.
                                        -              `-- matrix.{h, cpp}
                                        -              `-- gradient_machine.{h, cpp}
                                        -              `-- ...
                                        -
                                        -
                                        -

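                                        The directory structure of Paddle's C-API is shown above. Every header file in this directory except capi_private.h is installed under include/paddle. The binaries generated for the C-API are installed under lib. The installed layout is therefore: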

                                        -
                                        `-- include
                                        -      `-- paddle
                                        -             `-- capi.h
                                        -             `-- matrix.h
                                        -             `-- gradient_machine.h
                                        -             `-- ...
                                        -`-- lib
                                        -     `-- libpaddle_capi_shared.{so, dylib}  # On macOS, the dynamic library's file name extension is `dylib`
                                        -     `-- libpaddle_capi_whole.a  # static library for all symbols of Paddle.
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Implementation Approach

                                        -

                                        The following sections describe how each kind of file is implemented.

                                        -
                                        -

                                        capi.h

                                        -

                                        capi.h is the only header file users need to include to use the C-API. capi.h includes the per-type headers, matrix.h and gradient_machine.h. It includes those headers by relative path, i.e. #include "matrix.h".

                                        -
                                        -
                                        -

                                        Header Files for a Specific Type

                                        -

                                        The header files for a specific type are files such as matrix.h and gradient_machine.h. Each one contains the type definition and all exposed functions for that type.

                                        -

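                                        Such a header makes no assumption about the order in which other files are included: even if a user includes a type's header directly, it should not produce an error (although doing so is discouraged). If one type needs another, for example gradient_machine needs matrix, it includes the other type's header directly, i.e. #include "matrix.h".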

                                        -
                                        -
                                        -

                                        capi_private.h

                                        -

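                                        capi_private.h is the header shared among the implementation sources. It mainly contains the actual structures behind the exposed types. From the user's side, every Paddle type degenerates to void *, i.e. typedef void* paddle_matrix. Each type the C-API exposes, however, is a struct defined in capi_private.h.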

                                        -
                                        struct CMatrix {
                                        -   int type = MatrixType;
                                        -   std::shared_ptr<paddle::Matrix> mat;
                                        -};
                                        -
                                        -
                                        -

                                        Such a struct usually contains two members.

                                        -
                                          -
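                                        • type is a tag that identifies the type. Each type has a distinct value in its type field, so even though every type the C-API accepts is void *, we can still determine the type of each argument.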

                                          -
                                          void some_c_api_function(void* some_instance) {
                                          -   int* type = (int *) some_instance;
                                          -   switch (*type) {
                                          -     case MatrixType: {
                                          -       // Braces give `mat` its own scope, so later case labels stay legal.
                                          -       CMatrix* mat = (CMatrix *) some_instance;
                                          -       ...
                                          -       break;
                                          -     }
                                          -     ...
                                          -   }
                                          -}
                                          -
                                          -
                                          -
                                        • -
                                        • The other member of this struct is a smart pointer (shared_ptr) to the corresponding interface type in Paddle Core.

                                          -
                                            -
                                          • Smart pointers are used so that users can safely release a C-API instance without caring whether Paddle Core is still using it.
                                          • -
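                                          • For example, a user obtains a neural network's parameter instance through the C-API. When finished with the parameter, the user simply deletes it. Even if the model in Paddle Core is still using the parameter, the underlying parameter is not destroyed along with the user's handle.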
                                          • -
                                          -
                                        • -
                                        -
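The handle layout described above can be sketched as follows. This is a minimal illustration, not the actual Paddle code: `FakeMatrix` stands in for `paddle::Matrix`, and the tag values, the error enumerators, and the `sketch_matrix_*` function names are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for paddle::Matrix, so the sketch compiles on its own.
struct FakeMatrix {
  std::vector<float> data;
};

// Hypothetical tag values and error codes (not taken from Paddle).
enum HandleType { kMatrixType = 1, kGradientMachineType = 2 };
enum paddle_error { kNoError = 0, kNullPtr = 1, kWrongType = 2 };

// What the user sees: an opaque handle.
typedef void* paddle_matrix;

// What capi_private.h would hold: a type tag plus shared ownership
// of the core object.
struct CMatrix {
  int type = kMatrixType;
  std::shared_ptr<FakeMatrix> mat;
};

extern "C" {

// Constructor-style function: returns the handle itself, nullptr on failure.
paddle_matrix sketch_matrix_create(int size) {
  if (size < 0) return nullptr;
  try {
    std::unique_ptr<CMatrix> handle(new CMatrix());
    handle->mat = std::make_shared<FakeMatrix>();
    handle->mat->data.assign(static_cast<std::size_t>(size), 0.0f);
    return handle.release();
  } catch (...) {
    return nullptr;  // never let an exception cross the C boundary
  }
}

// Every other function returns an error code.
paddle_error sketch_matrix_destroy(paddle_matrix m) {
  if (m == nullptr) return kNullPtr;
  // The tag is the first member, so it can be read before the real cast.
  if (*static_cast<int*>(m) != kMatrixType) return kWrongType;
  // Deleting the handle only drops one shared_ptr reference.
  delete static_cast<CMatrix*>(m);
  return kNoError;
}

}  // extern "C"
```

If the core side keeps its own `shared_ptr` copy, destroying the handle leaves the matrix alive, which is the behavior the bullet points above describe.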
                                        -
                                        -

                                        Implementation Files for a Specific Type

                                        -

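                                        The implementation files for a specific type are matrix.cpp, gradient_machine.cpp, and so on. These files implement the C-API interfaces in C++11 and export them with extern "C". The implementation performs the necessary safety checks on input arguments and then forwards the C-API arguments to Paddle Core.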
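A possible shape for such an implementation function is sketched below: validate the arguments, forward to the core object, and turn any C++ exception into an error code. `CoreVector`, `sketch_vector_get`, and the error enumerators are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical error codes (not taken from Paddle).
enum paddle_error { kNoError = 0, kNullPtr = 1, kOutOfRange = 2, kUnknown = 3 };

// Stand-in for a Paddle Core class whose methods may throw.
class CoreVector {
 public:
  explicit CoreVector(std::vector<float> values) : values_(std::move(values)) {}
  // std::vector::at throws std::out_of_range on a bad index.
  float At(uint64_t index) const { return values_.at(index); }

 private:
  std::vector<float> values_;
};

typedef void* sketch_vector;

extern "C" paddle_error sketch_vector_get(sketch_vector vec, uint64_t index,
                                          float* out) {
  // 1. Check the arguments before touching them.
  if (vec == nullptr || out == nullptr) return kNullPtr;
  try {
    // 2. Forward the call to the core object.
    *out = static_cast<CoreVector*>(vec)->At(index);
    return kNoError;
  } catch (const std::out_of_range&) {
    // 3. Report failures as error codes; nothing is thrown to the caller.
    return kOutOfRange;
  } catch (...) {
    return kUnknown;
  }
}
```

On failure the output parameter is left untouched, so a caller that ignores the error code at least does not read a half-written value.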

                                        -
                                        -
                                        -

                                        libpaddle_capi_shared.{so, dylib}

                                        -

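                                        libpaddle_capi_shared is the dynamic library exported by the C-API. Its link options are similar to those of Paddle's other binaries (for example paddle_trainer). Users can link against this dynamic library directly to use the Paddle C-API, with -lpaddle_capi_shared.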

                                        -
                                        -
                                        -

                                        libpaddle_capi_whole.a

                                        -

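                                        libpaddle_capi_whole is the static library exported by the C-API. It contains all of Paddle's symbols: it is produced by packing the object files from all the static libraries (libpaddle_gserver.a, libpaddle_math.a, libpaddle_capi.a, and so on) into one archive. Link it with --whole-archive -lpaddle_capi_whole --no-whole-archive.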

                                        -
                                        -
                                        -

                                        examples

                                        -

                                        The examples implement model prediction sample code in C99. See example/README.md for details.

                                        -
                                        -
                                        -
                                        -

                                        Build Options

                                        -

                                        The C-API build option is off by default. To turn it on, set the following when running cmake:

                                        -
                                        cmake ${YOUR_SOURCE_ROOT} -DWITH_C_API=ON -DWITH_PYTHON=OFF -DWITH_SWIG_PY=OFF
                                        -
                                        -
                                        -

                                        When building the C-API, it is recommended that Paddle neither embed the Python interpreter nor generate the SWIG interface. For the reasons, see Why Plain C.

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/operator_kernel_type.html b/develop/doc_cn/design/operator_kernel_type.html deleted file mode 100644 index b17355df42d09efeae43f7dee6fb1814e7bd4327..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/operator_kernel_type.html +++ /dev/null @@ -1,335 +0,0 @@ - - - - - - - - - - - - - Design Doc: The Keys of Operator Kernel Type — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: The Keys of Operator Kernel Type

                                        -
                                        -

                                        Problem

                                        -

                                        An operator can have different kernel implementations, and each operator will have a map to store the related kernels. Fluid uses OpKernelType as a key to identify a unique kernel. Before an operator runs, a certain type of kernel must be chosen via a key of OpKernelType. Currently, OpKernelType is defined as follows:

                                        -
                                        struct OpKernelType {
                                        -  platform::Place place_;
                                        -  proto::DataType data_type_;
                                        -};
                                        -
                                        -
                                        -

                                        For more details, please refer to the code on GitHub.

                                        -

                                        It contains two keys, Place and DataType, which are hashed into a single key that identifies a particular kernel. However, these two keys do not carry enough information, so we need a more complete representation of OpKernelType.

                                        -

                                        We often implement a kernel of an operator with some computing library on a certain device (place). Note that computing libraries and devices do not have a one-to-one correspondence: a device can have many computing libraries, and a computing library can support several devices.

                                        -

                                        For example, the Eigen library supports Nvidia GPU/AMD GPU/CPU, and the MKLDNN library supports Intel CPU/Intel FPGA. Both Place and Library should therefore be keys of OpKernelType.

                                        -

                                        Different DataTypes, such as fp64/fp32/int8, obviously need different kernels. But different data layouts of a Tensor also lead to different implementations; the batch norm operator kernels are an example. Data layout should therefore be taken into consideration as well.

                                        -
                                        -
                                        -

                                        Solution

                                        -

                                        There are four keys to determine a kernel type of an operator: Place/Library/DataType/Layout.

                                        -
                                        struct OpKernelType {
                                        -  platform::Place place_;
                                        -  platform::Library library_;
                                        -  proto::DataType data_type_;
                                        -  framework::Layout layout_;
                                        -};
                                        -
                                        -
                                        -

                                        The details are as follows:

                                        -
                                        -

                                        Place

                                        -

                                        Place is defined as:

                                        -
                                        typedef boost::variant<CUDAPlace, ROCmPlace, FPGAPlace, CPUPlace> Place;
                                        -
                                        -
                                        -

                                        Place represents the device memory where data is located.

                                        -
                                        -
                                        -

                                        Library

                                        -

                                        One operator kernel is usually implemented based on one library. Library is defined as an enum:

                                        -
                                        enum Library { Plain, MKLDNN, CUDNN };
                                        -
                                        -
                                        -

                                        We use the Plain enumerator to represent the default library. Since most operators in Fluid are implemented with the Eigen library, we treat Eigen as the Plain enumerator. A library usually has a corresponding DeviceContext, which contains the handles needed for computation. Fluid currently has two default DeviceContexts, for CPU and CUDA: CPUDeviceContext and CUDADeviceContext. CPUDeviceContext contains an Eigen library handle, and CUDADeviceContext contains an Eigen library handle and a cuBLAS handle.

                                        -

                                        To support a new library, a new enumerator needs to be added to Library and a corresponding new LibraryDeviceContext needs to be created.

                                        -
                                        -
                                        -

                                        DataType

                                        -

                                        DataType is defined in framework.proto. Currently, int32/int64/fp32/fp64 are supported.

                                        -
                                        -
                                        -

                                        Layout

                                        -

                                        Actually, a Tensor is a view of a block of memory. Besides a pointer to the memory, we also have to get some other descriptions of this block of memory, such as shape(ddim), stride, and layout.

                                        -

                                        Different layouts lead to different implementations of the operator kernel. There are four main principles we have to follow to support layout in the Fluid framework.

                                        -
                                          -
                                        • We take layout as a data member of Tensor. Layout is an enum. If Fluid is built with MKLDNN, the memory formats in MKLDNN are also added to this enum.
                                        • -
                                        • Users have to set the layout for input data. Operators that generate data, such as fill_constant/random, also have to set a layout. We can of course provide a default layout, such as NCHW.
                                        • -
                                        • The inference of Layout is at run-time, not at compile-time.
                                        • -
                                        • Every operator has to implement different kernels for different layouts. Let’s take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed here. And we will have a special macro to register kernels for MKLDNN operators.
                                        • -
                                        -

                                        Layout is also defined as an enum:

                                        -
                                        enum Layout {
                                        -  kNCHW,
                                        -  kNHWC,
                                        -#ifdef PADDLE_WITH_MKLDNN
                                        -  knChw8c,
                                        -  ...
                                        -#endif
                                        -};
                                        -
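To make the four-key lookup concrete, here is a minimal sketch of hashing an OpKernelType and using it as a map key. It simplifies heavily: Place is a plain enum rather than the boost::variant shown above, kernels are represented by name strings, and `OpKernelTypeHash`, `KernelMap`, `MakeConvKernels`, and `FindKernel` are hypothetical names, not Fluid APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Simplified stand-ins so the sketch is self-contained.
enum class Place { kCPU = 0, kCUDA = 1 };
enum class Library { kPlain = 0, kMKLDNN = 1, kCUDNN = 2 };
enum class DataType { kINT32 = 0, kINT64 = 1, kFP32 = 2, kFP64 = 3 };
enum class Layout { kNCHW = 0, kNHWC = 1 };

struct OpKernelType {
  Place place_;
  Library library_;
  DataType data_type_;
  Layout layout_;

  bool operator==(const OpKernelType& other) const {
    return place_ == other.place_ && library_ == other.library_ &&
           data_type_ == other.data_type_ && layout_ == other.layout_;
  }
};

struct OpKernelTypeHash {
  std::size_t operator()(const OpKernelType& key) const {
    // Each field gets its own 8-bit slot, so distinct keys pack to
    // distinct integers before hashing.
    std::size_t packed = (static_cast<std::size_t>(key.place_) << 24) |
                         (static_cast<std::size_t>(key.library_) << 16) |
                         (static_cast<std::size_t>(key.data_type_) << 8) |
                         static_cast<std::size_t>(key.layout_);
    return std::hash<std::size_t>()(packed);
  }
};

// Kernel name strings stand in for real kernel objects.
using KernelMap =
    std::unordered_map<OpKernelType, std::string, OpKernelTypeHash>;

KernelMap MakeConvKernels() {
  KernelMap kernels;
  kernels[OpKernelType{Place::kCPU, Library::kPlain, DataType::kFP32,
                       Layout::kNCHW}] = "conv_plain_cpu_nchw";
  kernels[OpKernelType{Place::kCPU, Library::kMKLDNN, DataType::kFP32,
                       Layout::kNCHW}] = "conv_mkldnn_cpu_nchw";
  kernels[OpKernelType{Place::kCUDA, Library::kCUDNN, DataType::kFP32,
                       Layout::kNHWC}] = "conv_cudnn_cuda_nhwc";
  return kernels;
}

// Returns the kernel name, or an empty string if no kernel is registered.
std::string FindKernel(const KernelMap& kernels, const OpKernelType& key) {
  auto it = kernels.find(key);
  return it == kernels.end() ? std::string() : it->second;
}
```

With only Place and DataType as keys, the first two entries above would collide; adding Library and Layout is what keeps them apart.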
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/ops/rnn.html b/develop/doc_cn/design/ops/rnn.html deleted file mode 100644 index b577297325303a2bef77272a8a79173d84533d6b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/ops/rnn.html +++ /dev/null @@ -1,394 +0,0 @@ - - - - - - - - - - - - - RNNOp design — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        RNNOp design

                                        -

                                        This document describes the RNN (Recurrent Neural Network) operator and how it is implemented in PaddlePaddle. The RNN op requires that all instances in a mini-batch have the same length. We will have a more flexible dynamic RNN operator in the future.

                                        -
                                        -

                                        RNN Algorithm Implementation

                                        -

                                        - -

                                        The above diagram shows an RNN unrolled into a full network.

                                        -

                                        There are several important concepts here:

                                        -
                                          -
                                        • step-net: the sub-graph that runs at each step.
                                        • -
                                        • memory, $h_t$, the state of the current step.
                                        • -
                                        • ex-memory, $h_{t-1}$, the state of the previous step.
                                        • -
                                        • initial memory value, the memory of the first (initial) step.
                                        • -
                                        -
                                        -

                                        Step-scope

                                        -

                                        There could be local variables defined in each step-net. The PaddlePaddle runtime realizes these variables in step-scopes, which are created for each step.

                                        -

                                        -
                                        -Figure 2 illustrates the RNN's data flow -

                                        Please be aware that every step runs the same step-net. Each step does the following:

                                        -
                                          -
                                        1. Creates the step-scope.
                                        2. Initializes the local variables, including step-outputs, in the step-scope.
                                        3. Runs the step-net, which uses the above-mentioned variables.
                                        -

                                        The RNN operator will compose its output from step outputs in each of the step scopes.

                                        -
                                        -
                                        -

                                        Memory and Ex-memory

                                        -

                                        Let’s give more details about memory and ex-memory using a simple example:

                                        -

                                        $$ h_t = U h_{t-1} + W x_t $$,

                                        -

                                        where $h_t$ and $h_{t-1}$ are the memory and ex-memory (previous memory) of step $t$ respectively.

                                        -

                                        In the implementation, we can make an ex-memory variable either “refer to” the memory variable of the previous step, or copy the memory value of the previous step to the current ex-memory variable.
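The recurrence and the per-step scopes can be sketched in a few lines. This is a scalar toy version under stated assumptions: $U$, $W$, and the states are plain doubles rather than tensors, the ex-memory is filled by copying (the second of the two options above), and `StepScope`, `RunRnn`, and `CollectOutputs` are hypothetical names, not PaddlePaddle APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step-scope: the variables that exist only for a single step.
struct StepScope {
  double ex_memory;  // h_{t-1}
  double memory;     // h_t
};

// Runs h_t = U * h_{t-1} + W * x_t over the whole input sequence.
std::vector<StepScope> RunRnn(const std::vector<double>& inputs, double U,
                              double W, double initial_memory) {
  std::vector<StepScope> scopes;
  scopes.reserve(inputs.size());
  for (std::size_t t = 0; t < inputs.size(); ++t) {
    StepScope scope;  // 1. create the step-scope
    // 2. initialize its variables: copy the previous step's memory
    //    (or the initial memory value at the first step).
    scope.ex_memory = (t == 0) ? initial_memory : scopes[t - 1].memory;
    // 3. run the step-net, which is the same for every step.
    scope.memory = U * scope.ex_memory + W * inputs[t];
    scopes.push_back(scope);
  }
  return scopes;
}

// The RNN output is composed from the memory held in every step-scope.
std::vector<double> CollectOutputs(const std::vector<StepScope>& scopes) {
  std::vector<double> outputs;
  outputs.reserve(scopes.size());
  for (const StepScope& scope : scopes) outputs.push_back(scope.memory);
  return outputs;
}
```

For example, with $U = 0.5$, $W = 1$, an initial memory of 0, and inputs 1, 2, 3, the memories are 1, 2.5, and 4.25.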

                                        -
                                        -
                                        -

                                        Usage in Python

                                        -

                                        For more information on Block, please refer to the design doc.

                                        -

                                        We can define an RNN’s step-net using a Block:

                                        -
                                        import paddle as pd
                                        -
                                        -X = some_op() # X is some operator's output and is a LoDTensor
                                        -a = some_op()
                                        -
                                        -# declare parameters
                                        -W = pd.Variable(shape=[20, 30])
                                        -U = pd.Variable(shape=[20, 30])
                                        -
                                        -rnn = pd.create_rnn_op(output_num=1)
                                        -with rnn.stepnet():
                                        -    x = rnn.add_input(X)
                                        -    # declare a memory (rnn's step)
                                        -    h = rnn.add_memory(init=a)
                                        -    # h.pre_state(), the previous memory of rnn
                                        -    new_state = pd.matmul(W, x) + pd.matmul(U, h.pre_state())
                                        -    # update current memory
                                        -    h.update(new_state)
                                        -    # indicate that h variables in all step scopes should be merged
                                        -    rnn.add_outputs(h)
                                        -
                                        -out = rnn()
                                        -
                                        -
                                        -

                                        Python API functions in the above example:

                                        -
                                          -
                                        • rnn.add_input: indicates that the parameter is a variable that will be segmented into step-inputs.
                                        • -
                                        • rnn.add_memory: creates a variable used as the memory.
                                        • -
                                        • rnn.add_outputs: marks the variables that will be concatenated across steps into the RNN output.
                                        • -
                                        -
                                        -
                                        -

                                        Nested RNN and LoDTensor

                                        -

                                        An RNN whose step-net includes other RNN operators is known as a nested RNN.

                                        -

                                        For example, we could have a 2-level RNN, where the top level corresponds to paragraphs, and the lower level corresponds to sentences. Each step of the higher level RNN also receives an input from the corresponding step of the lower level, and additionally the output from the previous time step at the same level.

                                        -

                                        The following figure illustrates feeding text into the lower level, one sentence per step, and feeding the step outputs into the top level. The final top-level output describes the whole text.

                                        -

                                        - -

                                        import paddle as pd
                                        -
                                        -W = pd.Variable(shape=[20, 30])
                                        -U = pd.Variable(shape=[20, 30])
                                        -
                                        -W0 = pd.Variable(shape=[20, 30])
                                        -U0 = pd.Variable(shape=[20, 30])
                                        -
                                        -# a is output of some op
                                        -a = some_op()
                                        -
                                        -# chapter_data is a set of 128-dim word vectors
                                        -# the first level of LoD is sentence
                                        -# the second level of LoD is a chapter
                                        -chapter_data = pd.Variable(shape=[None, 128], type=pd.lod_tensor, level=2)
                                        -
                                        -def lower_level_rnn(paragraph):
                                        -    '''
                                        -    paragraph: the input, a LoD tensor holding the sentences of one paragraph
                                        -    '''
                                        -    rnn = pd.create_rnn_op(output_num=1)
                                        -    with rnn.stepnet():
                                        -        sentence = rnn.add_input(paragraph, level=0)
                                        -        h = rnn.add_memory(shape=[20, 30])
                                        -        h.update(
                                        -            pd.matmul(W, sentence) + pd.matmul(U, h.pre_state()))
                                        -        # get the last state as sentence's info
                                        -        rnn.add_outputs(h)
                                        -    return rnn
                                        -
                                        -top_level_rnn = pd.create_rnn_op(output_num=1)
                                        -with top_level_rnn.stepnet():
                                        -    paragraph_data = top_level_rnn.add_input(chapter_data, level=1)
                                        -    low_rnn = lower_level_rnn(paragraph_data)
                                        -    paragraph_out = low_rnn()
                                        -
                                        -    h = top_level_rnn.add_memory(init=a)
                                        -    # combine the lower-level output with the previous top-level state
                                        -    h.update(
                                        -        pd.matmul(W0, paragraph_out) + pd.matmul(U0, h.pre_state()))
                                        -    top_level_rnn.add_outputs(h)
                                        -
                                        -# output the last step
                                        -chapter_out = top_level_rnn(output_all_steps=False)
                                        -
                                        -
                                        -

                                        In the above example, the construction of the top_level_rnn calls lower_level_rnn. The input is an LoD Tensor. The top level RNN segments input text data into paragraphs, and the lower level RNN segments each paragraph into sentences.

                                        -

                                        By default, the RNNOp will concatenate the outputs from all the time steps. If output_all_steps is set to False, it will only output the final time step.
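The nesting and the output_all_steps switch described above can be sketched in plain Python. This is only an illustration under stated assumptions: `run_rnn`, `sentence_step`, and `paragraph_step` are hypothetical names, and the scalar `tanh` updates stand in for the `matmul`-based updates of the design; none of this is the PaddlePaddle API.

```python
import math


def run_rnn(inputs, step, init_state, output_all_steps=True):
    """Run `step` over `inputs`, threading the state through time."""
    state = init_state
    outputs = []
    for x in inputs:
        state = step(x, state)
        outputs.append(state)
    # Return every step by default, or only the final step on request.
    return outputs if output_all_steps else outputs[-1]


def sentence_step(sentence, h_prev):
    # Toy scalar update standing in for matmul(W, x) + matmul(U, h).
    return math.tanh(0.5 * sum(sentence) + 0.1 * h_prev)


def paragraph_step(paragraph, h_prev):
    # The lower-level RNN runs inside one step of the top-level RNN.
    paragraph_out = run_rnn(paragraph, sentence_step, 0.0, output_all_steps=False)
    return math.tanh(0.5 * paragraph_out + 0.1 * h_prev)


# chapter -> paragraphs -> sentences -> toy word features
chapter = [
    [[0.1, 0.2], [0.3]],
    [[0.4], [0.5, 0.6], [0.7]],
]
all_steps = run_rnn(chapter, paragraph_step, 0.0)
last_step = run_rnn(chapter, paragraph_step, 0.0, output_all_steps=False)
```

Here `all_steps` holds one state per paragraph, while `last_step` is only the state after the final paragraph.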

                                        -

                                        - -

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/ops/sequence_decoder.html b/develop/doc_cn/design/ops/sequence_decoder.html deleted file mode 100644 index 639f9071873d104585e930b3219d6aa53651d188..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/ops/sequence_decoder.html +++ /dev/null @@ -1,469 +0,0 @@ - - - - - - - - - - - - - Design: Sequence Decoder Generating LoDTensors — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design: Sequence Decoder Generating LoDTensors

                                        -

                                        In tasks such as machine translation and visual captioning, a sequence decoder is necessary to generate sequences, one word at a time.

                                        -

                                        This document describes how to implement the sequence decoder as an operator.

                                        -
                                        -

                                        Beam Search based Decoder

                                        -

                                        The beam search algorithm is necessary when generating sequences. It is a heuristic search algorithm that explores the paths by expanding the most promising node in a limited set.
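As a rough illustration of the idea, the following sketch runs beam search over a toy next-token model. It assumes log-probability scores and hypothetical names (`beam_search`, `toy_model`); it is a generic textbook-style version, not the decoder design proposed in this document.

```python
import math


def beam_search(next_log_probs, start, end, beam_size, max_length):
    """Return finished (sequence, score) pairs, best first.

    next_log_probs(prefix) -> dict mapping each next token to its log-probability.
    """
    beam = [([start], 0.0)]
    finished = []
    for _ in range(max_length):
        candidates = []
        for prefix, score in beam:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the `beam_size` most promising expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beam.append((seq, score))
        if not beam:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix):
    # Hypothetical next-token distribution keyed by the last token only.
    table = {
        "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"</s>": math.log(0.5), "b": math.log(0.5)},
        "b": {"</s>": math.log(0.9), "a": math.log(0.1)},
    }
    return table[prefix[-1]]


results = beam_search(toy_model, "<s>", "</s>", beam_size=2, max_length=4)
best_sequence, best_score = results[0]
```

Note that the greedy first choice (`a`, probability 0.6) does not lead to the best complete sequence; keeping two prefixes alive lets the search find `<s> b </s>`.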

                                        -

                                        In the old version of PaddlePaddle, the C++ class RecurrentGradientMachine implements the general sequence decoder based on beam search. Due to the complexity involved, the implementation relies on many special-purpose data structures that are hard for users to customize.

                                        -

                                        There are a lot of heuristic tricks in the sequence generation tasks, so the flexibility of sequence decoder is very important to users.

                                        -

                                        During the refactoring of PaddlePaddle, new concepts such as LoDTensor and TensorArray were proposed. They support sequence usage better and can help make the implementation of the beam-search-based sequence decoder more transparent and modular.

                                        -

                                        For example, the RNN states, candidate IDs, and probabilities of beam search can all be represented as LoDTensors; the selected candidate IDs in each time step can be stored in a TensorArray and Packed into the translated sentences.

                                        -
                                        -
                                        -

                                        Changing LoD’s absolute offset to relative offsets

                                        -

                                        The current LoDTensor is designed to store levels of variable-length sequences. It stores several arrays of integers where each represents a level.

                                        -

                                        The integers in each level represent the begin and end (not inclusive) offsets of a sequence in the underlying tensor. Let’s call this format the absolute-offset LoD for clarity.

                                        -

                                        The absolute-offset LoD can retrieve any sequence very quickly but fails to represent empty sequences. For example, consider the following two-level LoD:

                                        -
                                        [[0, 3, 9],
                                        - [0, 2, 3, 3, 3, 9]]
                                        -
                                        -
                                        -

                                        The first level tells that there are two sequences:

                                        -
                                          -
                                        • the first’s offset is [0, 3)
                                        • -
                                        • the second’s offset is [3, 9)
                                        • -
                                        -

                                        while on the second level, there are several empty sequences that both begin and end at 3. It is impossible to tell how many empty second-level sequences exist in each of the first-level sequences.

                                        -

                                        Many scenarios rely on representing empty sequences. For example, in machine translation or visual captioning, an instance may have no translation, or a prefix may have an empty candidate set.

                                        -

                                        So let’s introduce another format of LoD: it stores the offsets of the lower-level sequences and is called the relative-offset LoD.

                                        -

                                        For example, the same sequences as in the data above are represented as:

                                        -
                                        [[0, 3, 5],
                                        - [0, 2, 3, 3, 3, 9]]
                                        -
                                        -
                                        -

                                        The first level represents that there are two sequences, whose offsets in the second-level LoD are [0, 3) and [3, 5).

                                        -

                                        The second level is the same as in the absolute-offset example because the level below it is the tensor itself. With the first-level offsets [0, 3) and [3, 5), it is easy to see that each first-level sequence contains exactly one empty second-level sequence.

                                        -

                                        The following examples are based on relative-offset LoD.
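The ambiguity of the absolute-offset format can be checked with a small sketch. The helper names `to_absolute` and `count_empty` are hypothetical, and plain Python lists stand in for LoD levels. Three different relative-offset groupings of the same second level, including the [0, 3) and [3, 5) grouping used in the text, all collapse to the same absolute-offset LoD.

```python
def to_absolute(relative_lod):
    """Convert a two-level relative-offset LoD to absolute offsets."""
    top, bottom = relative_lod
    # Each first-level entry indexes into the second level.
    return [[bottom[i] for i in top], list(bottom)]


def count_empty(relative_lod):
    """Count empty second-level sequences inside each first-level sequence."""
    top, bottom = relative_lod
    lengths = [end - begin for begin, end in zip(bottom, bottom[1:])]
    return [
        sum(1 for n in lengths[begin:end] if n == 0)
        for begin, end in zip(top, top[1:])
    ]


second_level = [0, 2, 3, 3, 3, 9]
# Three different groupings of the five second-level sequences.
variants = [[0, 2, 5], [0, 3, 5], [0, 4, 5]]

absolute_forms = [to_absolute([top, second_level]) for top in variants]
empty_counts = [count_empty([top, second_level]) for top in variants]
```

All three entries of `absolute_forms` are identical, while `empty_counts` differs per grouping, which is exactly the information the absolute-offset format loses.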

                                        -
                                        -
                                        -

                                        Usage in a simple machine translation model

                                        -

                                        Let’s start from a simple machine translation model that is simplified from the machine translation chapter to draw a blueprint of what a sequence decoder can do and how to use it.

                                        -

                                        The model has an encoder that learns the semantic vector from a sequence, and a decoder which uses the sequence encoder to generate new sentences.

                                        -

                                        Encoder

                                        -
                                        import paddle as pd
                                        -
                                        -dict_size = 8000
                                        -source_dict_size = dict_size
                                        -target_dict_size = dict_size
                                        -word_vector_dim = 128
                                        -encoder_dim = 128
                                        -decoder_dim = 128
                                        -beam_size = 5
                                        -max_length = 120
                                        -
                                        -# encoder
                                        -src_word_id = pd.data(
                                        -    name='source_language_word',
                                        -    type=pd.data.integer_value_sequence(source_dict_size))
                                        -# embedding table of shape [source_dict_size, word_vector_dim]
                                        -src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
                                        -
                                        -src_word_vec = pd.lookup(src_embedding, src_word_id)
                                        -
                                        -encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
                                        -
                                        -encoder_ctx = pd.last_seq(encoder_out_seq)
                                        -# encoder_ctx_proj is the learned semantic vector
                                        -encoder_ctx_proj = pd.fc(
                                        -    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
                                        -
                                        -
                                        -

                                        Decoder

                                        -
                                        def generate():
                                        -    decoder = pd.while_loop()
                                        -    with decoder.step():
                                        -        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
                                        -        generated_ids = decoder.memory() # TODO init to batch_size <s>s
                                        -        generated_scores = decoder.memory() # TODO init to batch_size 1s or 0s
                                        -
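                                        -        # trg_embedding is the target-side embedding table, assumed to be
                                        -        # defined in the same way as src_embedding
                                        -        target_word = pd.lookup(trg_embedding, generated_ids)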
                                        -        # expand encoder_ctx's batch to fit target_word's lod
                                        -        # for example
                                        -        # decoder_mem.lod is
                                        -        # [[0 1 3],
                                        -        #  [0 1 3 6]]
                                        -        # its tensor content is [a1 a2 a3 a4 a5]
                                        -        # which means there are 2 sentences to translate
                                        -        #   - the first sentence has 1 translation prefix, the offsets are [0, 1)
                                        -        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
                                        -        # the target_word.lod is
                                        -        # [[0, 1, 6],
                                        -        #  [0, 2, 4, 7, 9, 12]]
                                        -        # which means 2 sentences to translate, each has 1 and 5 prefixes
                                        -        # the first prefix has 2 candidates
                                        -        # the following has 2, 3, 2, 3 candidates
                                        -        # the encoder_ctx_expanded's content will be
                                        -        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
                                        -        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
                                        -        decoder_input = pd.fc(
                                        -            act=pd.activation.Linear(),
                                        -            input=[target_word, encoder_ctx_expanded],
                                        -            size=3 * decoder_dim)
                                        -        gru_out, cur_mem = pd.gru_step(
                                        -            decoder_input, mem=decoder_mem, size=decoder_dim)
                                        -        scores = pd.fc(
                                        -            gru_out,
                                        -            size=target_dict_size,
                                        -            bias=None,
                                        -            act=pd.activation.Softmax())
                                        -        # K is a configurable number of candidates kept per prefix
                                        -        topk_scores, topk_ids = pd.top_k(scores, K)
                                        -        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
                                        -
                                        -        selected_ids, selected_generation_scores = decoder.beam_search(
                                        -            topk_ids, topk_generated_scores)
                                        -
                                        -        # update the states
                                        -        decoder_mem.update(cur_mem)  # tells how to update state
                                        -        generated_ids.update(selected_ids)
                                        -        generated_scores.update(selected_generation_scores)
                                        -
                                        -        decoder.output(selected_ids)
                                        -        decoder.output(selected_generation_scores)
                                        -
                                        -    return decoder()
                                        -
                                        -translation_ids, translation_scores = generate()
                                        -
                                        -
                                        -

                                        The decoder.beam_search is an operator that, given the candidates and the scores of translations including the candidates, returns the result of the beam search algorithm.

                                        -

                                        In this way, users can customize anything on the input or output of beam search, for example:

                                        -
                                          -
                                        1. Set the corresponding elements in topk_generated_scores to zero or to some small value, so that beam_search discards those candidates.
                                        2. Remove some specific candidates from selected_ids.
                                        3. Take the final translation_ids and remove some translation sequences from it.
                                        -

                                        The implementation of the sequence decoder can reuse the C++ class RNNAlgorithm, so the Python syntax is quite similar to that of an RNN.

                                        -

                                        Both outputs of beam search, selected_ids and selected_generation_scores, are two-level LoDTensors:

                                        -
                                          -
                                        • The first level represents batch_size of (source) sentences.
                                        • -
                                        • The second level represents the candidate ID sets for translation prefix.
                                        • -
                                        -

                                        For example, with 3 source sentences to translate, the sentences may have 2, 3, and 1 candidates, respectively.

                                        -

                                        Unlike an RNN, in sequence decoder, the previous state and the current state have different LoD and shape, and an lod_expand operator is used to expand the LoD of the previous state to fit the current state.

                                        -

                                        For example, the previous state:

                                        -
                                          -
                                        • LoD is [0, 1, 3][0, 2, 5, 6]
                                        • -
                                        • content of tensor is a1 a2 b1 b2 b3 c1
                                        • -
                                        -

                                        the current state is stored in encoder_ctx_expanded:

                                        -
                                          -
                                        • LoD is [0, 2, 7][0 3 5 8 9 11 11]
                                        • -
                                        • the content is
                                            -
                                          • a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once per candidate)
                                          • -
                                          • a2 a2
                                          • -
                                          • b1 b1 b1
                                          • -
                                          • b2
                                          • -
                                          • b3 b3
                                          • -
                                          • None (c1 has 0 candidates, so c1 is dropped)
                                          • -
                                          -
                                        • -
                                        -

                                        The benefit of the relative-offset LoD is that an empty candidate set can be represented naturally.
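The expansion in the example above can be reproduced with a few lines of Python. This is a minimal sketch: the function below borrows the name lod_expand from the design, but its signature and list-based representation are assumptions made for illustration, not the operator's actual interface.

```python
def lod_expand(states, candidate_offsets):
    """Repeat states[i] once for every candidate of prefix i.

    candidate_offsets is the second LoD level of the current step:
    prefix i owns the candidates in [offsets[i], offsets[i + 1]).
    """
    expanded = []
    for state, begin, end in zip(states, candidate_offsets, candidate_offsets[1:]):
        # A prefix with zero candidates contributes nothing and is dropped.
        expanded.extend([state] * (end - begin))
    return expanded


previous_state = ["a1", "a2", "b1", "b2", "b3", "c1"]
candidate_offsets = [0, 3, 5, 8, 9, 11, 11]
expanded = lod_expand(previous_state, candidate_offsets)
```

The result contains three copies of a1, two of a2, three of b1, one of b2, two of b3, and none of c1, matching the list above.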

                                        -

                                        The state at each time step can be stored in a TensorArray and Packed into a final LoDTensor. The corresponding syntax is:

                                        -
                                        decoder.output(selected_ids)
                                        -decoder.output(selected_generation_scores)
                                        -
                                        -
                                        -

                                        The selected_ids are the candidate ids for the prefixes, and will be Packed by TensorArray to a two-level LoDTensor, where the first level represents the source sequences and the second level represents generated sequences.

                                        -

                                        Packing the selected_scores will get a LoDTensor that stores scores of each translation candidate.

                                        -

                                        Packing the selected_generation_scores will get a LoDTensor in which the last element of each sequence is the probability of the translation.

                                        -
                                        -
                                        -

                                        LoD and shape changes during decoding

                                        -

                                        - -

                                        According to the image above, the only phase that changes the LoD is beam search.

                                        -
                                        -
                                        -

                                        Beam search design

                                        -

                                        The beam search algorithm will be implemented as one method of the sequence decoder and has 3 inputs:

                                        -
                                          -
                                        1. topk_ids, the top K candidate ids for each prefix.
                                        2. topk_scores, the corresponding scores for topk_ids.
                                        3. generated_scores, the score of the prefixes.
                                        -

                                        All of these are LoDTensors, so that the sequence affiliation is clear. Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.

                                        -

                                        It will return three variables:

                                        -
                                          -
                                        1. selected_ids, the candidates that beam search selects for the next step.
                                        2. selected_scores, the scores for the candidates.
                                        3. generated_scores, the updated scores for each prefix (with the new candidates appended).
                                        -
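One selection step with these inputs might look like the sketch below. It makes several assumptions that the design does not spell out: scores are log-probabilities that are summed, prefixes of the same source sentence compete for `beam_size` slots, and nested Python lists stand in for LoDTensors. The function name `beam_search_step` and the `sentence_offsets` argument are hypothetical, and only two of the three outputs are modeled.

```python
def beam_search_step(topk_ids, topk_scores, generated_scores,
                     sentence_offsets, beam_size):
    """Select candidates for the next step.

    topk_ids[p], topk_scores[p]: candidates of prefix p and their scores.
    generated_scores[p]: accumulated score of prefix p.
    sentence_offsets: relative offsets grouping prefixes by source sentence.
    Returns, per prefix, the selected ids and their accumulated scores.
    """
    selected_ids = [[] for _ in topk_ids]
    selected_scores = [[] for _ in topk_ids]
    for begin, end in zip(sentence_offsets, sentence_offsets[1:]):
        pool = []
        for p in range(begin, end):
            for token, score in zip(topk_ids[p], topk_scores[p]):
                pool.append((generated_scores[p] + score, p, token))
        # Prefixes of one source sentence share a single beam.
        pool.sort(key=lambda item: item[0], reverse=True)
        for total, p, token in pool[:beam_size]:
            selected_ids[p].append(token)
            selected_scores[p].append(total)
    return selected_ids, selected_scores


topk_ids = [[4, 7], [2, 9], [5, 1]]
topk_scores = [[-0.1, -2.0], [-0.5, -0.7], [-3.0, -3.5]]
generated_scores = [-1.0, -0.2, -0.3]
selected_ids, selected_scores = beam_search_step(
    topk_ids, topk_scores, generated_scores,
    sentence_offsets=[0, 1, 3], beam_size=2)

# Second LoD level of the result: one (possibly empty) range per prefix.
offsets = [0]
for ids in selected_ids:
    offsets.append(offsets[-1] + len(ids))
```

The last prefix loses all of its candidates, so its range in `offsets` is empty, which is the case the relative-offset LoD is meant to represent.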
                                        -
                                        -

                                        Introducing the LoD-based Pack and Unpack methods in TensorArray

                                        -

                                        The selected_ids, selected_scores and generated_scores are LoDTensors that exist at each time step, so it is natural to store them in arrays.

                                        -

                                        Currently, PaddlePaddle has a module called TensorArray which can store an array of tensors. It is better to store the results of beam search in a TensorArray.

                                        -

                                        The Pack and UnPack methods in TensorArray are used to pack the tensors in the array into an LoDTensor, or to split an LoDTensor into an array of tensors. They need some extensions to support packing and unpacking an array of LoDTensors.
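The basic round trip can be sketched with plain lists: an array of variable-length items is flattened into one data buffer plus offsets, and split back again. The functions `pack` and `unpack` below are hypothetical simplifications for illustration; the real TensorArray methods and the LoD-aware extensions proposed here may order and group the data differently.

```python
def pack(tensors):
    """Flatten a list of variable-length lists into (data, offsets)."""
    data, offsets = [], [0]
    for tensor in tensors:
        data.extend(tensor)
        offsets.append(len(data))
    return data, offsets


def unpack(data, offsets):
    """Split flat data back into one list per offset range."""
    return [data[begin:end] for begin, end in zip(offsets, offsets[1:])]


steps = [[4, 7], [2, 9, 5], [], [1]]
data, offsets = pack(steps)
restored = unpack(data, offsets)
```

Because the offsets record empty ranges explicitly, the empty third entry survives the round trip.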

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/optimizer.html b/develop/doc_cn/design/optimizer.html deleted file mode 100644 index 17dd53a9cbc0738ebd199ccb694bf689e59f0f94..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/optimizer.html +++ /dev/null @@ -1,349 +0,0 @@ - - - - - - - - - - - - - Optimizer Design — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Optimizer Design

                                        -
                                        -

                                        The Problem

                                        -

                                        A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

                                        -
                                          -
                                        1. the forward pass, which computes intermediate results and the cost(s),
                                        2. the backward pass, which derives gradients from intermediate results and costs, and
                                        3. the optimization pass, which updates model parameters to optimize the cost(s).
                                        -

                                        These tasks rely on three kinds of operators:

                                        -
                                          -
                                        1. forward operators,
                                        2. gradient operators, and
                                        3. optimization operators.
                                        -

                                        It’s true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could only describe the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

                                        -

                                        In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.
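To make the three passes concrete, here is a toy sketch in plain Python for a one-parameter-pair linear model with a squared-error cost. The functions `forward`, `backward`, and `optimize` are hypothetical stand-ins for the three kinds of operators, the gradients are written by hand, and plain gradient descent is assumed as the update rule; the proposed API would derive the last two passes automatically.

```python
def forward(params, x, y):
    """Forward pass: compute the prediction and the cost."""
    pred = params["w"] * x + params["b"]
    cost = (pred - y) ** 2
    return pred, cost


def backward(params, x, y, pred):
    """Backward pass: gradients of the cost with respect to each parameter."""
    d_pred = 2.0 * (pred - y)
    return {"w": d_pred * x, "b": d_pred}


def optimize(params, grads, learning_rate):
    """Optimization pass: plain gradient descent update, in place."""
    for name in params:
        params[name] -= learning_rate * grads[name]


params = {"w": 0.0, "b": 0.0}
x, y = 2.0, 5.0
_, initial_cost = forward(params, x, y)
for _ in range(50):
    pred, cost = forward(params, x, y)
    grads = backward(params, x, y, pred)
    optimize(params, grads, learning_rate=0.05)
_, final_cost = forward(params, x, y)
```

In this sketch the user would ideally write only `forward`; the other two functions are what the high-level API is meant to generate.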

                                        -
                                        -
                                        -

                                        High-level Python API to describe the training process

                                        -
                                          -
                                        1. Users write code to describe the network:

                                          -
                                          images = layer.data("images")
                                          -labels = layer.data("labels")
                                          -w1 = pd.var("w1")
                                          -b1 = pd.var("b1")
                                          -hidden = layer.fc(images, w=w1, b=b1)
                                          -cost = layer.mse(hidden, labels)
                                          -
                                          -
                                          -

                                          The above code snippet will create forward operators in Block.

                                          -
                                        2. -
                                        -
                                          -
                                        1. Users create a certain kind of Optimizer with some arguments.

                                          -
                                          optimizer = AdagradOptimizer(learning_rate=0.001)
                                          -
                                          -
                                          -
                                        2. -
                                        3. Users use the optimizer to minimize a certain cost through updating parameters in parameter_list.

                                          -
                                          opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
                                          -
                                          -
                                          -

                                          The above code snippet will create gradient and optimization operators in Block. The return value of minimize() is a list of optimization operators that will be run by the session.

                                          -
                                        4. -
                                        5. Users use Session/Executor to run this opt_op_list as target to do training.

                                          -
                                          sess.run(target=opt_op_list, ...)
                                          -
                                          -
                                          -
                                        6. -
                                        -
                                        -

                                        Optimizer Python interface:

                                        -
                                        class Optimizer(object):
                                        -    """Optimizer Base class.
                                        -
                                        -    """
                                        -
                                        -    def __init__(self):
                                        -        pass
                                        -
                                        -    def create_optimization_pass(self, parameters_and_grads):
                                        -        """Add optimization operators to update gradients to variables.
                                        -
                                        -        Args:
                                        -          parameters_and_grads: a list of (variable, gradient) pair to update.
                                        -
                                        -        Returns:
                                        -          optimization_op_list: a list of optimization operators that update parameters using gradients.
                                        -        """
                                        -        return None
                                        -
                                        -    def minimize(self, loss, parameter_list):
                                        -        """Add operations to minimize `loss` by updating `parameter_list`.
                                        -
                                        -        This method combines interface `append_backward()` and
                                        -        `create_optimization_pass()` into one.
                                        -        """
                                        -        params_grads = self.create_backward_pass(loss, parameter_list)
                                        -        update_ops = self.create_optimization_pass(params_grads)
                                        -        return update_ops
                                        -
                                        -
                                        -
                                        -

                                        Users can inherit the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer.
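As a rough illustration of this extension point, the sketch below subclasses a simplified stand-in for the base class. Everything in it is an assumption made for illustration: `Variable` is a hypothetical plain-Python holder, gradients are supplied by hand rather than produced by a backward pass, and the Adagrad update is applied eagerly instead of emitting operators into a Block.

```python
import math


class Variable(object):
    """Hypothetical stand-in for a framework variable: a name plus a float value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Optimizer(object):
    """Simplified base class mirroring the interface described above."""

    def create_optimization_pass(self, parameters_and_grads):
        raise NotImplementedError

    def minimize(self, parameters_and_grads):
        # The design derives gradients from `loss`; this sketch receives them directly.
        return self.create_optimization_pass(parameters_and_grads)


class AdagradOptimizer(Optimizer):
    def __init__(self, learning_rate, epsilon=1e-6):
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self._accum = {}  # per-parameter running sum of squared gradients

    def create_optimization_pass(self, parameters_and_grads):
        updated = []
        for param, grad in parameters_and_grads:
            acc = self._accum.get(param.name, 0.0) + grad * grad
            self._accum[param.name] = acc
            # Adagrad: scale the step by the root of the accumulated squared gradients.
            param.value -= self.learning_rate * grad / (math.sqrt(acc) + self.epsilon)
            updated.append(param.name)
        return updated


w1 = Variable("w1", 1.0)
opt = AdagradOptimizer(learning_rate=0.1)
updated_names = opt.minimize([(w1, 2.0)])
```

The only method the subclass overrides is `create_optimization_pass()`, which is the hook the base class exposes for optimizer-specific update logic.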

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/paddle_nccl.html b/develop/doc_cn/design/paddle_nccl.html deleted file mode 100644 index 0aecbabc18559866712f454b919f36ab419f1079..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/paddle_nccl.html +++ /dev/null @@ -1,327 +0,0 @@ - - - - - - - - - - - - - Design Doc: NCCL support in Paddle Fluid — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: NCCL support in Paddle Fluid

                                        -
                                        -

                                        Abstract

                                        -

                                        This design doc describes NCCL support in Paddle. We propose an approach that supports the NCCL library on both a single machine and multiple machines. We wrap the NCCL primitives Broadcast, AllReduce, and Reduce as operators so that a single script can use multiple GPUs.

                                        -
                                        -
                                        -

                                        Motivation

                                        -

                                        NCCL is an NVIDIA library for multi-GPU communication, optimized for NVIDIA GPUs. It provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter that can achieve high bandwidth over PCIe and the NVLink high-speed interconnect. With the NCCL library, we can easily accelerate parallel training.

                                        -
                                          -
                                        • Pros
                                        • -
                                        -
                                          -
                                        1. Easy to plug in the NCCL2 library.
                                        2. High performance on NVIDIA GPUs.
                                        3. MPI-like primitives, which have a low learning cost for users.
                                        -
                                          -
                                        • Cons
                                        • -
                                        -
                                          -
                                        1. Designed only for NVIDIA GPUs; not a general multi-device solution.
                                        2. Although NCCL1 is open source under the BSD license, NCCL2 is no longer open source.
                                        -

                                        At the beginning of training, the framework needs to distribute the same parameters to every GPU, and it must be able to merge the gradients whenever the user requires it.

                                        -

                                        As a result, during training we need operations for peer-to-peer copy between GPUs, for aggregating gradients/parameters from GPUs, and for broadcasting parameters to GPUs. Every GPU only needs to run the operator with the correct place information.

                                        -

                                        In addition, the framework needs interfaces to synchronize model updates across the GPU cards.

                                        -
                                        -
                                        -

                                        Implementation

                                        -

                                        As mentioned above, we wrap the NCCL routines as several kinds of operators. Note that NCCL needs to create a Communicator between GPUs at the beginning, so an NCCLInit operator is created for that purpose.

                                        -
                                        -

                                        Transpiler

                                        -

                                        To be compatible with the parameter server design doc, the transpiler compiles the user-defined operation graph into sub-graphs to be executed on different devices.

                                        -
                                          -
                                        1. The user-defined model is a single-device program.
                                        2. Broadcast/Reduce operators between GPUs are inserted into the program; for multi-node training, Send and Recv operators may be inserted as well.

                                           On a single machine: Broadcast and AllReduce. On multiple machines: Broadcast, AllReduce, Send, and Recv.
                                        -

                                        After compiling, the graph looks as follows:

                                        -

                                        -

                                        Operators are added to the sub-graphs. Every GPU is assigned a rank: rank0, rank1, etc.

                                        -
                                          -
                                        • Broadcast. The Broadcast operator distributes initialized parameters to all GPUs from the GPU that owns them, e.g. from the rank0 GPU.
                                        • AllReduce. The AllReduce operator synchronizes parameters/gradients between GPUs. AllReduce is implemented with a ring-based communication method, which avoids a bottleneck on a single GPU.
                                        -

                                        Note that the AllReduce operator forces the GPUs to synchronize at that point. Whether the whole training process runs in asynchronous or synchronous mode depends on where the AllReduce point sits in the graph.

                                        -

                                        As shown in the picture, each GPU computes the gradient of W and then runs an AllReduce operator that accumulates dW over the full batch of data. Each GPU then runs the optimization step individually and applies the gradient to its own copy of W.
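The synchronous step described here can be mimicked in plain Python. This is only a sketch under stated assumptions: devices are simulated as list entries, `all_reduce_sum` is a hypothetical helper rather than an NCCL call, and the model is a one-parameter least-squares fit.

```python
def all_reduce_sum(per_device_values):
    """Simulated AllReduce: every device ends up with the sum over all devices."""
    total = sum(per_device_values)
    return [total for _ in per_device_values]


def local_gradient(w, shard):
    # Gradient of 0.5 * (w * x - y)^2, summed over this device's part of the batch.
    return sum((w * x - y) * x for x, y in shard)


# Two simulated devices, each holding the same initial W (as after a Broadcast).
weights = [0.0, 0.0]
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]  # the full batch split across devices
learning_rate = 0.1

local_grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
full_grads = all_reduce_sum(local_grads)  # dW accumulated over the full batch
# Every device applies the same full-batch gradient to its own copy of W.
weights = [w - learning_rate * g for w, g in zip(weights, full_grads)]
```

Because every replica starts from the same value and applies the same reduced gradient, the copies of W stay identical without any further parameter exchange.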

                                        -
                                          -
                                        • AllReduce. Note that our AllReduce operator is a ring-based AllReduce implementation. If we used the NCCL2 AllReduce primitive, every GPU would optimize over the full batch of data, wasting (n-1) GPUs' worth of compute resources. In addition, the NCCL2 built-in AllReduce only uses the communication resources during synchronization, so updating the gradient becomes a subsequent phase. In fact, we can amortize the gradient update time into the communication phase. The process is:
                                        -
                                          -
                                        1. Every parameter has its root card. That card is responsible for aggregating the gradients from the GPUs.
                                        2. The whole model's parameters are hashed to different root cards to balance the load between GPUs.
                                        3. Logically neighboring cards start sending the parameter to the next one. After one round, the parameter's root card will have aggregated the full gradients.
                                        4. The root card then optimizes the parameter.
                                        5. The root card sends its optimized result to its neighbor, which then sends the parameter on to the next card.
                                        6. The synchronization round finishes.
                                        -

                                        The total time cost is 2 * (n-1) * per-parameter-send-time, so we reach the goal of amortizing the update time into the communication phase.
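The round described above can be traced with a small simulation. It is a sketch, not the operator's implementation: cards are list indices, `assign_root` is a hypothetical placement rule standing in for the hashing step, and the optimizer is plain SGD. It counts neighbor-to-neighbor sends so the 2 * (n-1) figure can be checked.

```python
def assign_root(param_name, num_cards):
    # Hypothetical placement rule: a stable hash of the name spreads parameters over cards.
    return sum(param_name.encode("utf-8")) % num_cards


def ring_sync_parameter(param, local_grads, root, learning_rate):
    """Simulate one synchronization round for a single parameter.

    local_grads[i] is the gradient computed on card i. Returns the updated
    value held by every card and the number of neighbor-to-neighbor sends.
    """
    n = len(local_grads)
    sends = 0

    # Phase 1: partial sums travel around the ring until they reach the root.
    partial = 0.0
    card = (root + 1) % n
    while card != root:
        partial += local_grads[card]
        card = (card + 1) % n
        sends += 1
    full_grad = partial + local_grads[root]

    # Phase 2: only the root card runs the optimizer step.
    new_value = param - learning_rate * full_grad

    # Phase 3: the optimized value travels around the ring to the other cards.
    values = [None] * n
    values[root] = new_value
    card = root
    for _ in range(n - 1):
        nxt = (card + 1) % n
        values[nxt] = values[card]
        card = nxt
        sends += 1
    return values, sends


grads = [1.0, 2.0, 3.0, 4.0]
root = assign_root("w1", len(grads))
values, sends = ring_sync_parameter(10.0, grads, root, learning_rate=0.1)
```

With four cards the round takes three sends to gather the gradient and three to distribute the result, and the optimizer step runs once on the root instead of once per card.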

                                        -
                                        -
                                        -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/parallel_do.html b/develop/doc_cn/design/parallel_do.html deleted file mode 100644 index 5f2a1ab1e4dca66db7a28d5e547b552511e41595..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/parallel_do.html +++ /dev/null @@ -1,418 +0,0 @@ - - - - - - - - - - - - - Design Doc: Parallel_Do in PaddlePaddle — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Parallel_Do in PaddlePaddle

                                        -

                                        In PaddlePaddle, we use the parallel_do primitive to represent multithreaded data-parallel processing.

                                        -
                                        -

                                        Design overview

                                        -

                                        The definition of a parallel_do op looks like the following

                                        -
                                        AddInput(kInputs, "Inputs needed to be split onto different devices").AsDuplicable();
                                        -AddInput(kParameters, "Parameters are duplicated over different devices")
                                        -    .AsDuplicable();
                                        -AddInput(kPlaces, "Devices used for parallel processing");
                                        -AddOutput(kOutputs, "Outputs needed to be merged from different devices").AsDuplicable();
                                        -AddOutput(kParallelScopes,
                                        -          "Scopes for all local variables in forward pass. One scope for each device");
                                        -AddAttr<framework::BlockDesc *>(kParallelBlock,
                                        -                                "List of operators to be executed in parallel");
                                        -
                                        -
                                        -

                                        A vanilla implementation of parallel_do can be shown as the following (| means single thread and |||| means multiple threads):

                                        -
                                        In the forward pass
                                        -  |      Split input onto different devices
                                        -  |      Copy parameter onto different devices
                                        -  ||||   Compute forward pass in parallel
                                        -  |      Merge output from different devices
                                        -
                                        -In the backward pass
                                        -  |      Split output@grad onto different devices
                                        -  ||||   Compute backward pass in parallel
                                        -  |      accumulate param@grad from different devices to the first device
                                        -  |      Merge input@grad from different devices
                                        -  |      Copy param@grad to the place of parallel_do_op
                                        -
                                        -
                                        -
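The forward half of this schedule can be sketched with Python threads. This is a minimal illustration under stated assumptions: `split`, `parallel_do_forward`, and `linear_block` are hypothetical helpers, device names are placeholder strings, and no real devices or framework scopes are involved.

```python
from concurrent.futures import ThreadPoolExecutor


def split(batch, num_devices):
    """Split a list into num_devices contiguous shards of near-equal size."""
    size, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + size + (1 if i < extra else 0)
        shards.append(batch[start:end])
        start = end
    return shards


def parallel_do_forward(batch, params, places, block):
    shards = split(batch, len(places))             # |    split input
    replicas = [dict(params) for _ in places]      # |    copy parameters
    with ThreadPoolExecutor(max_workers=len(places)) as pool:
        # ||||  run the block once per device; map() keeps device order
        outputs = list(pool.map(block, shards, replicas))
    return [y for out in outputs for y in out]     # |    merge outputs


def linear_block(shard, params):
    # Hypothetical per-device block: y = w * x + b for every x in the shard.
    return [params["w"] * x + params["b"] for x in shard]


places = ["gpu:0", "gpu:1"]  # placeholder device names; no real GPUs are used
result = parallel_do_forward([1.0, 2.0, 3.0], {"w": 2.0, "b": 1.0}, places, linear_block)
```

Only the block execution runs concurrently; splitting, parameter copying, and merging stay on the calling thread, which is the serial cost the later sections try to remove.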

                                        This implementation allows us to write a mixed-device program like this:

                                        -
                                        W1 = fluid.tensor(size=[100,20], parameter=True)
                                        -W2 = fluid.tensor(size=[20,15], parameter=True)
                                        -
                                        -data = layers.data()
                                        -
                                        -gpu_places = layers.get_place(use_gpu=True)
                                        -# parallel processing on multiple GPUs
                                        -pd = ParallelDo(gpu_places)
                                        -with pd.do(input=data):
                                        -    prediction = softmax(fc(fc(data, W1), W2))
                                        -    write_output(prediction)
                                        -prediction = pd()
                                        -loss = cross_entropy(prediction, label)
                                        -
                                        -
                                        -

                                        The corresponding ProgramDesc looks like the following:

                                        -
                                        # start_program will be run by executor(CPUPlace), all w1, w2 will be allocated on CPU
                                        -start_program
                                        -{
                                        -  vars: w1, w2
                                        -  ops: init(w1), init(w2)
                                        -}
                                        -
                                        -main_program
                                        -{
                                        -block0 {
                                        -  vars: data, places, w1, w2, w1_grad, w2_grad,
                                        -  ops: data, get_place, parallel_do(block1),
                                        -       parallel_do_grad(block2),
                                        -       sgd(w2, w2_grad),
                                        -       sgd(w1, w1_grad)
                                        -}
                                        -block1 { # the forward pass
                                        -  parent_block: 0
                                        -  vars: data, h1, h2, loss
                                        -  ops: fc, fc, softmax
                                        -}
                                        -block2 { # the backward pass
                                        -  parent_block: 1
                                        -  vars: data_grad, h1_grad, h2_grad, loss_grad, local_w1_grad, local_w2_grad
                                        -  ops: softmax_grad,
                                        -       fc_grad,
                                        -       fc_grad
                                        -}
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Performance Improvement

                                        -

                                        There are several places where we can make this parallel_do faster.

                                        -
                                        -

                                        forward: split input onto different devices

                                        -

                                        If the input of the parallel_do is independent of any prior operators, we can avoid this step by prefetching the input onto different devices in a separate background thread. The Python code looks like this:

                                        -
                                        pd = ParallelDo(gpu_places)
                                        -with pd.do():
                                        -    feature = get_data_from_prefetch_queue(gpu_places)
                                        -    prediction = my_net(feature)
                                        -    write_output(prediction)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        forward: copy parameters onto different devices

                                        -

                                        We can avoid this step by making each device have a copy of the parameter. This requires:

                                        -
                                          -
                                        1. fluid.default_start_up_program() to be run on all devices.
                                        2. In the backward pass, allreduce param@grad across the devices. This requires:
                                           1. backward.py to add allreduce operators at parallel_do_grad.
                                           2. allreduce operators to be called in async mode to achieve maximum throughput.
                                        3. Applying the gradient-related ops (i.e. clipping, normalization, decay, sgd) on different devices in parallel.
                                        -

                                        By doing so, we also avoid “backward: accumulate param@grad from different devices to the first device”. The ProgramDesc looks like the following:

                                        -
                                        # w1, w2 will be allocated on all GPUs
                                        -start_program
                                        -{
                                        -block0 {
                                        -  parallel_do(block1)
                                        -}
                                        -block1 {
                                        -  parent_block: 0
                                        -  vars: w1, w2
                                        -  ops: init(w1), init(w2)
                                        -}
                                        -}
                                        -
                                        -main_program
                                        -{
                                        -block0 {
                                        -  vars: data, places, w1, w2
                                        -  ops: data, get_place, parallel_do(block1),
                                        -       parallel_do_grad(block2),      # append_backward
                                        -       parallel_do(block3)            # append_optimization
                                        -       
                                        -}
                                        -block1 {
                                        -  parent_block: 0
                                        -  vars: data, h1, h2, loss
                                        -  ops: fc, fc, softmax
                                        -}
                                        -block2 {
                                        -  parent_block: 1
                                        -  vars: data_grad, h1_grad, h2_grad, loss_grad, w1_grad, w2_grad
                                        -  ops: softmax_grad,
                                        -       fc_grad, allreduce(places, scopes, w1_grad),
                                        -       fc_grad, allreduce(places, scopes, w2_grad)
                                        -}
                                        -block3 {
                                        -  parent_block: 0
                                        -  vars: lr
                                        -  ops: sgd(w2, w2_grad),
                                        -       sgd(w1, w1_grad)
                                        -}
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/parameter_average.html b/develop/doc_cn/design/parameter_average.html deleted file mode 100644 index 04fcebf40ff35feeafe5464b217534ead98745b5..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/parameter_average.html +++ /dev/null @@ -1,339 +0,0 @@ - - - - - - - - - - - - - Averaging Parameter in PaddlePaddle — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Averaging Parameter in PaddlePaddle

                                        -
                                        -

                                        Why Averaging

                                        -

                                        In a large scale machine learning setup where the size of the training data is huge, it could take us a large number of iterations over the training data before we can achieve the optimal values of parameters of our model. Looking at the problem setup, it is desirable if we can obtain the optimal values of parameters by going through the data in as few passes as we can.

                                        -

                                        Polyak and Juditsky (1992) showed that the test performance of a simple average of the parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset.

                                        -

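                                        Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of the parameters obtained by SGD is used as the estimator for the optimal parameter value $\theta^*$. The averaging is done as follows:

                                        $$\bar{\theta}_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i$$

                                        where $\theta_i$ is the parameter value after the $i$-th SGD update.
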
                                        -

                                        We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
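A running average of this kind can be maintained incrementally, without storing past parameter values, by using the identity avg_t = avg_{t-1} + (theta_t - avg_{t-1}) / t. The sketch below shows that update for a single scalar parameter; `RunningAverage` is a hypothetical helper name, and it averages over all steps rather than implementing the windowed approximation discussed later.

```python
class RunningAverage(object):
    """Incrementally tracks the mean of all values seen so far."""

    def __init__(self):
        self.count = 0
        self.average = 0.0

    def update(self, value):
        self.count += 1
        # Move the average toward the new value by 1/count of the difference.
        self.average += (value - self.average) / self.count
        return self.average


avg = RunningAverage()
for theta in [4.0, 2.0, 3.0]:  # parameter values after three optimizer steps
    avg.update(theta)
```

The update needs only the previous average and a step counter, so the extra memory is one additional copy of each parameter.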

                                        -
                                        -

                                        How to perform Parameter Averaging in PaddlePaddle

                                        -

                                        Parameter Averaging in PaddlePaddle works in the following way during training:

                                        -
                                          -
                                        1. It takes an instance of a normal optimizer as input, e.g. RMSPropOptimizer.
                                        2. The optimizer itself is responsible for updating the parameters.
                                        3. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself:
                                           1. In concept, the values of this copy are the average of the values of the parameters in the most recent N batches.
                                           2. However, saving all the N instances of the parameters in memory is not feasible.
                                           3. Therefore, an approximation algorithm is used.
                                        -

                                        Hence, overall we have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved.

                                        -

                                        During the testing/saving phase, we perform the following steps:

                                        -
                                          -
                                        1. Perform the delayed operations.
                                        2. -
                                        3. Save current values of the parameters to a temporary variable.
                                        4. -
                                        5. Replace the values of the parameters with the averaged values.
                                        6. -
                                        7. Perform testing and/or save the parameters.
                                        8. -
                                        9. Restore the values of the parameters once done.
                                        10. -
                                        -
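The save, swap, and restore steps above can be sketched as a small context manager. This is only an illustration under stated assumptions: the name `use_averaged_parameters` is hypothetical, plain dicts stand in for the two parameter copies, and the first step (performing the delayed operations) is not modelled.

```python
from contextlib import contextmanager


@contextmanager
def use_averaged_parameters(params, averaged):
    """Temporarily replace the values in `params` with the averaged values.

    Both arguments are plain dicts mapping a parameter name to its value.
    They are stand-ins for the optimizer's copy and the averaged copy.
    """
    backup = dict(params)        # save the current values to a temporary
    params.update(averaged)      # replace them with the averaged values
    try:
        yield params             # the caller tests and/or saves here
    finally:
        params.clear()
        params.update(backup)    # restore the training values once done


params = {"w": 1.0, "b": 0.5}
averaged = {"w": 0.8, "b": 0.4}
with use_averaged_parameters(params, averaged) as p:
    seen_during_test = dict(p)
```

Restoring in a `finally` clause means the training values come back even if testing or saving raises an exception.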
                                        -
                                        -

                                        How to implement Averaging of Parameter in PaddlePaddle

                                        -

                                        We can add the ParameterAverageOptimizer op to the graph through the Python API. With this approach, we manually add the op to the graph and direct the output of the optimizer op to it during training.

                                        -
                                        **Advantages**:
                                        -- Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op.
                                        -- Makes it easy for the users to customize and extend the framework.
                                        -
                                        -**Disadvantages**:
                                        -- Implementation requires re-writing the averaging methodology in Python.  
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Low-Level implementation

                                        -

                                        In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input:

                                        -
                                          -
                                        • the optimizer
                                        • -
                                        • the window_size to keep the updates
                                        • -
                                        -

                                        The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for Operators. We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.

                                        -

                                        The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the layer functions in Python API.

                                        -
                                        -
                                        -

                                        Python API implementation for ParameterAverageOptimizer

                                        -

                                        Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:

                                        -
                                          -
                                        • Any optimizer (RMSProp, AdaGrad, etc.)
                                        • -
                                        • A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. When the window is full, the averaged value is moved to a buffer to avoid loss of precision.
                                        • -
                                        -
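One way to read the window-and-buffer description above is sketched below. It is not necessarily the approximation PaddlePaddle uses. The class name `WindowedAverage` and its fields are hypothetical, and a single float stands in for a parameter tensor. The sketch keeps a running sum for the current window and moves it to a buffer when the window fills, so the reported average covers between N and 2N-1 of the most recent values without storing them individually.

```python
class WindowedAverage:
    """Approximate average of the most recent values using two running sums."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window_sum = 0.0    # sum of the values in the current window
        self.window_count = 0
        self.buffer_sum = 0.0    # sum moved here from the last full window
        self.buffer_count = 0

    def update(self, value):
        self.window_sum += value
        self.window_count += 1
        if self.window_count == self.window_size:
            # The window is full: move it to the buffer (dropping the older
            # buffer) and start a fresh window so the sum stays small.
            self.buffer_sum = self.window_sum
            self.buffer_count = self.window_count
            self.window_sum = 0.0
            self.window_count = 0

    def average(self):
        total = self.window_count + self.buffer_count
        if total == 0:
            raise ValueError("no values have been accumulated yet")
        return (self.window_sum + self.buffer_sum) / total


avg = WindowedAverage(window_size=2)
for v in (1.0, 2.0, 3.0):
    avg.update(v)
after_three = avg.average()   # buffer holds 1+2, window holds 3
avg.update(4.0)
after_four = avg.average()    # buffer now holds 3+4, window is empty
```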

                                        Using the ParameterAverageOptimizer op, any user can add the operation to their computation graph. However, doing so by hand requires many lines of code, so we should design Python APIs that support averaging. As per the PaddlePaddle Python API design, the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions. We will have a wrapper written in Python that supports this functionality, and implement the actual core computation in the C++ core, as we have done for other optimizers.

                                        -
                                        -

                                        Creation of the ParameterAverageOptimizer operator

                                        -

                                        There are two ways for creating the ParameterAverageOptimizer op:

                                        -
                                          -
                                        1. We create the op immediately while building the computation graph.
                                        2. -
                                        3. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added.
                                        4. -
                                        -

                                        The proposal is to add the op immediately while building the computation graph.

                                        -
                                        -
                                        -

                                        High-level API

                                        -

                                        In the PaddlePaddle Python API, users will primarily rely on layer functions to create neural network layers. Hence, we also need to provide parameter averaging functionality in the layer functions.

                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/parameters_in_cpp.html b/develop/doc_cn/design/parameters_in_cpp.html deleted file mode 100644 index 511079254de6247c46ebdf475b16c59c8c6a28d9..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/parameters_in_cpp.html +++ /dev/null @@ -1,298 +0,0 @@ - - - - - - - - - - - - - Design Doc: The C++ Class Parameters — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: The C++ Class Parameters

                                        -

                                        Parameters is a concept we designed in the PaddlePaddle V2 API. It is a container of parameters that makes PaddlePaddle capable of sharing parameters between topologies. We described the usages of Parameter in api.md.

                                        -

                                        We used Python to implement Parameters when designing the V2 API. The current implementation has several defects:

                                        -
                                          -
                                        • We simply use memcpy to share Parameters between topologies, which is very inefficient.
                                        • -
                                        • We do not support sharing Parameters during training. We only trigger memcpy when training starts.
                                        • -
                                        -

                                        It is necessary to implement Parameters on the C++ side. However, this could require refactoring PaddlePaddle, because PaddlePaddle was originally designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current PaddlePaddle implementation, three concepts are associated with Parameters:

                                        -
                                          -
                                        1. paddle::Parameter. A Parameters is a container for paddle::Parameter. It is evident that we should use paddle::Parameter when developing Parameters. However, the Parameter class contains many functions and does not have a clear interface. It handles creating/storing a Parameter, serialization/deserialization, optimization (i.e., SGD), and randomizing/zeroing. When developing Parameters, we only use the create/store functionality. We should split the functionalities of Parameter into several classes to clean up the PaddlePaddle C++ implementation.
                                        2. -
                                        3. paddle::GradientMachine and its sub-classes, e.g., paddle::MultiGradientMachine, paddle::NeuralNetwork. We should pass Parameters to paddle::GradientMachine during forward/backward to avoid memcpy between topologies. We should also handle multi-GPU/CPU training, because forward and backward may run on multiple GPUs and multiple CPUs. Parameters should dispatch the parameter values to each device and gather the parameter gradients from each device.
                                        4. -
                                        5. paddle::ParameterUpdater. The ParameterUpdater is used to update parameters in Paddle. So Parameters should be used by paddle::ParameterUpdater, and paddle::ParameterUpdater should optimize Parameters (by SGD).
                                        6. -
                                        -

                                        The step-by-step approach for implementing Parameters in the PaddlePaddle C++ core is listed below. Each step should be a PR and can be merged into PaddlePaddle one by one.

                                        -
                                          -
                                        1. Clean paddle::Parameter interface. Extract the functionalities of paddle::Parameter to prepare for the implementation of Parameters.
                                        2. -
                                        3. Implement a Parameters class. It just stores the paddle::Parameter objects inside. Make GradientMachine use Parameters as a class member.
                                        4. -
                                        5. Make Parameters support multi-CPU and multi-GPU training to prepare for sharing Parameters between topologies. Because we need to share Parameters between topologies, it is the responsibility of Parameters to exchange Parameters between GPUs. GradientMachine should not handle how to exchange Parameters, because a GradientMachine is only used to train one topology and we need to support training many topologies in Paddle, i.e., there could be many GradientMachines using one Parameters.
                                            -
                                          • We should use a global function to exchange Parameters between GPUs, not a member function of Parameters. The MultiGradientMachine invokes this function and passes Parameters as its input.
                                          • -
                                          • The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.
                                          • -
                                          -
                                        6. -
                                        7. Make Parameters an argument of the forward/backward functions, not a data member of GradientMachine. For example, forward could be forward(const Parameters& params, ...) and backward could be backward(Parameters* params, ...). After this step, Paddle could share Parameters between topologies.
                                        8. -
                                        9. ParameterUpdater is invoked by GradientMachine and Trainer, but it updates Parameters. At the end of this refactoring, we could change ParameterUpdater to use Parameters directly, which makes ParameterUpdater's implementation clear.
                                        10. -
                                        -
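The goal of the later steps, passing Parameters into forward/backward instead of storing it in each GradientMachine, can be illustrated with a toy Python model. Everything here is a hypothetical stand-in rather than the proposed C++ interface: `Topology` plays the role of a GradientMachine, `sgd_update` plays the role of a ParameterUpdater, and each parameter is a single float.

```python
class Parameters:
    """Toy container that holds values and gradients keyed by name."""

    def __init__(self):
        self.values = {}
        self.grads = {}

    def create(self, name, value):
        self.values[name] = value
        self.grads[name] = 0.0


class Topology:
    """Stand-in for a GradientMachine that owns no parameters itself."""

    def __init__(self, weight_name):
        self.weight_name = weight_name

    def forward(self, params, x):
        # Reads the shared container, like forward(const Parameters&, ...).
        return params.values[self.weight_name] * x

    def backward(self, params, x, grad_out):
        # Writes the gradient of w * x with respect to w into the container,
        # like backward(Parameters*, ...).
        params.grads[self.weight_name] += x * grad_out


def sgd_update(params, learning_rate):
    """Stand-in for a ParameterUpdater working directly on Parameters."""
    for name in params.values:
        params.values[name] -= learning_rate * params.grads[name]
        params.grads[name] = 0.0


shared = Parameters()
shared.create("w", 2.0)
train_net = Topology("w")
infer_net = Topology("w")   # a second topology using the same container

before = train_net.forward(shared, 3.0)
train_net.backward(shared, 3.0, grad_out=1.0)
sgd_update(shared, learning_rate=0.5)
after = infer_net.forward(shared, 4.0)   # sees the update without any copy
```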
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/profiler.html b/develop/doc_cn/design/profiler.html deleted file mode 100644 index ac5d8803e35191186e73475019ec09ffcb3c1fc2..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/profiler.html +++ /dev/null @@ -1,351 +0,0 @@ - - - - - - - - - - - - - Introduction — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Introduction

                                        -

                                        There are many performance analysis tools for different programming languages and software frameworks. Most popular deep learning frameworks use several programming languages and target heterogeneous platforms. Like most of them, PaddlePaddle uses C++, CUDA and Python as its basic programming languages so that it can run on CPU and GPU devices. The nvprof tool is usually used to analyse CUDA programs. We have a document on profiling CPU and Python programs with yep and Google's perftools. But for PaddlePaddle fluid, the operator is the basic computing unit, and developers usually want to collect the time of each operator and locate bottlenecks. nvprof collects the timeline of CUDA-related activities on both CPU and GPU, including kernel execution, memory transfers, memory sets, CUDA API calls, and events or metrics for CUDA kernels. yep and Google's perftools cannot collect the timeline of CUDA programs. None of these tools can collect time at the operator level. So we design this profiling tool.

                                        -
                                        -
                                        -

                                        Architecture

                                        -

                                        The workflow for most tasks is as follows. Each operator runs many times across all iterations, so the profiler must collect the total time of each operator over the iterations. In addition, developers may sometimes want to collect more detailed time spans inside an operator, or to record time spans elsewhere, which requires the profiler to support nested time spans. To speed up training, all deep learning frameworks support parallel computing, including multiple threads on CPU and multiple GPUs, so the profiler must be able to collect the timeline of each thread. The profiler itself also consumes resources, so developers must be able to enable or disable it easily. Finally, the profiler should present a human-readable report.

                                        -
                                        for i in xrange(M):  # M is  the iteration number
                                        -  for op in operator_lists: # The `operator_lists` contains all the operators in the network.
                                        -    op.run();
                                        -
                                        -
                                        -

                                        In summary, the profiler should have the following features:

                                        -
                                          -
                                        • records time span in loop.
                                        • -
                                        • supports nested time span.
                                        • -
                                        • supports multiple threads/multiple GPUs.
                                        • -
                                        • can be enabled and disabled by users.
                                        • -
                                        -

                                        But how do we record the time for a mixed C++ and CUDA program? There are many C++ APIs to get the current calendar time in a host program. On the GPU, however, CUDA kernels may execute concurrently if they are in different streams, and CUDA kernels are asynchronous with respect to the host program unless there is a synchronization after them. CUDA provides events to monitor the device and perform accurate timing. Inspired by PyTorch and CUDA events, we also design and apply events to record the timeline, then summarize and present statistics based on these events.

                                        -

                                        The overall flow is shown in the following figure.

                                        -


                                        -
                                        -

                                        Event

                                        -

                                        In the above workflow, a pair of events is needed before and after the piece of code whose time is collected. So an event has a flag marking whether it is a starting event or an ending event. Besides these two kinds of events, sometimes only a marker with a text message is needed, for example, a marker to specify where profiling starts or ends. There are three kinds of events:

                                        -
                                        enum EventKind {
                                        -  kMark,
                                        -  kPushRange,
                                        -  kPopRange};
                                        -
                                        -
                                        -
                                          -
                                        • kMark: only a marker without time range.
                                        • -
                                        • kPushRange: mark the starting event for time range.
                                        • -
                                        • kPopRange: mark the ending event for time range.
                                        • -
                                        -

                                        For CPU code, the events only need to record the current time. For CUDA code, the event management functions of CUDA are used. To cover many pieces of code, event lists are used to record each piece.

                                        -
                                        class Event {
                                        - public:
                                        -  // The DeviceContext is used to get current  CUDA stream.
                                        -  Event(EventKind kind, std::string name, uint32_t thread_id,
                                        -        const platform::DeviceContext* dev_ctx = nullptr);
                                        -  double CpuElapsedUs(const Event& e) const;
                                        -  double CudaElapsedUs(const Event& e) const;
                                        -
                                        - private:
                                        -  EventKind kind_;
                                        -  std::string name_;
                                        -  uint32_t thread_id_;
                                        -  int64_t cpu_ns_;
                                        -#ifdef PADDLE_WITH_CUDA
                                        -  cudaEvent_t event_ = nullptr;
                                        -  int device_ = -1;
                                        -#endif
                                        -};
                                        -
                                        -struct EventList {
                                        -  std::forward_list<std::vector<Event>> event_blocks;
                                        -};
                                        -
                                        -
                                        -

                                        As mentioned above, there is no need to record the timeline when the profiler is disabled. So there is a global state to enable or disable the profiler.

                                        -
                                        enum ProfilerState {
                                        -  kDisabled, 
                                        -  kCPU,
                                        -  kCUDA
                                        -};
                                        -ProfilerState g_state;
                                        -
                                        -
                                        -
                                          -
                                        • kDisabled: the disabled state.
                                        • -
                                        • kCPU: CPU profiling state.
                                        • -
                                        • kCUDA: GPU profiling state.
                                        • -
                                        -

                                        A pair of starting and ending events is pushed to the event lists in the constructor and destructor of RecordEvent. So the timeline is recorded for the code that runs within the lifetime of a RecordEvent object.

                                        -
                                        struct RecordEvent {
                                        -  explicit RecordEvent(const std::string name,
                                        -                       platform::DeviceContext* dev_ctx = nullptr) {
                                        -    if (g_state == ProfilerState::kDisabled) return;
                                        -    // push the starting event to the event lists.
                                        -  }
                                        -  ~RecordEvent() {
                                        -    if (g_state == ProfilerState::kDisabled) return;
                                        -    // push the ending event to the event lists.
                                        -  }
                                        -};
                                        -
                                        -
                                        -
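The event design above (push/pop pairs, nesting, per-thread recording, and a global on/off state) can be mimicked in a few lines of Python. This is a CPU-only sketch under stated assumptions, not the C++ profiler itself: `record_event`, `summarize`, the `state` dict and the tuple layout of an event are all hypothetical names chosen for illustration, and `time.perf_counter_ns` stands in for the CPU timestamp.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

K_MARK, K_PUSH_RANGE, K_POP_RANGE = range(3)   # mirrors EventKind

state = {"enabled": False}   # stands in for the global profiler state
events = []                  # (kind, name, thread id, timestamp in ns)
_lock = threading.Lock()


def _push(kind, name):
    with _lock:
        events.append((kind, name, threading.get_ident(),
                       time.perf_counter_ns()))


@contextmanager
def record_event(name):
    """Record a start/end pair around the enclosed code, like RecordEvent."""
    if not state["enabled"]:
        yield                # profiler disabled: record nothing
        return
    _push(K_PUSH_RANGE, name)
    try:
        yield
    finally:
        _push(K_POP_RANGE, name)


def summarize(event_list):
    """Sum the elapsed nanoseconds per name, matching pairs per thread."""
    totals = defaultdict(int)
    stacks = defaultdict(list)          # one stack of open ranges per thread
    for kind, name, thread_id, ns in event_list:
        if kind == K_PUSH_RANGE:
            stacks[thread_id].append((name, ns))
        elif kind == K_POP_RANGE:
            start_name, start_ns = stacks[thread_id].pop()
            totals[start_name] += ns - start_ns
    return dict(totals)


state["enabled"] = True
for _ in range(3):                       # the iteration loop
    with record_event("fc"):
        with record_event("fc/matmul"):  # nested time span
            sum(range(1000))
state["enabled"] = False
with record_event("ignored"):            # not recorded while disabled
    pass
report = summarize(events)
```

The per-thread stack is what makes nesting work: a pop event always closes the most recently opened range on the same thread.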
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/program.html b/develop/doc_cn/design/program.html deleted file mode 100644 index c77c4405eb84a974cd005a843aa13f3976d788ae..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/program.html +++ /dev/null @@ -1,386 +0,0 @@ - - - - - - - - - - - - - Design Doc: PaddlePaddle Programs — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: PaddlePaddle Programs

                                        -
                                        -

                                        Compile and Execution

                                        -

                                        A PaddlePaddle program consists of two parts – the first generates a ProgramDesc protobuf message that describes the program, and the second runs this message using a C++ class Executor.

                                        -

                                        A simple example PaddlePaddle program can be found in graph.md:

                                        -
                                        x = layer.data("images")
                                        -l = layer.data("label")
                                        -y = layer.fc(x)
                                        -cost = layer.mse(y, l)
                                        -optimize(cost)
                                        -train(cost, reader=mnist.train())
                                        -
                                        -
                                        -

                                        The first five lines of the program above generate, or compile, the ProgramDesc message. The last line runs it.

                                        -
                                        -
                                        -

                                        Programs and Blocks

                                        -

                                        The basic structure of a PaddlePaddle program is a set of nested blocks, as in a C++ or Java program.

                                        -
                                          -
                                        • program: some nested blocks
                                        • -
                                        • block:
                                            -
                                          • some local variable definitions, and
                                          • -
                                          • a sequence of operators
                                          • -
                                          -
                                        • -
                                        -

                                        The concept of a block comes from ordinary programs. For example, the following C++ program has three blocks:

                                        -
                                        int main() { // block 0
                                        -  int i = 0;
                                        -  if (i < 10) { // block 1
                                        -    for (int j = 0; j < 10; j++) { // block 2
                                        -    }
                                        -  }
                                        -  return 0;
                                        -}
                                        -
                                        -
                                        -

                                        The following PaddlePaddle program has three blocks:

                                        -
                                        import paddle as pd  # block 0
                                        -
                                        -x = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -y = var(1) # shape=[1], value=1
                                        -z = minibatch([10, 20, 30]) # shape=[None, 1]
                                        -cond = larger_than(x, 15) # [false, true, true]
                                        -
                                        -ie = pd.ifelse()
                                        -with ie.true_block():  # block 1
                                        -    d = pd.layer.add_scalar(x, y)
                                        -    ie.output(d, pd.layer.softmax(d))
                                        -with ie.false_block():  # block 2
                                        -    d = pd.layer.fc(z)
                                        -    ie.output(d, d+1)
                                        -o1, o2 = ie(cond)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        BlockDesc and ProgramDesc

                                        -

                                        All protobuf messages are defined in framework.proto.

                                        -

                                        BlockDesc is straightforward – it includes local variable definitions, vars, and a sequence of operators, ops.

                                        -
                                        message BlockDesc {
                                        -  required int32 parent = 1;
                                        -  repeated VarDesc vars = 2;
                                        -  repeated OpDesc ops = 3;
                                        -}
                                        -
                                        -
                                        -

                                        The parent ID indicates the parent block so that operators in a block can refer to variables defined locally and also those defined in their ancestor blocks.

                                        -

                                        All hierarchical blocks in a program are flattened and stored in an array. The block ID is the index of the block in this array.

                                        -
                                        message ProgramDesc {
                                        -  repeated BlockDesc blocks = 1;
                                        -}
                                        -
                                        -
                                        -
                                        -
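The parent links make variable lookup a walk up the flattened array. The following is a minimal Python sketch of that walk, not the framework's implementation: the `BlockDesc` class and the `find_var` helper are hypothetical stand-ins for the protobuf messages, and it assumes that a parent of -1 marks a block without a parent.

```python
class BlockDesc(object):
    """Hypothetical plain-Python stand-in for the BlockDesc message."""

    def __init__(self, parent, vars=None):
        self.parent = parent          # index of the parent block; -1 assumed to mean "none"
        self.vars = dict(vars or {})  # variable name -> variable description
        self.ops = []


def find_var(blocks, block_id, name):
    """Search block_id and then its ancestors; return (block_id, var) or None."""
    while block_id != -1:
        block = blocks[block_id]
        if name in block.vars:
            return block_id, block.vars[name]
        block_id = block.parent  # not found locally, continue in the parent block
    return None


# The flattened array: the block ID is simply the index into this list.
blocks = [
    BlockDesc(parent=-1, vars={"x": "global x"}),        # block 0, the global block
    BlockDesc(parent=0, vars={"d": "true-branch d"}),    # block 1
    BlockDesc(parent=0, vars={"d": "false-branch d"}),   # block 2
]

print(find_var(blocks, 1, "d"))        # found locally in block 1
print(find_var(blocks, 2, "x"))        # found in the ancestor block 0
print(find_var(blocks, 1, "missing"))  # not defined anywhere
```

Sibling blocks 1 and 2 each define their own `d` without conflict, because a lookup only ever moves toward the global block.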

                                        Global Block

                                        -

                                        The global block is the first one in the above array.

                                        -
                                        -
                                        -
                                        -

                                        Operators that Use Blocks

                                        -

                                        In the above example, the operator IfElseOp has two blocks – the true branch and the false branch.

                                        -

                                        The definition of OpDesc shows that an operator could have some attributes:

                                        -
                                        message OpDesc {
                                        -  repeated AttrDesc attrs = 1;
                                        -  ...
                                        -}
                                        -
                                        -
                                        -

                                        and an attribute could be of type block, which is, in fact, a block ID as described above:

                                        -
                                        message AttrDesc {
                                        -  required string name = 1;
                                        -
                                        -  enum AttrType {
                                        -    INT = 1;
                                        -    STRING = 2;
                                        -    ...
                                        -    BLOCK = ...
                                        -  }
                                        -  required AttrType type = 2;
                                        -
                                        -  optional int32 block = 10; // when type == BLOCK
                                        -  ...
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        InferShape

                                        -

                                        With this design, the InferShape function should take the following parameters:

                                        -
                                        void InferShape(int current_block,
                                        -                int current_operator,
                                        -                ProgramDesc* program // might change VarDesc values.
                                        -                ) {
                                        -  ...
                                        -}
                                        -
                                        -
                                        -

                                        where

                                        -
                                          -
                                        • current_block indexes into ProgramDesc::blocks,
                                        • current_operator indexes into BlockDesc::ops.
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/prune.html b/develop/doc_cn/design/prune.html deleted file mode 100644 index 3cdfec2a6b2b98349d0760001d900693e43e7de8..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/prune.html +++ /dev/null @@ -1,320 +0,0 @@ - - - - - - - - - - - - - Prune — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

                                        Prune

                                        -
                                        -

                                        Motivation

                                        -

                                        We want to support running inference, training, and checkpointing in one ProgramDesc. We implement a function, void Prune(const ProgramDesc* input, ProgramDesc* output), which takes a ProgramDesc and generates a pruned ProgramDesc.

                                        -
                                        -
                                        -

                                        Challenge

                                        -

                                        Pruning needs to support both variables and operators as evaluation targets. Consider the following situations.

                                        -
                                        # Case 1: run the forward pass.
                                        -cost_np = session.run(target=cost)
                                        -# Case 2: run the backward pass.
                                        -opts_np, _ = session.run(target=[cost, opt])
                                        -# Case 3: run checkpointing
                                        -_ = session.run(target=checkpoint)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Solution

                                        -

                                        To support evaluation of operators, we add an is_target field to OpDesc.

                                        -
                                        message OpDesc {
                                        -  required string type = 3;
                                        -  repeated Var inputs = 1;
                                        -  repeated Var outputs = 2;
                                        -  repeated Attr attrs = 4;
                                        -  optional bool is_target = 5 [ default = false ];
                                        -};
                                        -
                                        -
                                        -

                                        To support evaluation of variables, we add fetch_op. For each variable in the target, we insert a fetch_op into the ProgramDesc with the variable as the fetch_op’s input, and we mark that fetch_op as a target.

                                        -
                                        -

                                        Algorithm

                                        -

                                        If an operator needs to be run, it must fall into one of the following cases:

                                        -
                                          -
                                        1. It is the target.
                                        2. Some other op depends on it, meaning its output is another op’s input.
                                        -

                                        The first case can be checked by op_desc.is_target(). The second case can be implemented as

                                        -
                                        bool HasDependentVar(const OpDesc& op_desc, const std::set<std::string>& dependent_vars) {
                                        -  for (auto& var : op_desc.outputs()) {
                                        -    for (auto& argu : var.arguments()) {
                                        -      if (dependent_vars.count(argu) != 0) {
                                        -        return true;
                                        -      }
                                        -    }
                                        -  }
                                        -  return false;
                                        -}
                                        -
                                        -
                                        -

                                        Then the whole algorithm can be implemented as the following code.
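A minimal Python sketch of the loop, under stated assumptions rather than as the original listing: the `Op` class and the `prune` function are hypothetical stand-ins for OpDesc and Prune, variables are plain names, and ops are assumed to be stored in execution order. Ops are visited in reverse so that every consumer is seen before its producers.

```python
class Op(object):
    """Hypothetical stand-in for OpDesc: type, input/output variable names, is_target."""

    def __init__(self, type, inputs, outputs, is_target=False):
        self.type = type
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.is_target = is_target


def has_dependent_var(op, dependent_vars):
    """Case 2: one of the op's outputs is needed by an op that is already kept."""
    return any(name in dependent_vars for name in op.outputs)


def prune(ops):
    """Return the ops needed to evaluate the targets, in their original order."""
    dependent_vars = set()  # variables that the kept ops read
    kept = []
    for op in reversed(ops):
        if op.is_target or has_dependent_var(op, dependent_vars):
            dependent_vars.update(op.inputs)  # its producers must be kept too
            kept.append(op)
    kept.reverse()
    return kept


ops = [
    Op("mul", ["x", "w"], ["h"]),
    Op("mean", ["h"], ["cost"]),
    Op("save", ["w"], ["ckpt"]),                     # not needed to compute cost
    Op("fetch", ["cost"], ["out"], is_target=True),  # fetch_op inserted for cost
]
print([op.type for op in prune(ops)])  # ['mul', 'mean', 'fetch']
```

The `save` op is dropped because it is not a target and nothing that is kept reads its output; marking it as a target instead would keep it and drop the others.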

                                        -
                                        -
                                        -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/python_api.html b/develop/doc_cn/design/python_api.html deleted file mode 100644 index f73acddccb230a27b31331950174d0b913cabfb0..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/python_api.html +++ /dev/null @@ -1,534 +0,0 @@ - - - - - - - - - - - - - Design Doc: Python API — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

                                        Design Doc: Python API

                                        -

                                        Due to the refactoring of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program.

                                        -

                                        | Python classes | Protobuf messages |
                                        | --- | --- |
                                        | Program | ProgramDesc |
                                        | Block | BlockDesc |
                                        | Operator | OpDesc |
                                        | Variable | VarDesc |

                                        -

                                        Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages.

                                        -
                                        -

                                        Core Concepts

                                        -
                                        -

                                        Program

                                        -

                                        A ProgramDesc describes a DL program, which is composed of an array of BlockDescs. The BlockDescs in a ProgramDesc can have a tree-like hierarchical structure. However, the ProgramDesc only stores a flattened array of BlockDescs. A BlockDesc refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.

                                        -

                                        Whenever we create a block, we need to set its parent block to the current block, hence the Python class Program needs to maintain a data member that records the index of the current block.

                                        -
                                        class Program(object):
                                        -    def __init__(self):
                                        -        self.desc = core.NewProgram()  # a C++ ProgramDesc pointer.
                                        -        self.blocks = []               # flattened list of Block instances
                                        -        self.blocks.append(Block(self, -1))  # the global block has no parent
                                        -        self.current_block_idx = 0     # initialized to the global block
                                        -
                                        -    def global_block(self):
                                        -        return self.blocks[0]
                                        -
                                        -    def current_block(self):
                                        -        return self.blocks[self.current_block_idx]
                                        -
                                        -    def rollback(self):
                                        -        self.current_block_idx = self.current_block().parent_idx
                                        -
                                        -    def create_block(self):
                                        -        new_block_idx = len(self.blocks)
                                        -        self.blocks.append(Block(self, self.current_block_idx))
                                        -        self.current_block_idx = new_block_idx
                                        -        return self.current_block()
                                        -
                                        -
                                        -

                                        Program is an accessor to the protobuf message ProgramDesc, which is created in C++ space. The message lives in C++ because the InferShape function is written in C++ and manipulates VarDesc messages, which are members of BlockDesc, which is in turn a member of ProgramDesc.

                                        -

                                        Program creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block.
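To illustrate how the current-block bookkeeping behaves, here is a self-contained sketch with the C++ `core` calls left out. The names `current_block_idx`, `StubBlock`, and `StubProgram` are assumptions made for this illustration only.

```python
class StubBlock(object):
    """Hypothetical Block without the C++ desc: only remembers its parent index."""

    def __init__(self, program, parent_idx):
        self.program = program
        self.parent_idx = parent_idx


class StubProgram(object):
    """Hypothetical Program that tracks the current block by index."""

    def __init__(self):
        self.blocks = [StubBlock(self, -1)]  # the global block has no parent
        self.current_block_idx = 0

    def current_block(self):
        return self.blocks[self.current_block_idx]

    def create_block(self):
        new_idx = len(self.blocks)
        # The new block's parent is whatever block is current right now.
        self.blocks.append(StubBlock(self, self.current_block_idx))
        self.current_block_idx = new_idx
        return self.current_block()

    def rollback(self):
        # Leave the current block and return to its parent.
        self.current_block_idx = self.current_block().parent_idx


program = StubProgram()
program.create_block()   # block 1, child of the global block
program.rollback()       # back to the global block
program.create_block()   # block 2, also a child of the global block
program.create_block()   # block 3, nested inside block 2
print([b.parent_idx for b in program.blocks])  # [-1, 0, 0, 2]

program.rollback()       # back to block 2
program.rollback()       # back to the global block
print(program.current_block_idx)               # 0
```

Pairing each `create_block` with a `rollback` is what keeps the flattened array consistent with the tree structure of the blocks.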

                                        -
                                        -
                                        -

                                        Block

                                        -

                                        A Block includes

                                        -
                                          -
                                        1. a map from variable names to an instance of the Python Variable class, and
                                        2. a list of Operator instances.
                                        -
                                        class Block(object):
                                        -    def __init__(self, program, parent_idx):
                                        -        self.desc = core.NewBlock(program.desc)
                                        -        self.program = program
                                        -        self.vars = {}  # dict<string, Variable>
                                        -        self.ops = []   # list<Operator>
                                        -        self.parent_idx = parent_idx
                                        -
                                        -    def create_var(self, ...):
                                        -        return Variable(self, ...)
                                        -
                                        -    def _create_global_var(self, ...):
                                        -        return self.program.global_block().create_var(...)
                                        -
                                        -    def create_parameter(self, name, ...):
                                        -        # Parameter is a subclass of variable. See Parameter section for details.
                                        -        self.vars[name] = Parameter(self._create_global_var(...), ...)
                                        -        return self.vars[name]
                                        -
                                        -    def append_operator(self, ...):
                                        -        self.ops.append(Operator(self, ...))
                                        -
                                        -    def prepend_operator(self, ...):  # Parameter's ctor prepends initialize operators.
                                        -        self.ops.insert(0, Operator(self, ...))
                                        -
                                        -
                                        -

                                        create_parameter is necessary because parameters are global variables, defined in the global block, but can be created in sub-blocks, for example by an FC layer in the step block of an RNN operator.

                                        -

                                        prepend_operator is necessary because the constructor of Parameter needs to create the initialize (or load) operator of the parameter, and would like to put it in the preamble of the global block.

                                        -
                                        -
                                        -

                                        Operator

                                        -

                                        The Operator class fills in the OpDesc message and calls the C++ function InferShape to infer the output shapes from the input shapes.

                                        -
                                        class Operator(object):
                                        -    def __init__(self,
                                        -                 block,  # Block
                                        -                 type,   # string
                                        -                 inputs, # dict<string, Variable>
                                        -                 outputs,# dict<string, Variable>
                                        -                 attrs   # dict<string, Any>
                                        -                 ):
                                        -        self.desc = core.NewOpDesc(block.desc, type, inputs, outputs, attrs)
                                        -        core.infer_shape(self.desc, inputs, outputs)
                                        -
                                        -    def type(self):
                                        -        return self.desc.type()
                                        -
                                        -
                                        -

                                        Operator creates the OpDesc message in C++ space, so that it can call the InferShape function, which is in C++.

                                        -
                                        -
                                        -

                                        Variable

                                        -

                                        Operators take Variables as their inputs and outputs.

                                        -
                                        class Variable(object):
                                        -    def __init__(self,
                                        -                 block=None,      # Block
                                        -                 name=None,       # string
                                        -                 shape=None,      # tuple
                                        -                 dtype="float32", # string
                                        -                 lod_level=None   # int
                                        -                 ):
                                        -        if name is None:
                                        -            name = unique_name_generator()
                                        -        self.name = name
                                        -        self.block = block
                                        -        self.desc = core.NewVarDesc(block.desc, name, shape, lod_level)
                                        -        self.writer = None
                                        -
                                        -
                                        -

                                        Please be aware of self.writer, which tracks the operator that creates the variable. It is possible that more than one operator writes a variable, but in Python space, each write to a variable is represented by a separate Variable instance. These instances still map to a single VarDesc, because core.NewVarDesc must NOT create a new VarDesc message if its name already exists in the specified block.
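The create-or-reuse behavior required of core.NewVarDesc can be sketched in plain Python. This is only an illustration of the rule: `BlockDescStub` and `new_var_desc` are hypothetical names, and a dict stands in for the C++ VarDesc message.

```python
class BlockDescStub(object):
    """Hypothetical stand-in for a C++ BlockDesc: a name -> VarDesc table."""

    def __init__(self):
        self.var_descs = {}


def new_var_desc(block_desc, name, shape):
    """Return the description for `name`, creating it only on first use."""
    if name not in block_desc.var_descs:
        block_desc.var_descs[name] = {"name": name, "shape": tuple(shape)}
    return block_desc.var_descs[name]


block_desc = BlockDescStub()
first = new_var_desc(block_desc, "hidden", (None, 128))   # the first writer creates it
second = new_var_desc(block_desc, "hidden", (None, 128))  # a later writer reuses it

print(first is second)            # True
print(len(block_desc.var_descs))  # 1
```

Two Python-side Variable objects built this way would hold the same underlying description, so the block never ends up with duplicate entries for one name.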

                                        -
                                        -
                                        -

                                        Parameter

                                        -

                                        A parameter is a global variable with an initializer (or load) operator.

                                        -
                                        class Parameter(Variable):
                                        -    def __init__(self,
                                        -                 block=None,      # Block
                                        -                 name=None,       # string
                                        -                 shape=None,      # tuple
                                        -                 dtype="float32", # string
                                        -                 lod_level=None,  # int
                                        -                 trainable=True,  # bool
                                        -                 initialize_op_attrs=None,
                                        -                 optimize_op_attrs=None):
                                        -        super(Parameter, self).__init__(block, name, shape, dtype, lod_level)
                                        -        self.trainable = trainable
                                        -        self.optimize_op_attrs = optimize_op_attrs
                                        -        block.prepend_operator(initialize_op_attrs['type'],  # string
                                        -                               None,   # no inputs
                                        -                               self,   # output is the parameter
                                        -                               initialize_op_attrs)
                                        -
                                        -
                                        -

                                        When users create a parameter, they can call

                                        -
                                        program.create_parameter(
                                        -  ...,
                                        -  init_attr={
                                        -    "type": "uniform_random",
                                        -    "min": -1.0,
                                        -    "max": 1.0,
                                        -  })
                                        -
                                        -
                                        -

                                        In the above example, init_attr.type names an initialize operator. It can also name the load operator:

                                        -
                                        init_attr={
                                        - "type": "load",
                                        - "filename": "something.numpy",
                                        -}
                                        -
                                        -
                                        -

                                        optimize_op_attrs is not in the VarDesc message but is kept in the Python instance, because it is used in Python space when creating the optimize operator’s OpDesc and ends up in that OpDesc message.

                                        -
                                        -
                                        -
                                        -

                                        Layer Function

                                        -

                                        A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.

                                        -

                                        Layer functions take Variables and configuration parameters as their inputs and return the output variable(s).

                                        -

                                        For example, FullyConnected takes one or more variables as its input. The input could be input data or another layer’s output. There are many configuration options for a FullyConnected layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The FullyConnected layer will return an output variable.

                                        -
                                        -

                                        Necessity for reusing code between layer functions

                                        -

                                        A lot of code can be reused across layer functions, such as code to:

                                        -
                                          -
                                        • Provide default configuration values. For example, the default initialization strategy for parameters is uniform random with min = -1.0 and max = 1.0, and the default initialization strategy for biases is to fill with zeros.
                                        • Append the activation operator.
                                        • Create a temporary variable.
                                        • Create a parameter.
                                        • Generate a unique name.
                                        • Add a bias.
                                        • ...
                                        -

                                        A mechanism to reuse code between layer functions is necessary. It will be around 150 lines of code if we write a FullyConnected layer without any helper functions.

                                        -
                                        -
                                        -

                                        Comparison between global functions and helper class

                                        -

                                        The FullyConnected layer looks as follows when we provide global functions:

                                        -
                                        def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
                                        -  if name is None:
                                        -    name = unique_name("fc")
                                        -  input = multiple_input(input)
                                        -  param_attr = default_param_attr(param_attr)
                                        -  param_attr = multiple_param_attr(param_attr, len(input))
                                        -
                                        -  # mul
                                        -  mul_results = []
                                        -  for ipt, attr in zip(input, param_attr):
                                        -    shape = ipt.shape[1:] + [size]
                                        -    w = g_program.global_block().create_parameter(shape, ipt.dtype, name, attr)
                                        -    tmp = create_tmp_var(name)
                                        -    g_program.current_block().append_op("mul", {ipt, w}, {tmp})
                                        -    mul_results.append(tmp)
                                        -
                                        -  # add sum
                                        -  ...
                                        -  # add bias
                                        -  ...
                                        -  # add activation
                                        -  ...
                                        -  return out
                                        -
                                        -
                                        -

                                        We can provide many helper functions for layer developers. However, global helper functions have several disadvantages:

                                        -
                                          -
                                        1. These functions need a namespace so that layer developers can quickly find out which ones they can use.
                                        2. -
                                        3. Global functions force layer developers to pass the same parameters again on every call.
                                        4. -
                                        -

                                        So we provide a helper class, LayerHelper, to share code between layer functions. With it, the FullyConnected layer looks as follows:

                                        -
                                        def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
                                        -  helper = LayerHelper(**locals())  # pass all parameters to LayerHelper as keyword arguments
                                        -
                                        -  mul_results = []
                                        -  for ipt, param in helper.iter_multiple_input_and_param():
                                        -    w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype = ipt.dtype)
                                        -    tmp = helper.create_tmp_variable()
                                        -    helper.append_op('mul', {ipt, w}, {tmp})
                                        -    mul_results.append(tmp)
                                        -
                                        -  pre_bias = helper.add_sum(mul_results)
                                        -  pre_activation = helper.add_bias(pre_bias)
                                        -  return helper.add_activation(pre_activation)
                                        -
                                        -
                                        -

                                        This version of fc_layer not only takes fewer lines of code but is also easier to understand. In addition, layer developers can discover which functions are available by typing helper. in a Python editor.

                                        -
                                        -
                                        -

                                        Implementation of layer helper

                                        -

                                        The layer helper keeps all parameters of a layer function in a dictionary, stored as a private data member. Every method of the layer helper looks up this dictionary when it is invoked. This way, a single layer helper can serve all layer functions, even when a given layer does not use every operator. For example, the activation is used by the FullyConnected layer and by convolution layers, but a cross-entropy layer does not use it. Example code for add_activation:

                                        -
                                        class LayerHelper(object):
                                        -  def __init__(self, **kwargs):  # kwargs is short for `keyword arguments`
                                        -    self.kwargs = kwargs
                                        -
                                        -  def add_activation(self, input_var):
                                        -    act = self.kwargs.get("act", None)  # default value is None
                                        -    if act is None:  # do nothing if no act
                                        -      return input_var
                                        -
                                        -    tmp = self.create_tmp_var()  # bound method: self is passed implicitly
                                        -    self.append_op(type=act, input=input_var, output=tmp)
                                        -    return tmp
                                        -
                                        -
                                        -
                                        -
                                        -
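The dictionary lookup described above can be tried out with a toy helper. This is only a sketch: the class below is a stand-in written for illustration, not PaddlePaddle's LayerHelper. Variables are represented by plain strings, and `ops`, `fc_like_layer` and `cross_entropy_like_layer` are hypothetical names.

```python
class LayerHelper(object):
    """Toy stand-in for the helper described above (hypothetical)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs   # all arguments of the calling layer function
        self.ops = []          # recorded (type, input, output) triples
        self._tmp_count = 0

    def create_tmp_var(self):
        # Variables are just names in this sketch.
        self._tmp_count += 1
        return "tmp_%d" % self._tmp_count

    def append_op(self, type, input, output):
        self.ops.append((type, input, output))

    def add_activation(self, input_var):
        act = self.kwargs.get("act", None)
        if act is None:        # layer has no activation: pass through
            return input_var
        tmp = self.create_tmp_var()
        self.append_op(type=act, input=input_var, output=tmp)
        return tmp


def fc_like_layer(input, act=None):
    helper = LayerHelper(**locals())   # kwargs == {"input": ..., "act": ...}
    return helper, helper.add_activation(input)


def cross_entropy_like_layer(input, label):
    helper = LayerHelper(**locals())   # no "act" key at all
    return helper, helper.add_activation(input)
```

Because `add_activation` reads `act` with `dict.get`, the same method works for a layer that has an `act` argument and for one that does not.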

                                        Return value of layer functions

                                        -

                                        A layer returns a Variable, which is also the output of an operator. However, the output of a layer function carries more information than the output of an operator: a layer also has parameter variables and their gradient variables, and it is useful to return these as well. For example,

                                        -
                                          -
                                        1. Users can debug the network by printing parameter gradients.
                                        2. -
                                        3. Users can attach attributes to a parameter. For example, param.stop_gradient=True stops gradient generation for that parameter, which lets us keep the parameter value fixed during training.
                                        4. -
                                        -

                                        Still, it is best for layers to return a Variable, since all layers and operators take Variables as their inputs. Because Python is dynamically typed, a layer function can simply attach a param field and a grad field to the Variable it returns.

                                        -

                                        Sample usage:

                                        -
                                        data = fluid.layers.data(...)
                                        -hidden = fluid.layers.fc(data, ...)
                                        -...
                                        -
                                        -executor.run(fetch_list=[hidden.param, hidden.param.grad], ...)
                                        -
                                        -
                                        -
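As a rough illustration of attaching the param and grad fields dynamically, the sketch below uses a minimal `Variable` class and an `fc` function that are stand-ins invented for this example. The variable names (including the `@GRAD` suffix) are assumptions.

```python
class Variable(object):
    """Minimal stand-in for a graph variable (hypothetical)."""

    def __init__(self, name):
        self.name = name


def fc(input_var, name="fc"):
    out = Variable(name + ".out")

    # The parameter and its gradient are ordinary Variables.
    param = Variable(name + ".w")
    param.grad = Variable(name + ".w@GRAD")
    param.stop_gradient = False

    # Python allows new attributes on an existing instance, so the
    # layer still returns a plain Variable that carries its parameter.
    out.param = param
    return out


hidden = fc(Variable("data"))
fetch_list = [hidden.param, hidden.param.grad]
```

The caller keeps treating `hidden` as a Variable and reaches the parameter only when it needs to.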
                                        -
                                        -
                                        -

                                        Optimizer

                                        -

                                        Optimizer Design Doc

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/reader/README.html b/develop/doc_cn/design/reader/README.html deleted file mode 100644 index f7edc8e5c968a3de5d63ff73f67d3ad86cf63aba..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/reader/README.html +++ /dev/null @@ -1,446 +0,0 @@ - - - - - - - - - - - - - Python Data Reader Design Doc — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Python Data Reader Design Doc

                                        -

                                        During the training and testing phases, PaddlePaddle programs need to read data. To help users write the code that reads input data, we define the following:

                                        -
                                          -
                                        • A reader: A function that reads data (from file, network, random number generator, etc) and yields the data items.
                                        • -
                                        • A reader creator: A function that returns a reader function.
                                        • -
                                        • A reader decorator: A function, which takes in one or more readers, and returns a reader.
                                        • -
                                        • A batch reader: A function that reads data (from reader, file, network, random number generator, etc) and yields a batch of data items.
                                        • -
                                        -

                                        We also provide a function that converts a reader to a batch reader, along with frequently used reader creators and reader decorators.

                                        -
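The first three definitions can be sketched in plain Python without any PaddlePaddle code. The names `reader_creator_from_list` and `scale` are hypothetical and exist only for this example.

```python
def reader_creator_from_list(entries):
    """Reader creator: a function that returns a reader."""
    def reader():
        # Reader: takes no arguments and yields one data entry at a time.
        for entry in entries:
            yield entry
    return reader


def scale(reader, factor):
    """Reader decorator: takes a reader and returns a new reader."""
    def scaled_reader():
        for entry in reader():
            yield entry * factor
    return scaled_reader


numbers = reader_creator_from_list([1, 2, 3])
doubled = scale(numbers, 2)
```

Each call to `numbers()` or `doubled()` starts a fresh iteration, which is what allows a reader to be reused across passes.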
                                        -

                                        Data Reader Interface

                                        -

                                        A data reader doesn’t have to be a function that reads and yields data items. It can be any function without parameters that creates an iterable (anything that can be used in for x in iterable), as follows:

                                        -
                                        iterable = data_reader()
                                        -
                                        -
                                        -

                                        Each item produced by the iterable should be a single entry of data, not a mini batch. An entry can be a single item or a tuple of items. Each item should be of one of the supported types (e.g., a numpy 1d array of float32, an int, or a list of int).

                                        -

                                        An example implementation of a single-item data reader creator is as follows:

                                        -
                                        import numpy
                                        -
                                        -def reader_creator_random_image(width, height):
                                        -    def reader():
                                        -        while True:
                                        -            yield numpy.random.uniform(-1, 1, size=width*height)
                                        -    return reader
                                        -
                                        -
                                        -

                                        An example implementation of a multiple-item data reader creator is as follows:

                                        -
                                        def reader_creator_random_image_and_label(width, height, label):
                                        -    def reader():
                                        -        while True:
                                        -            yield numpy.random.uniform(-1, 1, size=width*height), label
                                        -    return reader
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Batch Reader Interface

                                        -

                                        A batch reader can be any function without parameters that creates an iterable (anything that can be used in for x in iterable). Each output of the iterable should be a batch (list) of data items, and each item inside the list should be a tuple.

                                        -

                                        Here are some valid outputs:

                                        -
                                        # a mini batch of three data items. Each data item consists of three columns of data.
                                        -[(1, 1, 1),
                                        -(2, 2, 2),
                                        -(3, 3, 3)]
                                        -
                                        -# a mini batch of three data items, each data item is a list (single column).
                                        -[([1,1,1],),
                                        -([2,2,2],),
                                        -([3,3,3],)]
                                        -
                                        -
                                        -

                                        Please note that each item inside the list must be a tuple. Below is an invalid output:

                                        -
                                         # wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
                                        - # Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
                                        - # or three columns of data, each of which is 1.
                                        -[[1,1,1],
                                        -[2,2,2],
                                        -[3,3,3]]
                                        -
                                        -
                                        -

                                        It is easy to convert from a reader to a batch reader:

                                        -
                                        mnist_train = paddle.dataset.mnist.train()
                                        -mnist_train_batch_reader = paddle.batch(mnist_train, 128)
                                        -
                                        -
                                        -
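To make the conversion concrete, here is one way a function like paddle.batch could be written. It is a sketch, not the library's implementation. Two behaviours are assumptions made for this example: bare items are wrapped in a one-element tuple so that the output satisfies the tuple rule above, and a final batch smaller than batch_size is still emitted.

```python
def batch(reader, batch_size):
    """Turn a single-entry reader into a batch reader (illustrative sketch)."""
    def batch_reader():
        current = []
        for entry in reader():
            # Assumption: wrap bare items so every element is a tuple.
            current.append(entry if isinstance(entry, tuple) else (entry,))
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            # Assumption: keep the last, possibly smaller, batch.
            yield current
    return batch_reader


def small_reader():
    for entry in [(1, "a"), (2, "b"), (3, "c")]:
        yield entry


batches = list(batch(small_reader, 2)())
```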

                                        It is also straightforward to create a custom batch reader:

                                        -
                                        def custom_batch_reader():
                                        -    while True:
                                        -        batch = []
                                        -        for i in range(128):
                                        -            batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended.
                                        -        yield batch
                                        -
                                        -mnist_random_image_batch_reader = custom_batch_reader
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Usage

                                        -

                                        The following shows how to use the reader with PaddlePaddle. The batch reader, a mapping from item(s) to data layers, the batch size, and the total number of passes are passed into paddle.train as follows:

                                        -
                                        # two data layers are created:
                                        -image_layer = paddle.layer.data("image", ...)
                                        -label_layer = paddle.layer.data("label", ...)
                                        -
                                        -# ...
                                        -batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128)
                                        -paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Data Reader Decorator

                                        -

                                        A data reader decorator takes in one or more data readers and returns a new data reader. It is similar to a Python decorator, but it does not use the @ syntax.

                                        -

                                        Because data readers follow a strict interface (no parameters, and each produced item is a single data entry), data reader decorators can combine them flexibly. A few examples follow:

                                        -
                                        -

                                        Prefetch Data

                                        -

                                        Since reading data may take some time and training cannot proceed without data, it is generally a good idea to prefetch the data.

                                        -

                                        Use paddle.reader.buffered to prefetch data:

                                        -
                                        buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
                                        -
                                        -
                                        -

                                        buffered_reader will try to buffer (prefetch) 100 data entries.

                                        -
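One plausible way to build such a prefetching decorator is a background thread that fills a bounded queue. The sketch below assumes Python 3 and is not taken from paddle.reader.buffered. It also does not forward exceptions raised inside the source reader, which a production version would need to do.

```python
import queue
import threading


def buffered(reader, size):
    """Prefetch up to `size` entries in a background thread (sketch)."""
    end = object()  # sentinel marking exhaustion of the source reader

    def fill(q):
        for entry in reader():
            q.put(entry)  # blocks while the buffer is full
        q.put(end)

    def buffered_reader():
        q = queue.Queue(maxsize=size)
        worker = threading.Thread(target=fill, args=(q,))
        worker.daemon = True  # do not keep the process alive
        worker.start()
        entry = q.get()
        while entry is not end:
            yield entry
            entry = q.get()

    return buffered_reader


prefetched = list(buffered(lambda: iter(range(10)), 3)())
```

The bounded queue is what limits prefetching to the requested number of entries: the producer thread blocks as soon as the buffer is full.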
                                        -
                                        -

                                        Compose Multiple Data Readers

                                        -

                                        For example, suppose we want to use a source of real images (say, reusing the MNIST dataset) and a source of random images as input for Generative Adversarial Networks.

                                        -

                                        We can do the following:

                                        -
                                        def reader_creator_random_image(width, height):
                                        -    def reader():
                                        -        while True:
                                        -            yield numpy.random.uniform(-1, 1, size=width*height)
                                        -    return reader
                                        -
                                        -def reader_creator_bool(t):
                                        -    def reader():
                                        -        while True:
                                        -            yield t
                                        -    return reader
                                        -
                                        -true_reader = reader_creator_bool(True)
                                        -false_reader = reader_creator_bool(False)
                                        -
                                        -reader = paddle.reader.compose(paddle.dataset.mnist.train(), reader_creator_random_image(20, 20), true_reader, false_reader)
                                        -# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
                                        -# And we don't care about the second item at this time.
                                        -paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
                                        -
                                        -
                                        -
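The index arithmetic in the mapping above follows from how the composed entries are flattened. A composing decorator along these lines would produce it. This is a sketch assuming Python 3, where zip is lazy and therefore works with never-ending readers; it is not the code of paddle.reader.compose.

```python
def compose(*readers):
    """Zip several readers and flatten each step into one tuple (sketch)."""
    def as_tuple(entry):
        return entry if isinstance(entry, tuple) else (entry,)

    def composed_reader():
        iterators = [r() for r in readers]
        # zip stops as soon as the shortest reader is exhausted.
        for entries in zip(*iterators):
            # (img, label), rand, True  ->  (img, label, rand, True)
            yield sum((as_tuple(e) for e in entries), ())

    return composed_reader
```

With a two-item reader in the first position, its items occupy indices 0 and 1, so the next reader's item lands at index 2.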
                                        -
                                        -

                                        Shuffle

                                        -

                                        Given the shuffle buffer size n, paddle.reader.shuffle returns a data reader that buffers n data entries and shuffles them before a data entry is read.

                                        -

                                        Example:

                                        -
                                        reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
                                        -
                                        -
                                        -
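The buffering behaviour described above can be sketched as follows. The function is illustrative rather than the library's source, and the `rng` parameter is an addition made here so that the shuffle can be seeded in tests.

```python
import random


def shuffle(reader, buf_size, rng=random):
    """Shuffle entries within consecutive buffers of buf_size (sketch)."""
    def shuffled_reader():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        if buf:  # flush the last, partially filled buffer
            rng.shuffle(buf)
            for item in buf:
                yield item
    return shuffled_reader


shuffled = list(shuffle(lambda: iter(range(10)), 4, random.Random(0))())
```

Note that entries only move within their own buffer, so a larger buffer size gives a more thorough shuffle at the cost of more memory.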
                                        -
                                        -
                                        -

                                        Q & A

                                        -
                                        -

                                        Why does a reader return only a single entry, and not a mini batch?

                                        -

                                        Returning a single entry makes reusing existing data readers much easier. For example, if an existing reader returned 3 entries at a time instead of a single entry, the training code would be more complicated because it would need to handle cases such as a batch size of 2.

                                        -

                                        We provide a function, paddle.batch, that turns a single-entry reader into a batch reader.

                                        -
                                        -
                                        -

                                        Why do we need a batch reader? Isn’t it sufficient to give the reader and batch_size as arguments during training?

                                        -

                                        In most cases, it would be sufficient to give the reader and batch_size as arguments to the train method. However, sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. A batch reader makes these cases easy to handle.

                                        -
                                        -
                                        -

                                        Why use a dictionary instead of a list to provide mapping?

                                        -

                                        Using a dictionary ({"image":0, "label":1}) instead of a list (["image", "label"]) gives the advantage that the user can easily reuse the items (e.g., using {"image_a":0, "image_b":0, "label":1}) or even skip an item (e.g., using {"image_a":0, "label":2}).

                                        -
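A small sketch shows how such a dictionary could be applied to one data entry. The function name `apply_mapping` is hypothetical and the entry values are placeholders.

```python
def apply_mapping(entry, mapping):
    """Select items of one data entry by index, keyed by data layer name."""
    return {layer_name: entry[index] for layer_name, index in mapping.items()}


entry = ("img", "lbl", "extra")

# Reuse item 0 for two data layers.
reused = apply_mapping(entry, {"image_a": 0, "image_b": 0, "label": 1})

# Skip item 1 entirely.
skipped = apply_mapping(entry, {"image_a": 0, "label": 2})
```

A plain list of names could express neither case, because position in the list would have to double as the item index.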
                                        -
                                        -

                                        How to create a custom data reader creator?

                                        -
                                        def image_reader_creator(image_path, label_path, n):
                                        -    def reader():
                                        -        # open in binary mode: the files hold raw unsigned bytes
                                        -        with open(image_path, 'rb') as f, open(label_path, 'rb') as l:
                                        -            images = numpy.fromfile(
                                        -                f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32')
                                        -            images = images / 255.0 * 2.0 - 1.0  # rescale from [0, 255] to [-1, 1]
                                        -            labels = numpy.fromfile(l, 'ubyte', count=n).astype("int")
                                        -        for i in range(n):
                                        -            yield images[i, :], labels[i] # a single entry of data is created each time
                                        -    return reader
                                        -
                                        -# image_reader_creator creates a reader
                                        -reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024)
                                        -paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
                                        -
                                        -
                                        -
                                        -
                                        -

                                        How is paddle.train implemented?

                                        -

                                        An example implementation of paddle.train is:

                                        -
                                        def train(batch_reader, mapping, batch_size, total_pass):
                                        -    for pass_idx in range(total_pass):
                                        -        for mini_batch in batch_reader(): # this loop will never end in online learning.
                                        -            do_forward_backward(mini_batch, mapping)
                                        -
                                        -
                                        -
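The loop above can be exercised with a finite batch reader. In this sketch, `step` stands in for do_forward_backward, and the way the mapping is turned into per-layer columns is an assumption made for illustration. The batch_size argument is omitted because the reader already produces batches.

```python
def finite_batch_reader():
    # Two mini batches; each entry is an (image, label) tuple.
    yield [(0.1, 0), (0.2, 1)]
    yield [(0.3, 0)]


def train(batch_reader, mapping, total_pass, step):
    for pass_idx in range(total_pass):
        # Calling batch_reader() again gives a fresh iterable per pass.
        for mini_batch in batch_reader():
            feed = {name: [entry[index] for entry in mini_batch]
                    for name, index in mapping.items()}
            step(pass_idx, feed)


log = []
train(finite_batch_reader, {"image": 0, "label": 1}, 2,
      lambda pass_idx, feed: log.append((pass_idx, feed["label"])))
```

Because the reader is a function rather than an iterator, every pass sees the whole dataset again.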
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/refactorization.html b/develop/doc_cn/design/refactorization.html deleted file mode 100644 index fe1515b9d315b98d75fc03897195d4755dcf6ebe..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/refactorization.html +++ /dev/null @@ -1,591 +0,0 @@ - - - - - - - - - - - - - Design Doc: Refactorization Overview — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Refactorization Overview

                                        -

                                        The goals of refactoring include:

                                        -
                                          -
                                        1. Making it easy for external contributors to write new elementary computation operations.
                                        2. -
                                        3. Making the codebase clean and readable.
                                        4. -
                                        5. Designing a new computation representation – a computation graph of operators and variables.
                                        6. -
                                        7. Implementing auto-scalable and automatically fault-recoverable distributed computing with the help of computation graphs.
                                        8. -
                                        -
                                        -

                                        Computation Graphs

                                        -
                                          -
                                        1. PaddlePaddle represents the computation, training, and inference of Deep Learning models by computation graphs.
                                        2. -
                                        3. Please refer to computation graphs for a concrete example.
                                        4. -
                                        5. Users write Python programs to describe the graphs and run them (locally or remotely).
                                        6. -
                                        7. A graph is composed of variables and operators.
                                        8. -
                                        9. The description of graphs must be serializable/deserializable, so that:
                                            -
                                          1. It can be sent to the cloud for distributed execution, and
                                          2. -
                                          3. It can be sent to clients for mobile or enterprise deployment.
                                          4. -
                                          -
                                        10. -
                                        11. The Python program does two things:
                                            -
                                          1. Compilation runs a Python program to generate a protobuf message representation of the graph and sends it to
                                              -
                                            1. the C++ library libpaddle.so for local execution,
                                            2. -
                                            3. the master process of a distributed training job for training, or
                                            4. -
                                            5. the server process of a Kubernetes serving job for distributed serving.
                                            6. -
                                            -
                                          2. -
                                          3. Execution executes the graph by constructing instances of class Variable and OperatorBase, according to the protobuf message.
                                          4. -
                                          -
                                        12. -
                                        -
                                        -
                                        -

                                        Description and Realization of Computation Graph

                                        -

                                        At compile time, the Python program generates a protobuf message representation of the graph, that is, a description of the graph.

                                        -

                                        At runtime, the C++ program realizes the graph and runs it.

                                        -

                                        | | Representation (protobuf messages) | Realization (C++ class objects) |
                                        |---|---|---|
                                        | Data | VarDesc | Variable |
                                        | Operation | OpDesc | Operator |
                                        | Block | BlockDesc | Block |

                                        -

                                        The word graph is interchangeable with block in this document. A graph consists of computation steps and local variables, similar to a C++/Java program block, that is, the code enclosed by a pair of curly braces ({ and }).

                                        -
                                        -
                                        -

                                        Compilation and Execution

                                        -
                                          -
                                        1. Run a Python program to describe the graph. In particular, the Python application program does the following:
                                            -
                                          1. Create VarDesc to represent local/intermediate variables,
                                          2. -
                                          3. Create operators and set attributes,
                                          4. -
                                          5. Validate attribute values,
                                          6. -
                                          7. Infer the type and the shape of variables,
                                          8. -
                                          9. Plan memory-reuse for variables,
                                          10. -
                                          11. Generate the backward graph
                                          12. -
                                          13. Add optimization operators to the computation graph.
                                          14. -
                                          15. Optionally, split the graph for distributed training.
                                          16. -
                                          -
                                        2. -
                                        3. The invocation of train or infer methods in the Python program does the following:
                                            -
                                          1. Create a new Scope instance in the scope hierarchy for each run of a block,
                                              -
                                            1. realize local variables defined in the BlockDesc message in the new scope,
                                            2. -
                                            3. a scope is similar to the stack frame in programming languages,
                                            4. -
                                            -
                                          2. -
                                          3. Create an instance of class Block, in which,
                                              -
                                            1. realize operators in the BlockDesc message,
                                            2. -
                                            -
                                          4. -
                                          5. Run the Block by calling
                                              -
                                            1. Block::Eval(vector<Variable>* targets) for forward and backward computations, or
                                            2. -
                                            3. Block::Eval(vector<Operator>* targets) for optimization.
                                            4. -
                                            -
                                          6. -
                                          -
                                        4. -
                                        -
                                        -
                                        -
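The split between description and realization listed above can be illustrated with a minimal sketch. Everything here is a simplification made for illustration: the plain structs `VarDesc`, `OpDesc` and `BlockDesc` stand in for the protobuf messages, `Scope` is reduced to a map, `RunBlock` is a hypothetical helper rather than `Block::Eval`, and the only operator type handled is an invented `scale2` that doubles its input.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the protobuf messages (the description).
struct VarDesc { std::string name; };
struct OpDesc { std::string type; std::string input; std::string output; };
struct BlockDesc { std::vector<VarDesc> vars; std::vector<OpDesc> ops; };

// Hypothetical runtime objects (the realization).
struct Variable { float value = 0.0f; };
using Scope = std::map<std::string, std::unique_ptr<Variable>>;

// Realizes every described variable in the scope, then runs the ops in order.
inline void RunBlock(const BlockDesc& desc, Scope* scope) {
  assert(scope != nullptr);
  for (const VarDesc& var : desc.vars) {
    std::unique_ptr<Variable>& slot = (*scope)[var.name];
    if (!slot) slot.reset(new Variable());  // keep variables that already exist
  }
  for (const OpDesc& op : desc.ops) {
    // Only the invented "scale2" op (output = 2 * input) is supported here.
    if (op.type != "scale2") throw std::invalid_argument("unknown op type: " + op.type);
    scope->at(op.output)->value = 2.0f * scope->at(op.input)->value;
  }
}
```

The description can be built, serialized and shipped without touching any `Variable`; objects are created only when `RunBlock` is called.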

                                        Intermediate Representation (IR)

                                        -
                                        Compile Time -> IR -> Runtime
                                        -
                                        -
                                        -
                                        -

                                        Benefits of IR

                                        -
                                          -
                                        • Optimization

                                          -
                                          Compile Time -> IR -> Optimized IR -> Runtime
                                          -
                                          -
                                          -
                                        • -
                                        • Automatically send partitioned IR to different nodes.

                                          -
                                            -
                                          • Automatic Data Parallelism

                                            -
                                            Compile Time
                                            -|-> Single GPU IR
                                            -    |-> [trainer-IR-0, trainer-IR-1, pserver-IR]
                                            -        |-> Node-0 (runs trainer-IR-0)
                                            -        |-> Node-1 (runs trainer-IR-1)
                                            -        |-> Node-2 (runs pserver-IR)
                                            -
                                            -
                                            -
                                          • -
                                          • Automatic Model Parallelism (planned for future)

                                            -
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Operator/OpWithKernel/OpKernel

                                        -

                                        class_diagram

                                        -
                                        -
                                        -
                                        -

                                        Operator

                                        -

                                        class_diagram

                                        -
                                          -
                                        • Operator is the fundamental building block of the user interface.
                                            -
                                          • Operator stores input/output variable names and attributes.
                                          • -
                                          • The InferShape interface is used to infer the shape of the output variables based on the shapes of the input variables.
                                          • -
                                          • Use Run to compute the output variables from the input variables.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        OpWithKernel/Kernel

                                        -

                                        class_diagram

                                        -
                                          -
                                        • OpWithKernel inherits Operator.
                                        • -
                                        • OpWithKernel contains a Kernel map.
                                            -
                                          • OpWithKernel::Run gets the kernel for the current device and invokes OpKernel::Compute.
                                          • -
                                          • OpKernelKey is the map key. It currently contains only the device place, but may include the data type later.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
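The kernel-map dispatch described above might look roughly like the following sketch. The names `Place`, `Kernel`, `OpWithKernelSketch` and `AddKernel` are assumptions made for illustration, and a kernel is modeled as a plain callable on one float rather than a class with a `Compute` method.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical device tag; the key holds only the place, as the text says.
enum class Place { kCPU, kGPU };

struct OpKernelKey {
  Place place;
  bool operator<(const OpKernelKey& other) const { return place < other.place; }
};

// A kernel is modeled as a callable mapping one float to another.
using Kernel = std::function<float(float)>;

class OpWithKernelSketch {
 public:
  void AddKernel(Place place, Kernel kernel) {
    assert(static_cast<bool>(kernel));  // an empty callable cannot be run
    kernels_[OpKernelKey{place}] = std::move(kernel);
  }

  // Looks up the kernel registered for the given place and invokes it.
  float Run(Place place, float input) const {
    auto it = kernels_.find(OpKernelKey{place});
    if (it == kernels_.end()) throw std::runtime_error("no kernel for this place");
    return it->second(input);
  }

 private:
  std::map<OpKernelKey, Kernel> kernels_;
};
```

Adding the data type to the key later would only require another field in `OpKernelKey` and a matching comparison.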

                                        Why separate Kernel and Operator

                                        -
                                          -
                                        • Separate GPU and CPU code.
                                            -
                                          • Make Paddle capable of running without GPU.
                                          • -
                                          -
                                        • -
                                        • Make one operator (which is a user interface) and create many implementations.
                                            -
                                          • For example, the same multiplication op can have different kernel implementations, such as an FP16 kernel, an FP32 kernel, an MKL kernel, or an Eigen kernel.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Libraries for Kernel development

                                        -
                                          -
                                        • Eigen::Tensor contains basic math and element-wise functions.
                                            -
                                          • Note that Eigen::Tensor has broadcast implementation.
                                          • -
                                          • Limit the number of tensor.device(dev) = in your code.
                                          • -
                                          -
                                        • -
                                        • thrust::transform and std::transform.
                                            -
                                          • thrust has the same API as the C++ standard library. Using transform, one can quickly implement customized element-wise kernels.
                                          • -
                                          • thrust, in addition, supports more complex APIs, like scan, reduce, reduce_by_key.
                                          • -
                                          -
                                        • -
                                        • Hand-writing GPUKernel and CPU code
                                            -
                                          • Do not write in header (.h) files. CPU Kernel should be in cpp source (.cc) and GPU kernels should be in cuda (.cu) files. (GCC cannot compile GPU code.)
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
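As a small illustration of the transform approach mentioned above, here is an element-wise kernel written with `std::transform` on the CPU. The function name `ReluCPU` and the choice of ReLU are assumptions made for this sketch; only the standard library is used.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Element-wise ReLU over a CPU buffer: each output is max(input, 0).
inline std::vector<float> ReluCPU(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(),
                 [](float x) { return x > 0.0f ? x : 0.0f; });
  assert(out.size() == in.size());
  return out;
}
```

Following the file-placement rule above, a function like this would live in a .cc file, with any GPU counterpart in a separate .cu file.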

                                        Operator Registration

                                        -
                                        -

                                        Why is registration necessary?

                                        -

                                        We need a method to build mappings between Op type names and Op classes.

                                        -
                                        -
                                        -

                                        How is registration implemented?

                                        -

                                        By maintaining a map whose key is the type name and whose value is the corresponding Op constructor.

                                        -
                                        -
                                        -
                                        -
                                        -

                                        The Registry Map

                                        -
                                        -

                                        OpInfoMap

                                        -

                                        op_type(string) -> OpInfo

                                        -

                                        OpInfo:

                                        -
                                          -
                                        • creator: The Op constructor.
                                        • -
                                        • grad_op_type: The type of the gradient Op.
                                        • -
                                        • proto: The Op’s Protobuf, including inputs, outputs and required attributes.
                                        • -
                                        • checker: Used to check attributes.
                                        • -
                                        -
                                        -
                                        -
                                        - -
                                        -
                                        -

                                        Registration Process

                                        -
                                          -
                                        1. Write an Op class and its gradient Op class, if required.
                                        2. -
                                        3. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
                                        4. -
                                        5. Invoke the macro REGISTER_OP. This macro will
                                            -
                                          1. Call maker class to complete proto and checker
                                          2. -
                                          3. Using the completed proto and checker, it will add a new key-value pair to the OpInfoMap
                                          4. -
                                          -
                                        6. -
                                        -
                                        -
                                        -
                                        -
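The registration process above can be sketched with a static registrar object. This is a reduced illustration, not the actual macro: `REGISTER_OP_SKETCH` and `OpRegistrar` are hypothetical names, `OpInfo` keeps only the creator and the gradient type, and the proto and checker steps are left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct OperatorBase {
  virtual ~OperatorBase() = default;
  virtual std::string Type() const = 0;
};

// Reduced OpInfo: only the creator and the gradient op type are kept.
struct OpInfo {
  std::function<std::unique_ptr<OperatorBase>()> creator;
  std::string grad_op_type;
};

// A function-local static avoids initialization-order problems across files.
inline std::map<std::string, OpInfo>& OpInfoMap() {
  static std::map<std::string, OpInfo> map;
  return map;
}

// Constructing a registrar adds one key-value pair to the map.
template <typename OpType>
struct OpRegistrar {
  OpRegistrar(const std::string& type, const std::string& grad_type) {
    assert(OpInfoMap().count(type) == 0);  // each type is registered once
    OpInfo info;
    info.creator = [] { return std::unique_ptr<OperatorBase>(new OpType()); };
    info.grad_op_type = grad_type;
    OpInfoMap()[type] = info;
  }
};

// Hypothetical macro; the REGISTER_OP in the text takes more arguments.
#define REGISTER_OP_SKETCH(op_type, op_class, grad_type) \
  static OpRegistrar<op_class> registrar_##op_type(#op_type, #grad_type)

struct MinusOp : OperatorBase {
  std::string Type() const override { return "minus"; }
};
REGISTER_OP_SKETCH(minus, MinusOp, minus_grad);
```

Because the registrar is a static object, the map is filled before `main` runs, so an operator can be created from its type name alone.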

                                        Backward Module (1/2)

                                        -
                                        -

                                        Create Backward Operator

                                        -
                                          -
                                        • Mapping from forward Op to backward Op -backward
                                        • -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Backward Module (2/2)

                                        -
                                        -

                                        Build Backward Network

                                        -
                                          -
                                        • Input: a graph of forward operators
                                        • -
                                        • Output: a graph of backward operators
                                        • -
                                        • Corner cases in construction
                                            -
                                          • Shared Variables => insert an Add operator to combine gradients
                                          • -
                                          • No Gradient => insert a fill_zero_grad operator
                                          • -
                                          • Recursive NetOp => call Backward recursively
                                          • -
                                          • RNN Op => recursively call Backward on stepnet
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
                                        -
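The shared-variable corner case above can be sketched as follows. This is an assumed simplification: `GradOpSketch` is a hypothetical description type, the `@RENAME@` suffix is invented for this example, and the combining `add` op is appended at the end of the list rather than placed right after the last writer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified description of one backward operator.
struct GradOpSketch {
  std::string type;
  std::vector<std::string> inputs;
  std::string output;
};

// If several ops write the same gradient variable, rename each write and
// append one "add" op that sums the renamed pieces into the original name.
inline std::vector<GradOpSketch> MergeSharedGradients(std::vector<GradOpSketch> ops) {
  std::map<std::string, std::vector<std::size_t>> writers;
  for (std::size_t i = 0; i < ops.size(); ++i) writers[ops[i].output].push_back(i);

  std::vector<GradOpSketch> add_ops;
  for (const auto& entry : writers) {
    if (entry.second.size() < 2) continue;  // written once: nothing to merge
    GradOpSketch add{"add", {}, entry.first};
    for (std::size_t k = 0; k < entry.second.size(); ++k) {
      const std::string renamed = entry.first + "@RENAME@" + std::to_string(k);
      ops[entry.second[k]].output = renamed;
      add.inputs.push_back(renamed);
    }
    assert(add.inputs.size() >= 2);
    add_ops.push_back(add);
  }
  ops.insert(ops.end(), add_ops.begin(), add_ops.end());
  return ops;
}
```

Without the renaming step, the second writer would overwrite the gradient produced by the first instead of accumulating it.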

                                        Scope, Variable, Tensor

                                        -
                                          -
                                        • Tensor is a typed n-dimensional array.
                                            -
                                          • Only dims and data pointers are stored in Tensor.
                                          • -
                                          • All operations on Tensor are written in Operator or global functions.
                                          • -
                                          • The variable-length Tensor design is LoDTensor.
                                          • -
                                          -
                                        • -
                                        • Variable instances are the inputs and the outputs of an operator, not just Tensor.
                                            -
                                          • step_scopes in RNN is a variable and not a tensor.
                                          • -
                                          -
                                        • -
                                        • Scope is where variables are stored.
                                            -
                                          • map<string var name, Variable>
                                          • -
                                          • Scope has a hierarchical structure. The local scope can get variables from its parent scope.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
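The hierarchical lookup described above can be illustrated with a minimal sketch. The method names `NewVar` and `FindVar` and the reduced `Variable` type are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Reduced variable type; a real Variable can hold a Tensor or other data.
struct Variable { int value = 0; };

class Scope {
 public:
  explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

  // Creates the variable in this scope if it is absent and returns it.
  Variable* NewVar(const std::string& name) {
    assert(!name.empty());
    std::unique_ptr<Variable>& slot = vars_[name];
    if (!slot) slot.reset(new Variable());
    return slot.get();
  }

  // Looks in this scope first, then walks up the parent chain.
  Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  std::map<std::string, std::unique_ptr<Variable>> vars_;
  const Scope* parent_;
};
```

A local scope sees its parent's variables, while the parent never sees variables created in the child, which matches the stack-frame analogy.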

                                        Block (in design)

                                        -
                                        -

                                        The difference between the original RNNOp and Block

                                        -
                                          -
                                        • Block as an operator is more intuitive than RNNOp,
                                        • -
                                        • Offers a new interface Eval(targets) to deduce the minimal block to Run,
                                        • -
                                        • Fits the compile-time/ runtime separation design paradigm.
                                            -
                                          • During compilation, SymbolTable stores VarDescs and OpDescs and serializes them to a BlockDesc.
                                          • -
                                          • When the graph executes, a Block is constructed from the BlockDesc. It creates Op and Var instances and then invokes Run.
                                          • -
                                          -
                                        • -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Milestone

                                        -
                                          -
                                        • Take Paddle/books as the main line: the requirements of its models drive the framework refactoring,
                                        • -
                                        • Model migration
                                            -
                                          • Framework development gives priority support to model migration, for example,
                                              -
                                            • the MNIST demo needs a Python interface,
                                            • -
                                            • the RNN models require the framework to support LoDTensor.
                                            • -
                                            -
                                          • -
                                          • Determine some timelines,
                                          • -
                                          • Frequently used Ops need to be migrated first,
                                          • -
                                          • Different models can be migrated in parallel.
                                          • -
                                          -
                                        • -
                                        • Improve the framework at the same time
                                        • -
                                        • Accept imperfection, concentrate on solving the specific problem at the right price.
                                        • -
                                        -
                                        -
                                        -
                                        -

                                        Control the migration quality

                                        -
                                          -
                                        • Compare the performance of migrated models with old ones.
                                        • -
                                        • Follow the Google C++ Style Guide.
                                        • -
                                        • Build an automatic workflow for generating the Python/C++ documentation.
                                            -
                                          • The documentation of layers and ops should be written inside the code.
                                          • -
                                          • Take the documentation quality into account when submitting pull requests.
                                          • -
                                          • Preview the documentation, then read and improve it from a user’s perspective.
                                          • -
                                          -
                                        • -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/register_grad_op.html b/develop/doc_cn/design/register_grad_op.html deleted file mode 100644 index c0027e02054c87e2b54fea9ecb119454088b6a71..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/register_grad_op.html +++ /dev/null @@ -1,336 +0,0 @@ - - - - - - - - - - - - - Design Doc: Gradient Operators Registration — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Gradient Operators Registration

                                        -
                                        -

                                        The Problem Posed

                                        -

                                        Currently, for each C++ operator class definition, a gradient operator creator function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.

                                        -

                                        However, we noticed two problems with the current design:

                                        -
                                          -
                                        1. As we decided to separate the compilation and the execution phases, we need to change the creator to take an OpDesc protobuf message in a ProgramDesc and insert the corresponding OpDesc messages into the ProgramDesc message.
                                        2. -
                                        3. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of the minus operator consists of two operators: an identity operator followed by a scale operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
                                        4. -
                                        -
                                        -
                                        -
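The minus example in the list above can be made concrete with a small sketch. It assumes Out = X - Y, so that dX = dOut (identity) and dY = -dOut (scale by -1). `OpDescSketch`, `GradName`, `MinusGradDescs` and the `@GRAD` suffix are names chosen for this illustration, not the real OpDesc interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified operator description (not the OpDesc protobuf).
struct OpDescSketch {
  std::string type;
  std::string input;
  std::string output;
  std::map<std::string, float> attrs;
};

// Gradient variable naming used only in this sketch.
inline std::string GradName(const std::string& var) { return var + "@GRAD"; }

// For Out = X - Y: dX = dOut (identity) and dY = -dOut (scale by -1).
inline std::vector<OpDescSketch> MinusGradDescs(const std::string& x,
                                                const std::string& y,
                                                const std::string& out) {
  assert(!out.empty());
  OpDescSketch dx{"identity", GradName(out), GradName(x), {}};
  OpDescSketch dy{"scale", GradName(out), GradName(y), {{"scale", -1.0f}}};
  return {dx, dy};
}
```

One forward operator yields two descriptions here, which is why the registration mechanism has to map an operator to a set of operators rather than to a single gradient type.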

                                        The Current Implementation

                                        -

                                        Instances of the C++ class OpInfo are stored in an associative map whose key is the operator type. The grad_op_type_ field indicates the associated gradient operator type. An operator can create its gradient operator by invoking the OpInfo::creator_ of the gradient operator. The pseudocode is as follows:

                                        -
                                        struct OpInfo {
                                        -  std::function<OperatorBase*(...)> creator_;
                                        -  std::string grad_op_type_;
                                        -  ...
                                        -};
                                        -
                                        -map<string, OpInfo> OpInfoMap;
                                        -
                                        -OperatorBase* CreateGradientOperator(const OperatorBase& op) {
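                                        -  // Look up the gradient op type first, then invoke that op's creator.
                                        -  const std::string& grad_type = OpInfoMap.at(op.Type()).grad_op_type_;
                                        -  return OpInfoMap.at(grad_type).creator_(...);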
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Proposed Solution

                                        -

                                        The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:

                                        -
                                        // (OpDesc) --> vector<OpDesc>
                                        -std::function<std::vector<OpDescBind>(const OpDescBind&)>;
                                        -
                                        -
                                        -

                                        The function takes an OpDescBind of the forward operator and returns one or many gradient operator descriptions. OpDescBind is a C++ wrapper for the protobuf message OpDesc for rapid manipulation of OpDesc.

                                        -

                                        The GradOpDescMaker will be registered in OpInfo and will replace the grad_op_type_ field. The OpInfo should look like

                                        -
                                        struct OpInfo {
                                        -  std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>  grad_op_maker_;
                                        -  ...
                                        -};
                                        -
                                        -
                                        -

                                        The grad_op_maker_ is empty (equal to nullptr) if the operator has no associated gradient operators.

                                        -

                                        We propose a base class called GradOpDescMakerBase to let operator developers generate Gradient Operators easily. The public interface of that class is

                                        -
                                        class GradOpDescMakerBase {
                                        -public:
                                        -  GradOpDescMakerBase(const OpDescBind&);
                                        -  virtual std::vector<std::unique_ptr<OpDescBind>> operator()() const = 0;
                                        -};
                                        -
                                        -
                                        -

                                        We can convert GradOpDescMakerBase to std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> by

                                        -
                                        using GradOpMaker = ...;
                                        -std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> func;
                                        -func = [] (const OpDescBind& fwd_op) {
                                        -  GradOpMaker maker(fwd_op);
                                        -  return maker();
                                        -};
                                        -
                                        -
                                        -

                                        Because GradOpDescMakerBase is a class, we can add many helper functions to it. The basic helpers return the Input, Output, InputGradient and OutputGradient variables of the forward operator.
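A minimal sketch of what such helpers could look like. `SimpleOpDesc` is a hypothetical stand-in for `OpDescBind`, and the helper names, the `"@GRAD"` suffix and the single `minus_grad` op are assumptions made for illustration, not the proposed interface itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for OpDescBind: an operator type plus
// named slots that each hold a list of variable names.
struct SimpleOpDesc {
  std::string type;
  std::map<std::string, std::vector<std::string>> inputs;
  std::map<std::string, std::vector<std::string>> outputs;
};

class GradOpDescMakerBase {
 public:
  explicit GradOpDescMakerBase(const SimpleOpDesc& fwd_op) : fwd_op_(fwd_op) {}
  virtual ~GradOpDescMakerBase() = default;
  virtual std::vector<std::unique_ptr<SimpleOpDesc>> operator()() const = 0;

 protected:
  // Variables bound to a forward input or output slot (empty if absent).
  std::vector<std::string> Input(const std::string& slot) const {
    return Lookup(fwd_op_.inputs, slot);
  }
  std::vector<std::string> Output(const std::string& slot) const {
    return Lookup(fwd_op_.outputs, slot);
  }
  // Gradient variables; the "@GRAD" suffix is an assumed naming convention.
  std::vector<std::string> InputGrad(const std::string& slot) const {
    return WithGradSuffix(Input(slot));
  }
  std::vector<std::string> OutputGrad(const std::string& slot) const {
    return WithGradSuffix(Output(slot));
  }

 private:
  using SlotMap = std::map<std::string, std::vector<std::string>>;

  static std::vector<std::string> Lookup(const SlotMap& slots,
                                         const std::string& slot) {
    auto it = slots.find(slot);
    if (it == slots.end()) return {};
    return it->second;
  }
  static std::vector<std::string> WithGradSuffix(
      std::vector<std::string> names) {
    for (auto& name : names) name += "@GRAD";
    return names;
  }

  SimpleOpDesc fwd_op_;  // copied, so the maker never holds a dangling reference
};

// Gradient maker for Out = X - Y: emits one minus_grad op that reads
// Out@GRAD and writes X@GRAD and Y@GRAD.
class MinusGradOpMaker : public GradOpDescMakerBase {
 public:
  using GradOpDescMakerBase::GradOpDescMakerBase;

  std::vector<std::unique_ptr<SimpleOpDesc>> operator()() const override {
    std::unique_ptr<SimpleOpDesc> grad(new SimpleOpDesc());
    grad->type = "minus_grad";
    grad->inputs["Out@GRAD"] = OutputGrad("Out");
    grad->outputs["X@GRAD"] = InputGrad("X");
    grad->outputs["Y@GRAD"] = InputGrad("Y");
    std::vector<std::unique_ptr<SimpleOpDesc>> ops;
    ops.push_back(std::move(grad));
    return ops;
  }
};
```

With helpers like these, a concrete maker only states which gradient variables feed which slots; the lookup and naming logic lives in the base class.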

                                        -

                                        We should change the registration macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so REGISTER_OP registers just one operator. If the REGISTER_OPERATOR call contains an OpProtoAndCheckerMaker and a GradOpDescMaker, we simply list them in the same macro. This can be done with a variadic macro that uses __VA_ARGS__.
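One way a variadic registration macro could be sketched is shown below. Everything here is an assumption for illustration: `OpInfoSketch`, `OpRegistry`, the tag base classes and the macro signature `(op_type, ...)` are hypothetical, and the operator class is passed as the first variadic argument so that `__VA_ARGS__` is never empty.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical registry entry: records which optional parts were registered.
struct OpInfoSketch {
  bool has_proto_maker = false;
  bool has_grad_maker = false;
};

// Function-local static avoids static-initialization-order problems.
inline std::map<std::string, OpInfoSketch>& OpRegistry() {
  static std::map<std::string, OpInfoSketch> registry;
  return registry;
}

// Tag bases (assumed) that let the registrar classify each extra argument.
struct ProtoMakerTag {};
struct GradMakerTag {};

template <typename T>
void FillOpInfo(OpInfoSketch* info) {
  if (std::is_base_of<ProtoMakerTag, T>::value) info->has_proto_maker = true;
  if (std::is_base_of<GradMakerTag, T>::value) info->has_grad_maker = true;
}

template <typename OpClass, typename... Extras>
struct OperatorRegistrar {
  explicit OperatorRegistrar(const char* op_type) {
    OpInfoSketch info;
    // Pack expansion: one FillOpInfo<T> call per extra template argument.
    int expand[] = {0, (FillOpInfo<Extras>(&info), 0)...};
    (void)expand;
    OpRegistry()[op_type] = info;
  }
};

// The operator class and any makers are all forwarded through __VA_ARGS__.
#define REGISTER_OPERATOR(op_type, ...) \
  static OperatorRegistrar<__VA_ARGS__> op_registrar_##op_type(#op_type)

// Placeholder operator and maker types for the example.
struct MinusOp {};
struct MinusGradOp {};
struct MinusOpProtoAndCheckerMaker : ProtoMakerTag {};
struct MinusOpGradMaker : GradMakerTag {};

REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker);
REGISTER_OPERATOR(minus_grad, MinusGradOp);
```

The same macro handles an operator with makers and a bare operator, which is the property the paragraph above asks for.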

                                        -

                                        The user interface should be

                                        -
                                        vector<OpDesc> MinusOpGradMaker(OpDesc) {...}
                                        -REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoAndCheckerMaker, MinusOpGradMaker);
                                        -// Developers can still manually implement gradient operator.
                                        -REGISTER_OPERATOR(minus_grad, MinusGradOp);
                                        -
                                        -
                                        -

                                        The interface of the current REGISTER_OP macro should not change. Internally, REGISTER_OP will invoke REGISTER_OPERATOR twice and generate the GradOpDescMaker.

                                        -
                                        REGISTER_OP(minus, MinusOp, MinusOpProtoAndCheckerMaker, minus_grad, MinusGradOp);
                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/regularization.html b/develop/doc_cn/design/regularization.html deleted file mode 100644 index 0d13d93e31748b8ded6fb75851d5e57cea39ce0b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/regularization.html +++ /dev/null @@ -1,328 +0,0 @@ - - - - - - - - - - - - - Regularization in PaddlePaddle — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Regularization in PaddlePaddle

                                        -
                                        -

                                        Introduction to Regularization

                                        -

                                        A central problem in machine learning is how to design an algorithm that performs well not just on the training data, but also on new data. A frequent problem is overfitting, where the model does not make reliable predictions on new, unseen data. Regularization is the process of introducing additional information in order to prevent overfitting. This is usually done by adding extra penalties to the loss function that restrict the parameter space an optimization algorithm can explore.

                                        -
                                        -

                                        Parameter Norm Penalties

                                        -

                                        The most common regularization approaches in deep learning limit the capacity of the model by adding a parameter norm penalty to the objective function J. This is given as follows:
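In the usual notation, the regularized objective can be written as below. The symbols θ (parameters), X, y (data) and the tilde are assumptions; the text itself only names J, alpha and omega.

```latex
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta)
```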

                                        -


                                        -

                                        The hyperparameter alpha weights the contribution of the norm penalty term, omega, relative to the standard objective function J.

                                        -

                                        The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows:

                                        -
                                        -

                                        L2 Regularization:
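Assuming the conventional definition, with w denoting the weights that are penalized (a symbol not named in the text), the L2 penalty is:

```latex
\Omega(\theta) = \frac{1}{2} \lVert w \rVert_2^2 = \frac{1}{2} \sum_i w_i^2
```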

                                        -


                                        -
                                        -
                                        -

                                        L1 Regularization
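Assuming the conventional definition, with w denoting the weights that are penalized (a symbol not named in the text), the L1 penalty is:

```latex
\Omega(\theta) = \lVert w \rVert_1 = \sum_i \lvert w_i \rvert
```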

                                        -


                                        -

                                        A much more detailed mathematical background of regularization can be found here.

                                        -
                                        -
                                        -
                                        -
                                        -

                                        Regularization Survey

                                        -

                                        A detailed survey of regularization in various deep learning frameworks can be found here.

                                        -
                                        -
                                        -

                                        Proposal for Regularization in PaddlePaddle

                                        -
                                        -

                                        Low-Level implementation

                                        -

                                        In the new design, we propose to create new operations for regularization. For now, we can add 2 ops that correspond to the most frequently used regularizations:

                                        -
                                          -
                                        • L2_regularization_op
                                        • -
                                        • L1_regularization_op
                                        • -
                                        -

                                        These ops can be implemented like any other op, with CPU and GPU implementations written either with Eigen or as separate CPU and GPU kernels. For the initial implementation, we can write their kernels with Eigen, following the abstraction pattern used for Activation Ops. This pattern makes it easy to implement regularization schemes other than the L1 and L2 norm penalties.
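A minimal sketch of the functor pattern mentioned above, written in plain C++ rather than Eigen so that it stays self-contained. The functor names and the `AddRegularizationGrad` driver are hypothetical; the gradients assume the conventional penalties (alpha/2 times the sum of squares for L2, alpha times the sum of absolute values for L1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of (alpha / 2) * w^2 with respect to w.
struct L2DecayFunctor {
  float operator()(float w, float alpha) const { return alpha * w; }
};

// Subgradient of alpha * |w| with respect to w (0 is used at w == 0).
struct L1DecayFunctor {
  float operator()(float w, float alpha) const {
    float sign = static_cast<float>((w > 0.0f) - (w < 0.0f));
    return alpha * sign;
  }
};

// Shared driver: a new scheme only needs a new functor.
template <typename Functor>
void AddRegularizationGrad(const std::vector<float>& param, float alpha,
                           std::vector<float>* grad) {
  Functor functor;
  for (std::size_t i = 0; i < param.size(); ++i) {
    (*grad)[i] += functor(param[i], alpha);
  }
}
```

The driver is written once and each penalty is a small functor, which is what makes adding further schemes cheap.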

                                        -

                                        Building ops for regularization is in line with the refactored Paddle philosophy of using operators to represent every unit of computation. How these ops are added to the computation graph will be decided by the layer functions in the Python API.

                                        -
                                        -
                                        -

                                        Computation Graph

                                        -

                                        Below is an example of a really simple feed forward neural network.

                                        -


                                        -

                                        The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows:

                                        -


                                        -
                                        -
                                        -

                                        Python API implementation for Regularization

                                        -

                                        Using the low level ops, L2_regularization_op and L1_regularization_op, any user can add regularization to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support regularization. An example of such an API can be seen in Keras. As per the PaddlePaddle Python API design, the layer functions are responsible for creating operators, operator parameters and variables. Since regularization is a property of parameters, it makes sense to create these in the layer functions.

                                        -
                                        -

                                        Creation of Regularization ops

                                        -

                                        There are two possibilities for creating the regularization ops:

                                        -
                                          -
                                        1. We create these ops immediately while building the computation graph.
                                        -
                                        2. We add these ops in a lazy manner, just before the backward pass, similar to the way the optimization ops are added.
                                        -
                                        -

                                        The proposal is to add these ops in a lazy manner just before the backward pass.

                                        -
                                        -
                                        -

                                        Storage of Regularization attributes

                                        -

                                        Since we want to create the regularization ops in a lazy manner, the regularization attributes (type of regularization and weight of regularization penalty) can be stored as attributes of the Parameter class. This is because regularization is a property of the parameters and storing regularization properties with Parameters also allows for shared parameters.
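A rough sketch of how attributes stored on a parameter could drive lazy op creation. The proposal targets the Python API; this C++ sketch only illustrates the idea, and `Parameter`, `RegularizerType`, `RegOpDesc` and `AppendRegularizationOps` are hypothetical names. Emitting a single op for a parameter shared by several layers is also an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class RegularizerType { kNone, kL1, kL2 };

// The regularization attributes live on the parameter itself.
struct Parameter {
  std::string name;
  RegularizerType regularizer = RegularizerType::kNone;
  float regularization_coeff = 0.0f;
};

// Hypothetical description of a regularization op to append to the graph.
struct RegOpDesc {
  std::string type;
  std::string param;
  float coeff;
};

// Called once, just before the backward pass is built. `used_params` lists
// the parameter of every layer, so a shared parameter appears several times.
std::vector<RegOpDesc> AppendRegularizationOps(
    const std::vector<const Parameter*>& used_params) {
  std::vector<RegOpDesc> ops;
  std::set<const Parameter*> seen;
  for (const Parameter* p : used_params) {
    if (p->regularizer == RegularizerType::kNone) continue;
    if (!seen.insert(p).second) continue;  // shared parameter: already handled
    std::string type = (p->regularizer == RegularizerType::kL2)
                           ? "L2_regularization_op"
                           : "L1_regularization_op";
    ops.push_back(RegOpDesc{type, p->name, p->regularization_coeff});
  }
  return ops;
}
```

Because the attributes travel with the parameter, the layer functions do not need to know whether a parameter is shared.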

                                        -
                                        -
                                        -

                                        High-level API

                                        -

                                        In the PaddlePaddle Python API, users will primarily rely on layer functions to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. Good references for these APIs are Keras and TensorFlow's tf.contrib.layers.

                                        -
                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/releasing_process.html b/develop/doc_cn/design/releasing_process.html deleted file mode 100644 index 35ec97deac889dd74f5e142b8ed153dafbb48bf7..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/releasing_process.html +++ /dev/null @@ -1,371 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle发行规范 — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

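                                        PaddlePaddle Release Specification

                                        -

                                        PaddlePaddle uses the git-flow branching model for branch management and the Semantic Versioning standard for its version numbers.

                                        -

                                        Each new PaddlePaddle release follows this process:

                                        -
                                          -
                                        1. Create a new branch from the develop branch, named release/[version], for example release/0.10.0.
                                        -
                                        2. Tag the new branch with a tag of the form [version]rc.[Patch number]. The first tag is 0.10.0rc1, the second is 0.10.0rc2, and so on.
                                        -
                                        3. For the commits of this version, do the following:
                                        -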
                                        -
                                          -
                                        • Use the Regression Test List as a checklist to verify the correctness of this release.

                                          -
                                            -
                                          • If a test fails, record every failing case, fix all the bugs in the release/[version] branch, increment the Patch number, and go back to step 2.

                                            -
                                          • -
                                          • Update the version information in python/setup.py.in and set the istaged field to True.

                                            -
                                          • -
                                          • Build the Python wheel packages for this version and publish them to PyPI.

                                            -
                                              -
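                                            • Because pypi.python.org currently enforces the strict naming convention of PEP 513, the platform-related suffix of the wheel package must be renamed before uploading with twine, for example from linux_x86_64 to manylinux1_x86_64.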

                                              -
                                            • -
                                            • The package names on PyPI are paddlepaddle and paddlepaddle_gpu. To upload the GPU package, set name: "paddlepaddle_gpu" in build/python/setup.py and rebuild the wheel package: python setup.py bdist_wheel

                                              -
                                            • -
                                            • How to upload:

                                              -
                                              cd build/python
                                              -pip install twine
                                              -twine upload dist/[package to upload]
                                              -
                                              -
                                              -
                                            • -
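                                            • Build the Docker release image for this version and publish it to Docker Hub. If this fails, fix the Docker image build problem, increment the Patch number, and return to step 2.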

                                              -
                                            • -
                                            -
                                          • -
                                          -
                                        • -
                                        -
                                          -
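                                        4. After step 3 is complete, merge the release/[version] branch into the master branch. Tag the merge commit on master with the version number. Then merge master back into develop. Finally, delete the release/[version] branch.
                                        -
                                        5. Write the Release Note collaboratively.
                                        -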
                                        -

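                                        Please note:

                                        -
                                          -
                                        • Once the release/[version] branch has been created, merging from develop into release/[version] is generally no longer allowed. This keeps the feature set of the release/[version] branch closed, which makes it easier for testers to verify PaddlePaddle's behavior.
                                        • -
                                        • While the release/[version] branch exists, every bugfix branch must be merged into all three branches: master, develop and release/[version].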
                                        • -
                                        -
                                        -

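                                        Publishing wheel packages to PyPI

                                        -

                                        Use PaddlePaddle CI to run the automated binary builds. Referring to the figure below, select the builds to publish (usually one CPU version and one GPU version) and click the "..." button to the right of "run" to open the dialog shown below. In the second tab (Changes), select the branch to release, here 0.11.0, and then click the "Run Build" button. When the build finishes, the three generated binaries, corresponding to the CAPI, cp27m and cp27mu versions, can be found in the "Artifacts" drop-down on this page. Then upload them with twine as described above.

                                        -

                                        -
                                          -
                                        • Note: the CI environment uses the Docker images from https://github.com/PaddlePaddle/buildtools as its build environment in order to support more Linux distributions. These images can also be used for manual builds, and can be downloaded from https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/ .
                                        • -
                                        • PyPI does not allow an upload to be overwritten, so once the wheel package for a version number has been published it cannot be changed. The next wheel package must use a new version number before it can be uploaded.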
                                        • -
                                        -
                                        -
                                        -

                                        Publishing Docker images

                                        -

                                        After the PaddlePaddle CI wheel build above finishes, the Docker images are pushed to Docker Hub automatically, so publishing a Docker image only requires tagging the automatically pushed image with the corresponding version number:

                                        -
                                          -
                                        1. Go to https://hub.docker.com/r/paddlepaddle/paddle/tags/ and check that the latest tag was updated after the wheel build above completed.
                                        -
                                        2. Run docker pull paddlepaddle/paddle:[latest tag], where latest tag can be latest, latest-gpu, etc.
                                        -
                                        3. Run docker tag paddlepaddle/paddle:[latest tag] paddlepaddle/paddle:[version]
                                        -
                                        4. Run docker push paddlepaddle/paddle:[version]
                                        -
                                        -
                                        -
                                        -

                                        PaddlePaddle Branching Specification

                                        -

                                        PaddlePaddle development follows the git-flow branching model, with some adjustments to fit GitHub's features.

                                        -
                                          -
                                        • The main PaddlePaddle repository follows the git-flow branching model, where:
                                            -
                                          • The master branch is the stable branch. Every version on master has passed both unit tests and regression tests.
                                          • -
                                          • The develop branch is the development branch. Every version on develop has passed unit tests, but not regression tests.
                                          • -
                                          • The release/[version] branch is a temporary branch created for each release. Code at this stage is undergoing regression testing.
                                          • -
                                          -
                                        • -
                                        • Forks owned by other users do not need to follow the git-flow model strictly, but every branch of every fork is treated as a feature branch.
                                            -
                                          • It is recommended that the develop branch of a developer's fork track the develop branch of the main repository.
                                          • -
                                          • It is recommended that developers create their own feature branches from develop within their fork.
                                          • -
                                          • When a feature branch is finished, submit a Pull Request to the main PaddlePaddle repository for code review.
                                              -
                                            • During review, developers can keep committing changes to their feature branch.
                                            • -
                                            -
                                          • -
                                          -
                                        • -
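                                        • BugFix branches are also maintained in the developer's own fork. Unlike a feature branch, a BugFix branch must open Pull Requests at the same time against the master branch, the develop branch and, if one exists, the release/[version] branch of the main repository.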
                                        • -
                                        -
                                        -
                                        -

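                                        PaddlePaddle Regression Test List

                                        -

                                        This list describes the features that must be tested before each PaddlePaddle release.

                                        -
                                        -

                                        All chapters of PaddlePaddle Book

                                        -

                                        For every release, PaddlePaddle must first make sure that all chapters of PaddlePaddle Book work correctly. This includes verifying that models train correctly both with the current paddle_trainer and with pure Python training.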

                                        -

                                        | | Getting Started | Recognize Digits | Image Classification | Word Vectors | Sentiment Analysis | Semantic Role Labeling | Machine Translation | Personalized Recommendation |
                                        -| --- | --- | --- | --- | --- | --- | --- | --- | --- |
                                        -| API.V2 + Docker + GPU | | | | | | | | |
                                        -| API.V2 + Docker + CPU | | | | | | | | |
                                        -| paddle_trainer + Docker + GPU | | | | | | | | |
                                        -| paddle_trainer + Docker + CPU | | | | | | | | |
                                        -| API.V2 + Ubuntu + GPU | | | | | | | | |
                                        -| API.V2 + Ubuntu + CPU | | | | | | | | |
                                        -| paddle_trainer + Ubuntu + GPU | | | | | | | | |
                                        -| paddle_trainer + Ubuntu + CPU | | | | | | | | |

                                        -
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/scope.html b/develop/doc_cn/design/scope.html deleted file mode 100644 index feab2dd6c33779228255a9dc448ee6da91760b97..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/scope.html +++ /dev/null @@ -1,381 +0,0 @@ - - - - - - - - - - - - - Design of Scope in Paddle — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design of Scope in Paddle

                                        -
                                        -

                                        Overview

                                        -

                                        Scope is an important concept in programming languages: it defines the region of a program in which a set of bindings between names and entities applies. Within a given scope, a valid name is uniquely associated with an entity, such as a variable. In another scope, the same name may refer to a different entity or to nothing at all. Scope therefore restricts the visibility and validity of names in a program. PaddlePaddle introduces Scope to manage variables in context. Unlike the abstract concept, however, Scope here is a concrete object with two important attributes:

                                        -
                                          -
                                        • Scope is an association of a name to variable.
                                        • -
                                        • Variables in a parent scope can be retrieved from local scope.
                                        • -
                                        -

                                        These two attributes are explained in detail below.

                                        -
                                        -
                                        -

                                        Scope is an association of a name to variable.

                                        -

                                        Scope is an association of a name to a variable. All variables belong to a Scope. You need to specify a scope to run a Net, i.e., net.Run(&scope). One net can run in different scopes and update different variables in each scope.

                                        -
                                          -
                                        1. Scope only contains a map of a name to variable.

                                          -

                                          All parameters, data and states in a Net should be variables stored inside a scope. Each op should get the inputs and outputs it computes on, such as data buffers and state (momentum), from a scope.

                                          -
                                        -
                                        2. A Variable can only be created by a Scope and can only be retrieved from a Scope. Users cannot create or get a variable outside a scope. This is a constraint of our framework, and it keeps the framework simple and clear.

                                          -
                                        -
                                        3. Scope only contains methods that are used to Create and Get Variables. Scope does not contain Operators and has no information about how to run them. Net is designed to drive the computation, and Scope only contains a map of variables. There is no computation logic inside a Scope. Scope just handles the lifetime management of variables.

                                          -
                                            -
                                          • Create is used to create a Variable by its name and add the mapping relation.
                                          • -
                                          • Get is used to find a Variable by name.
                                          • -
                                          -
                                        -
                                        4. Every variable belongs to exactly one Scope.

                                          -

                                          A Variable cannot belong to more than one scope. To use variables from a parent scope, look them up through the parent scope.

                                          -
                                        -
                                        5. A Scope should destruct all Variables inside it when it is itself destructed. Users must never store a Variable pointer anywhere else.

                                          -

                                          Because a Variable can only be retrieved from a Scope, destroying the Scope must also destroy all the Variables in it. If a user stores a Variable pointer in a private data member or a global variable, that pointer becomes invalid when the associated Scope is destroyed.

                                          -
                                        -
                                        -
                                        class Scope {
                                        - public:
                                        -  Variable* Var(const std::string& name);
                                        -  const Variable* FindVar(const std::string& name) const;
                                        -
                                        - private:
                                        -    std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
                                        -};
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Parent scope and local scope

                                        -

                                        Just like a scope in programming languages, a Scope in a neural network can also be a local scope. A local scope has two attributes.

                                        -
                                          -
                                        1. We can create local variables in a local scope. When that local scope is destroyed, all of its local variables should also be destroyed.
                                        -
                                        2. Variables in a parent scope can be retrieved from the local scopes of that parent, i.e., when a user gets a variable from a scope, the scope first searches for the variable locally. If the variable is not found in the local scope, the search continues in the parent scope, and so on, until the variable is found or there is no parent.
                                        -
                                        -
                                        class Scope {
                                        - public:
                                        -  Scope(const std::shared_ptr<Scope>& scope): parent_(scope) {}
                                        -
                                        -  Variable* FindVar(const std::string& name) const {
                                        -    auto it = vars_.find(name);
                                        -    if (it != vars_.end()) {
                                        -      return it->second.get();
                                        -    } else if (parent_ != nullptr) {
                                        -      return parent_->FindVar(name);
                                        -    } else {
                                        -      return nullptr;
                                        -    }
                                        -  }
                                        -
                                        - private:
                                        -  std::shared_ptr<Scope> parent_ {nullptr};
                                        -};
                                        -
                                        -
                                        -

                                        The Scope class has a private data member called parent_, a smart pointer to the parent scope. When a user gets a variable by name, the name is first searched in the current scope. If the variable cannot be found locally and parent_ is not nullptr, the search continues in the parent scope. The default value of parent_ is nullptr, which means that the scope is a global scope.

                                        -

                                        A local scope is very useful when we implement a Recurrent Neural Network. Each timestep of an RNN should be a Net, and the Net of each timestep (StepNet for short) should use an independent local scope, just as the variables in the body of a while loop live in a local scope in a programming language. By reusing a single StepNet and switching its local scope, we can implement an RNN easily.
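The parent lookup and the per-timestep local scopes described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the C++ design itself: the names `var` and `find_var` mirror `Var` and `FindVar`, the `value` attribute and the toy "RNN" arithmetic are assumptions, and `var` is assumed to return the existing variable when the name is already present.

```python
class Variable:
    """Hypothetical placeholder for a value owned by a scope."""

    def __init__(self):
        self.value = None


class Scope:
    def __init__(self, parent=None):
        self._parent = parent  # None means this is a global scope
        self._vars = {}        # the scope owns its variables

    def var(self, name):
        """Return the local variable `name`, creating it if needed (assumed semantics)."""
        if name not in self._vars:
            self._vars[name] = Variable()
        return self._vars[name]

    def find_var(self, name):
        """Search this scope, then each ancestor; return None if nothing matches."""
        scope = self
        while scope is not None:
            if name in scope._vars:
                return scope._vars[name]
            scope = scope._parent
        return None


global_scope = Scope()
global_scope.var("W").value = 2.0  # weight shared by every timestep

outputs = []
for x in [1.0, 2.0, 3.0]:                    # one iteration per timestep
    step_scope = Scope(parent=global_scope)  # fresh local scope for this step
    step_scope.var("x").value = x            # step-local variable
    w = step_scope.find_var("W")             # resolved through the parent
    outputs.append(w.value * step_scope.find_var("x").value)
```

Each `step_scope` is dropped at the end of its iteration, so the step-local `x` never leaks into `global_scope`, while `W` stays visible to every step.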

                                        -
                                        -
                                        -
                                        -

                                        Interface Design

                                        -
                                        class Variable {
                                        - private:
                                        -  Variable() = default;
                                        -  friend class Scope;
                                        -};
                                        -
                                        -class Scope {
                                        - private:
                                        -  Scope(const std::shared_ptr<Scope>& parent = nullptr);
                                        -
                                        - public:
                                        -  static std::shared_ptr<Scope> Create(const std::shared_ptr<Scope>& parent = nullptr);
                                        -
                                        -  // return nullptr if not found.
                                        -  Variable* FindVar(const std::string& name) const;
                                        -
                                        -  // Return the existing variable if one with the same name already exists.
                                        -  Variable* Var(const std::string& name);
                                        -
                                        - private:
                                        -  std::shared_ptr<Scope> parent_;
                                        -  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
                                        -};
                                        -
                                        -
                                        -
                                        -

                                        Only scope can create a variable

                                        -

                                        To ensure that only a scope can create a variable, we mark Variable’s constructor as private and declare Scope a friend class of Variable. As a result, only Scope::Var can construct a Variable.

                                        -
                                        -
                                        -

                                        When a scope is destroyed, all variables inside it should be destroyed together

                                        -

                                        The scope holds unique pointers to all of its variables. Users can call FindVar on a scope, but they should not store the returned pointer as a member variable, because when the scope is destroyed, all variables inside it are destroyed with it.

                                        -
                                        -
                                        -

                                        Sharing a parent scope

                                        -

                                        A local scope contains a parent_ pointer, so scopes form a linked list. The pointer is a shared_ptr because the parents of a local scope must not be destroyed while that local scope is in use.

                                        -

                                        Also, because the parent scope is held by a shared_ptr, a scope can only be obtained as a shared pointer through Create(). We cannot construct a scope as a plain object, because such an object could not be passed to another scope as its parent pointer.

                                        -
                                        -
                                        -

                                        Orthogonal interface

                                        -

                                        FindVar returns nullptr when name is not found, so it can also serve as a Contains method. Var returns an Error when there is a local name conflict. By combining FindVar and Var, a get-or-create operation is easy to implement.

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/selected_rows.html b/develop/doc_cn/design/selected_rows.html deleted file mode 100644 index 2504b99cd79d28cd9c6ec99f10375243c304ab3a..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/selected_rows.html +++ /dev/null @@ -1,327 +0,0 @@ - - - - - - - - - - - - - Design Doc: Selected Rows — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Selected Rows

                                        -

                                        SelectedRows is a sparse tensor data type designed to support embedding operators. The gradient of an embedding table is a sparse tensor in which only a few rows contain non-zero values. It is straightforward to represent such a tensor with the following data structure:

                                        -
                                        class SelectedRows {
                                        - private:
                                        -  vector<int> rows_;
                                        -  Tensor value_;
                                        -  int height_;
                                        -};
                                        -
                                        -
                                        -

                                        The field height_ is the first dimension of SelectedRows. The rows_ field holds the indices of the non-zero rows of SelectedRows. The value_ field is an N-dim tensor of shape [rows_.size() /* NUM_ROWS */, ...], which supplies the values for each of those rows. The dimension of SelectedRows is [height_] + value_.shape[1:].

                                        -

                                        Suppose that a SelectedRows-typed variable x has many rows, but only two of them have values: row 73 is [1, 2] and row 84 is [3, 4]. The SelectedRows representation would be:

                                        -
                                        x = SelectedRows {
                                        -  rows = [73, 84],
                                        -  value = [[1, 2], [3,4]]
                                        -}
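To make the relationship between rows, value, and height concrete, here is a minimal Python sketch of the same example. It is only an illustration under stated assumptions: nested lists stand in for tensors, `height=100` is a made-up value (the example above does not give one), and repeated row indices are assumed to accumulate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectedRows:
    rows: List[int]           # indices of the non-zero rows
    value: List[List[float]]  # one row of values per entry in `rows`
    height: int               # first dimension of the equivalent dense tensor

    def shape(self):
        # [height] + value.shape[1:]
        return [self.height, len(self.value[0])]

    def to_dense(self):
        width = len(self.value[0])
        dense = [[0.0] * width for _ in range(self.height)]
        for row_index, row_value in zip(self.rows, self.value):
            for j, v in enumerate(row_value):
                dense[row_index][j] += v  # assumed: repeated indices are summed
        return dense


# height=100 is an assumption made only for this illustration.
x = SelectedRows(rows=[73, 84], value=[[1, 2], [3, 4]], height=100)
dense = x.to_dense()
```

Only two rows are stored, yet `to_dense` recovers a 100 x 2 tensor in which every row except 73 and 84 is zero.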
                                        -
                                        -
                                        -
                                        -

                                        SelectedRows in Protobuf

                                        -

                                        SelectedRows is a type of Variable, so VarDesc in protobuf should describe the SelectedRows information. Only the tensor dimension of a SelectedRows is described at compile time, because rows_ and value_ depend on the training data. We therefore use TensorDesc to unify data_type and dims. A LodTensorDesc contains a TensorDesc and a lod_level. The description of a SelectedRows is simply a TensorDesc.

                                        -
                                        message TensorDesc {
                                        -  required DataType data_type = 1;
                                        -  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
                                        -}
                                        -
                                        -message LodTensorDesc {
                                        -  required TensorDesc tensor = 1;
                                        -  optional int32 lod_level = 2;
                                        -}
                                        -
                                        -message VarDesc {
                                        -  required string name = 1;
                                        -  enum VarType { 
                                        -    LOD_TENSOR = 0;
                                        -    SELECTED_ROWS = 1;
                                        -  }
                                        -  required VarType type = 2;
                                        -  optional LodTensorDesc lod_desc = 3;
                                        -  optional TensorDesc selected_rows_desc = 4;
                                        -  optional bool persistable = 5 [ default = false ];
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        InferShape for Selected Rows

                                        -

                                        Just as it does for LoD information, the InferShape method also infers the output tensor type. Each operator should decide whether its output is a SelectedRows or a dense tensor.

                                        -

                                        For example, the gradient operator of TableLookup always generates SelectedRows. Its InferShape method should look like the following:

                                        -
                                        void TableLookupGrad::InferShape(context) {
                                        -  ...
                                        -  context.SetDataType("Embedding.Grad", kSelectedRows);
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Sparse Operators

                                        -

                                        There are several operators that need to be written to support SelectedRows. These are:

                                        -
                                          -
                                        1. Operators that generate a SelectedRows gradient, e.g. the gradient of TableLookupOp.
                                        -
                                        2. Optimizer operators that support a SelectedRows gradient, e.g. SGD or AdaGrad for SelectedRows. However, there should be only one SGD operator. OpWithKernel::Run should select a suitable kernel for either a dense tensor or SelectedRows.
                                        -
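The idea of a single SGD operator that selects a kernel by gradient type can be sketched as follows. This is a hedged Python illustration, not the OpWithKernel mechanism: the function names are hypothetical, nested lists stand in for tensors, and a `(rows, value)` tuple stands in for a SelectedRows gradient.

```python
def sgd_dense_kernel(param, grad, lr):
    """Update every element: param -= lr * grad."""
    for i, grad_row in enumerate(grad):
        for j, g in enumerate(grad_row):
            param[i][j] -= lr * g


def sgd_selected_rows_kernel(param, grad, lr):
    """Update only the rows listed in the sparse gradient."""
    rows, value = grad
    for row_index, grad_row in zip(rows, value):
        for j, g in enumerate(grad_row):
            param[row_index][j] -= lr * g


def sgd(param, grad, lr):
    """Single SGD entry point that picks a kernel from the gradient's type."""
    if isinstance(grad, tuple):  # (rows, value) stands in for SelectedRows
        sgd_selected_rows_kernel(param, grad, lr)
    else:
        sgd_dense_kernel(param, grad, lr)


param = [[1.0, 1.0] for _ in range(4)]

# Sparse update: only rows 1 and 3 are touched.
sgd(param, ([1, 3], [[1.0, 2.0], [3.0, 4.0]]), lr=0.5)
after_sparse = [list(row) for row in param]

# Dense update: every row is touched.
sgd(param, [[2.0, 2.0] for _ in range(4)], lr=0.5)
```

The sparse kernel does work proportional to the number of selected rows rather than to the height of the embedding table, which is the point of keeping the gradient in SelectedRows form.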
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/simple_op_design.html b/develop/doc_cn/design/simple_op_design.html deleted file mode 100644 index a52b8731862ae5a577feac21f112c83875ded724..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/simple_op_design.html +++ /dev/null @@ -1,449 +0,0 @@ - - - - - - - - - - - - - Interaction between C++ and Python — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Interaction between C++ and Python

                                        -

                                        Users describe their networks through a Python API, but the network is actually constructed in C++. Protobuf is therefore introduced to pass messages between Python and C++.

                                        -

                                        The interaction between Python and C++ can be summarized in two steps:

                                        -
                                          -
                                        1. C++ tells Python how many Ops there are and which parameters users must provide to initialize a new Op. Python then builds an API for each Op at compile time.
                                        -
                                        2. Users invoke the APIs built by Python and provide the necessary parameters. These parameters are sent to C++ to finish constructing the Op.
                                        -
                                        -
                                        -

                                        Message from C++ to Python

                                        -

                                        We define a Protobuf message class OpProto to hold the information needed in the first step. What should an OpProto contain? This question is equivalent to asking: “What information do we need to provide in order to build a Python API that is valid, user-friendly, and able to describe a whole Op?”

                                        -

                                        The following information is necessary:

                                        -
                                          -
                                        1. The Op’s name and a short comment.
                                        -
                                        2. The number of input and output variables; each variable’s name, type, and comment.
                                        -
                                        3. The Op’s attributes; each attribute includes a name, type, comment, default value, and value range.
                                        -
                                        -

                                        So OpProto can be defined as follows:

                                        -
                                        enum AttrType {
                                        -    INT = 1;
                                        -    FLOAT = 2;
                                        -    STRING = 3;
                                        -    INTS = 4;
                                        -    FLOATS = 5;
                                        -    STRINGS = 6;
                                        -};
                                        -
                                        -message AttrValue {
                                        -    required AttrType type = 1;
                                        -    optional int32 iv = 2;
                                        -    optional float fv = 3;
                                        -    optional string sv = 4;
                                        -    repeated int32 ivs = 5;
                                        -    repeated float fvs = 6;
                                        -    repeated string svs = 7;
                                        -};
                                        -
                                        -message AttrProto {
                                        -    required string name = 1;
                                        -    required string comment = 2;
                                        -    required AttrType type = 3;
                                        -};
                                        -
                                        -message VarProto {
                                        -    required string name = 1;
                                        -    required string comment = 2;
                                        -    required bool is_tensor = 3;
                                        -};
                                        -
                                        -message OpProto {
                                        -    repeated VarProto inputs = 1;
                                        -    repeated VarProto outputs = 2;
                                        -    repeated AttrProto attrs = 3;
                                        -    required string type = 4;
                                        -    required string comment = 5;
                                        -};
                                        -
                                        -
                                        -

                                        To generate Python code automatically:

                                        -
                                        def create_python_ops_creation_functions():
                                        -    op_protos = paddle.framework.OpRegistry.get_all_op_proto()
                                        -
                                        -    def make_op_function(type_name, op_proto):
                                        -        # A factory gives each generated function its own type_name and op_proto.
                                        -        # Defining __impl__ directly in the loop would bind all of them to the last Op.
                                        -        def __impl__(**kwargs):  # Users must use keyword arguments in the Paddle API
                                        -            inputs = [kwargs.get(ipt.name, "") for ipt in op_proto.inputs]
                                        -            outputs = [kwargs.get(opt.name, "") for opt in op_proto.outputs]
                                        -            attrs = [cast_to_op_attr(attr, kwargs.get(attr.name, None)) for attr in op_proto.attrs]
                                        -            opdesc = (inputs, outputs, type_name, attrs)
                                        -            return paddle.framework.OpRegistry.CreateOp(opdesc)
                                        -        __impl__.__doc__ = create_doc_string(op_proto)
                                        -        return __impl__
                                        -
                                        -    for type_name in op_protos:
                                        -        globals()[type_name] = make_op_function(type_name, op_protos[type_name])
                                        -
                                        -create_python_ops_creation_functions()
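The generation step can be tried without Paddle by substituting a small in-memory registry. The sketch below is an assumption-laden stand-in: `OP_PROTOS`, the `add` op, the `output` variable of `cos`, and the tuple used in place of an OpDesc are all hypothetical, and attributes are left out for brevity.

```python
from collections import namedtuple

VarProto = namedtuple("VarProto", ["name", "comment"])
OpProto = namedtuple("OpProto", ["type", "inputs", "outputs", "comment"])

# Hypothetical registry contents; in the design these would come from C++.
OP_PROTOS = {
    "cos": OpProto(
        "cos",
        [VarProto("input", "input of cosine op")],
        [VarProto("output", "output of cosine op")],
        "This is cos op",
    ),
    "add": OpProto(
        "add",
        [VarProto("x", "first operand"), VarProto("y", "second operand")],
        [VarProto("out", "sum of x and y")],
        "This is add op",
    ),
}


def make_op_function(op_proto):
    """Build one Python function for one Op description."""

    def op_function(**kwargs):
        inputs = [kwargs.get(v.name, "") for v in op_proto.inputs]
        outputs = [kwargs.get(v.name, "") for v in op_proto.outputs]
        return (op_proto.type, inputs, outputs)  # stand-in for an OpDesc

    op_function.__name__ = op_proto.type
    op_function.__doc__ = op_proto.comment
    return op_function


generated = {name: make_op_function(proto) for name, proto in OP_PROTOS.items()}
cos_desc = generated["cos"](input="x", output="y")
add_desc = generated["add"](x="a", y="b", out="c")
```

Because each function is produced by a separate call to `make_op_function`, `generated["cos"]` and `generated["add"]` keep their own proto instead of sharing the last loop value.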
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Message from Python to C++

                                        -

                                        To hold the information needed in the second step above, we define a Protobuf message class OpDesc. It holds the parameters that users specify when describing an Op.

                                        -
                                        message OpDesc {
                                        -    required string type = 1;   
                                        -    repeated string inputs = 2;
                                        -    repeated string outputs = 3;
                                        -    map<string, AttrValue> attrs = 4;
                                        -};
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        OpProto Register

                                        -

                                        Every Op has its own OpProto. For convenience, we need to register them and record all of their information. For each Op class, we define a corresponding OpMaker class whose constructor builds the OpProto. The OpMaker’s constructor is invoked by another function, OpRegistry::RegisterOp().

                                        -
                                        class OpProtoMaker {
                                        -public:
                                        -    OpProtoMaker(OpProto* proto): proto_(proto) {}
                                        -protected:
                                        -    OpProto* proto_;
                                        -    void AddInput(const std::string& name, const std::string& desc) {...}
                                        -    void AddAttr(const std::string& name, const std::string& desc, TypeId type) {...}
                                        -    void AddComment(const std::string& comment) { ... }
                                        -};
                                        -
                                        -class OpRegistry {
                                        -public:
                                        -    using OpCreator = std::function<OperatorBase* (const OpDesc& desc)>;
                                        -    
                                        -    template <typename OpType, typename OpMaker>
                                        -    static void RegisterOp(const std::string& name) {
                                        -        gCreators_[name] = [](const OpDesc& desc) {
                                        -            return new OpType(desc);
                                        -        };
                                        -        OpProto& opProto = gProtos_[name];
                                        -        OpMaker maker(&opProto);  // the maker's constructor fills in opProto
                                        -    }
                                        -
                                        -    static map<string, OpCreator> gCreators_;
                                        -    static map<string, OpProto> gProtos_;
                                        -};
                                        -
                                        -template <typename OpType, typename OpMaker>
                                        -class OpRegister {
                                        -  public:
                                        -    OpRegister(std::string type) {
                                        -        OpRegistry::RegisterOp<OpType, OpMaker>(type);
                                        -    }
                                        -};
                                        -
                                        -#define REGISTER_OP(op_class, op_maker_class, type_name)              \
                                        -    class op_class##Register {                                        \
                                        -      private:                                                        \
                                        -        const static OpRegister<op_class, op_maker_class> reg;        \
                                        -    };                                                                \
                                        -    const OpRegister<op_class, op_maker_class>                        \
                                        -        op_class##Register::reg(#type_name);
                                        -    
                                        -class CosineOp {
                                        -// ...
                                        -};
                                        -
                                        -struct CosineOpProtoMaker : public OpProtoMaker {
                                        -    CosineOpProtoMaker(OpProto* proto) : OpProtoMaker(proto) {
                                        -        AddInput("input", "input of cosine op");
                                        -        AddAttr("scale", "scale of cosine op", float).Default(1.0).GreaterThan(0.0);
                                        -        AddType("cos");
                                        -        AddComment("This is cos op");
                                        -    }
                                        -};
                                        -
                                        -REGISTER_OP(CosineOp, CosineOpProtoMaker, cos);
                                        -
                                        -
                                        -

                                        In REGISTER_OP(CosineOp, CosineOpProtoMaker, cos), we register not only CosineOp but also CosineOpProto. As fields of CosineOpProto, the default value and value range of scale are also registered here.
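One way to picture how a default value and a value range can be registered together with an attribute is a small chained checker. The Python sketch below only mirrors the shape of `AddAttr(...).Default(1.0).GreaterThan(0.0)`; the class `AttrChecker` and its methods are hypothetical and are not part of the design above.

```python
class AttrChecker:
    """Hypothetical record of one attribute's default value and lower bound."""

    def __init__(self, name):
        self.name = name
        self._has_default = False
        self._default = None
        self._lower_bound = None

    def default(self, value):
        self._has_default = True
        self._default = value
        return self  # returning self allows chained calls

    def greater_than(self, bound):
        self._lower_bound = bound
        return self

    def check(self, attrs):
        """Fill in the default if needed, then validate the range."""
        if self.name not in attrs:
            if not self._has_default:
                raise KeyError("attribute %r is required" % self.name)
            attrs[self.name] = self._default
        value = attrs[self.name]
        if self._lower_bound is not None and not value > self._lower_bound:
            raise ValueError(
                "attribute %r must be greater than %r" % (self.name, self._lower_bound)
            )
        return value


scale = AttrChecker("scale").default(1.0).greater_than(0.0)

user_attrs = {}
filled = scale.check(user_attrs)        # default is applied
explicit = scale.check({"scale": 2.5})  # explicit value passes the range check
```

Registering the checker next to the attribute lets the same description drive both the generated documentation and the validation of user-supplied values.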

                                        -
                                        -
                                        -

                                        Python API

                                        -

                                        Python APIs are divided into two types, high-level API and low-level API.

                                        -
                                        -

                                        High-Level API

                                        -

                                        High-level API is called by users directly, so it should keep its style consistent with existing V2 APIs.

                                        -

                                        Here is an example of how to define an fc layer:

                                        -
                                        hd = fc_layer(input=data, size=56, with_bias=True, activation="sigmoid")
                                        -
                                        -
                                        -

                                        hd is the output of fc_layer and it’s a variable. It can be further sent into other layers as input.

                                        -

                                        The definition of fc_layer():

                                        -
                                        def fc_layer(input, size, with_bias, activation):
                                        -    attr_map = {"size": size}
                                        -    check_attrs(attr_map)
                                        -    w = make_variable('w')
                                        -    if with_bias:
                                        -        b = make_variable('b')
                                        -    else:
                                        -        b = None
                                        -    fc_output = make_variable('fc_output')
                                        -    fc_op(input, w, b, fc_output, attr_map)
                                        -    act_output = make_variable('sigmoid_output')
                                        -    if activation == "sigmoid":
                                        -        sigmoid_op(fc_output, act_output)
                                        -    else:
                                        -        # ... (other activations are handled here)
                                        -        pass
                                        -    return act_output
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Low-Level API

                                        -

                                        In the sample above, fc_op and sigmoid_op are low-level APIs. They build an OpDesc and invoke the corresponding C++ code.
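The split between the two API levels can be illustrated with a self-contained Python sketch in which the low-level functions merely record OpDesc-like tuples instead of calling into C++. Everything here is a hypothetical stand-in: the `program` list, the naming scheme in `make_variable`, and the tuple layout are assumptions made for the example.

```python
import itertools

_counter = itertools.count()
program = []  # OpDesc-like tuples, in construction order


def make_variable(prefix):
    """Return a unique variable name (hypothetical naming scheme)."""
    return "%s_%d" % (prefix, next(_counter))


# Low-level API: one function per Op, each recording a description of the Op.
def fc_op(input, w, b, output, attrs):
    inputs = [input, w] + ([b] if b is not None else [])
    program.append(("fc", inputs, [output], dict(attrs)))


def sigmoid_op(input, output):
    program.append(("sigmoid", [input], [output], {}))


# High-level API: composes low-level Ops and hides the intermediate variables.
def fc_layer(input, size, with_bias=True, activation="sigmoid"):
    w = make_variable("w")
    b = make_variable("b") if with_bias else None
    fc_output = make_variable("fc_output")
    fc_op(input, w, b, fc_output, {"size": size})
    if activation != "sigmoid":
        raise NotImplementedError("only sigmoid is sketched here")
    act_output = make_variable("sigmoid_output")
    sigmoid_op(fc_output, act_output)
    return act_output


hd = fc_layer(input="data", size=56)
```

A single high-level call produces two recorded Ops, and the returned name `hd` can be passed to another layer as its input.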

                                        -

                                        TODO

                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/speech/deep_speech_2.html b/develop/doc_cn/design/speech/deep_speech_2.html deleted file mode 100644 index 9062b47c87cb57174863c4750cc19384e35c0d72..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/speech/deep_speech_2.html +++ /dev/null @@ -1,468 +0,0 @@ - - - - - - - - - - - - - DeepSpeech2 on PaddlePaddle: Design Doc — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        DeepSpeech2 on PaddlePaddle: Design Doc

                                        -

                                        We are planning to build Deep Speech 2 (DS2) [1], a powerful Automatic Speech Recognition (ASR) engine, on PaddlePaddle. For the first-stage plan, we have the following short-term goals:

                                        -
                                          -
                                        • Release a basic distributed implementation of DS2 on PaddlePaddle.
                                        • -
                                        • Contribute a chapter of Deep Speech to PaddlePaddle Book.
                                        • -
                                        -

                                        Intensive system optimization and a low-latency inference library (details in [1]) are not covered in this first-stage plan.

                                        - -
                                        -

                                        Tasks

                                        -

                                        We roughly break down the project into 14 tasks:

                                        -
                                        1. Develop an audio data provider:
                                          • JSON filelist generator.
                                          • Audio file format transformer.
                                          • Spectrogram feature extraction, power normalization, etc.
                                          • Batch data reader with SortaGrad.
                                          • Data augmentation (optional).
                                          • Prepare (one or more) public English data sets & baseline.
                                        2. Create a simplified DS2 model configuration:
                                          • With only fixed-length (by padding) audio sequences (otherwise need Task 3).
                                          • With only bidirectional-GRU (otherwise need Task 4).
                                          • With only greedy decoder (otherwise need Task 5, 6).
                                        3. Add support for variable-shaped dense-vector (image) batches of input data:
                                          • Update DenseScanner in dataprovider_converter.py, etc.
                                        4. Develop a new lookahead-row-convolution layer (see [1] for details):
                                          • Lookahead convolution windows.
                                          • Within-row convolution, without kernels shared across rows.
                                        5. Build KenLM language model (5-gram) for beam search decoder:
                                          • Use KenLM toolkit.
                                          • Prepare the corpus & train the model.
                                          • Create inference interfaces (for Task 6).
                                        6. Develop a beam search decoder with CTC + LM + WORDCOUNT:
                                          • Beam search with CTC.
                                          • Beam search with external custom scorer (e.g. LM).
                                          • Try to design a more general beam search interface.
                                        7. Develop a Word Error Rate evaluator:
                                          • Update ctc_error_evaluator(CER) to support WER.
                                        8. Prepare internal dataset for Mandarin (optional):
                                          • Dataset, baseline, evaluation details.
                                          • Particular data preprocessing for Mandarin.
                                          • Might need cooperation with the Speech Department.
                                        9. Create standard DS2 model configuration:
                                          • With variable-length audio sequences (need Task 3).
                                          • With unidirectional-GRU + row-convolution (need Task 4).
                                          • With CTC-LM beam search decoder (need Task 5, 6).
                                        10. Make it run perfectly on clusters.
                                        11. Experiments and benchmarking (for accuracy, not efficiency):
                                          • With public English dataset.
                                          • With internal (Baidu) Mandarin dataset (optional).
                                        12. Time profiling and optimization.
                                        13. Prepare docs.
                                        14. Prepare PaddlePaddle Book chapter with a simplified version.
                                        -
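Task 7 asks for a Word Error Rate (WER) evaluator. The following is only a minimal sketch of the metric, assuming WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. `SplitWords` and `WordErrorRate` are hypothetical names for illustration; they are not part of ctc_error_evaluator.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a transcript into whitespace-separated words.
std::vector<std::string> SplitWords(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  std::string word;
  while (in >> word) words.push_back(word);
  return words;
}

// WER = word-level Levenshtein distance / number of reference words.
double WordErrorRate(const std::string& reference,
                     const std::string& hypothesis) {
  const std::vector<std::string> ref = SplitWords(reference);
  const std::vector<std::string> hyp = SplitWords(hypothesis);
  // Convention chosen for this sketch: an empty reference gives 0 or 1.
  if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;

  // Two-row dynamic programming table for the edit distance.
  std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
  for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= ref.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= hyp.size(); ++j) {
      const std::size_t substitute =
          prev[j - 1] + (ref[i - 1] != hyp[j - 1] ? 1 : 0);
      cur[j] = std::min({substitute, prev[j] + 1, cur[j - 1] + 1});
    }
    prev.swap(cur);
  }
  return static_cast<double>(prev[hyp.size()]) / ref.size();
}
```

The character error rate (CER) follows the same recurrence with characters in place of words, which is why extending an existing CER evaluator is a natural route.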
                                        -
                                        -

                                        Task Dependency

                                        -

                                        Tasks parallelizable within phases:

                                        -

                                        Roadmap | Description | Parallelizable Tasks
                                        :--- | :--- | :---
                                        Phase I | Simplified model & components | Task 1 ~ Task 8
                                        Phase II | Standard model & benchmarking & profiling | Task 9 ~ Task 12
                                        Phase III | Documentation | Task 13 ~ Task 14

                                        -

                                        An issue for each task will be created later. Contributions, discussions, and comments are all highly appreciated and welcome!

                                        -
                                        -
                                        -

                                        Design Details

                                        -
                                        -

                                        Overview

                                        -

                                        Traditional ASR (Automatic Speech Recognition) pipelines require great human effort to tune multiple hand-engineered components (e.g. audio feature design, acoustic model, pronunciation model, and language model). Deep Speech 2 (DS2) [1], however, trains such ASR models in an end-to-end manner, replacing most intermediate modules with a single deep network architecture. By scaling up both the data and model sizes, DS2 achieves a very significant performance boost.

                                        -

                                        Please read the Deep Speech 2 papers [1,2] for more background knowledge.

                                        -

                                        The classical DS2 network contains 15 layers (from bottom to top):

                                        -
                                          -
                                        • Two data layers (audio spectrogram, transcription text)
                                        • Three 2D convolution layers
                                        • Seven uni-directional simple-RNN layers
                                        • One lookahead row convolution layer
                                        • One fully-connected layer
                                        • One CTC-loss layer
                                        -
                                        -
                                        -Figure 1. Architecture of Deep Speech 2 Network. -

                                        We don’t have to stick to this 2-3-7-1-1-1 depth [2]. Similar networks with different depths might also work well. For example, the authors of [1] use a different depth (e.g. 2-2-3-1-1-1) for their final experiments.

                                        -

                                        Key points about the layers:

                                        -
                                          -
                                        • Data Layers:
                                          • Frame sequences of audio spectrogram data (with FFT).
                                          • Token sequences of transcription text (labels).
                                          • These two types of sequences do not have the same lengths, thus a CTC-loss layer is required.
                                        • 2D Convolution Layers:
                                          • Not only temporal convolution, but also frequency convolution. Like a 2D image convolution, but with a variable dimension (i.e. the temporal dimension).
                                          • With striding for only the first convolution layer.
                                          • No pooling for any convolution layer.
                                        • Uni-directional RNNs:
                                          • Uni-directional + row convolution: for low-latency inference.
                                          • Bi-directional without row convolution: if we don’t care about inference latency.
                                        • Row convolution:
                                          • Looks only a few steps ahead into the feature sequence, instead of looking at the whole sequence as bi-directional RNNs do.
                                          • Not necessary with bi-directional RNNs.
                                          • “Row” means convolutions are done within each frequency dimension (row), with no convolution kernels shared across rows.
                                        • Batch Normalization Layers:
                                          • Added to all of the above layers (except for the data and loss layers).
                                          • Sequence-wise normalization for RNNs: BatchNorm is performed only on the input-state projection and not on the state-state projection, for efficiency.
                                        -
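The row convolution described above can be sketched as follows. This is an illustration only, assuming the formulation in [1]: each output frame is a weighted sum of the current frame and a fixed number of future frames, computed separately for every feature row. `RowConvolution`, the `T x D` input layout and the `K x D` weight layout are assumptions made for this sketch, not an existing PaddlePaddle layer.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// x: T x D input (T frames, D feature rows).
// w: K x D weights, where K = number of future frames + 1.
// Returns a T x D output; frames past the end of the sequence count as zero.
Matrix RowConvolution(const Matrix& x, const Matrix& w) {
  const std::size_t num_frames = x.size();
  const std::size_t num_rows = num_frames == 0 ? 0 : x[0].size();
  Matrix out(num_frames, std::vector<double>(num_rows, 0.0));
  for (std::size_t t = 0; t < num_frames; ++t) {
    for (std::size_t i = 0; i < w.size() && t + i < num_frames; ++i) {
      // Each row d uses only its own weights w[i][d]:
      // no kernel is shared across rows.
      for (std::size_t d = 0; d < num_rows; ++d) {
        out[t][d] += w[i][d] * x[t + i][d];
      }
    }
  }
  return out;
}
```

With `K = 2`, the output at frame `t` depends only on frames `t` and `t + 1`, which is what keeps inference latency low compared with a bi-directional RNN that needs the whole sequence.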

                                        Required Components | PaddlePaddle Support | Need to Develop
                                        :--- | :--- | :---
                                        Data Layer I (Spectrogram) | Not supported yet. | TBD (Task 3)
                                        Data Layer II (Transcription) | paddle.data_type.integer_value_sequence | -
                                        2D Convolution Layer | paddle.layer.image_conv_layer | -
                                        DataType Converter (vec2seq) | paddle.layer.block_expand | -
                                        Bi-/Uni-directional RNNs | paddle.layer.recurrent_group | -
                                        Row Convolution Layer | Not supported yet. | TBD (Task 4)
                                        CTC-loss Layer | paddle.layer.warp_ctc | -
                                        Batch Normalization Layer | paddle.layer.batch_norm | -
                                        CTC-Beam search | Not supported yet. | TBD (Task 6)

                                        -
                                        -
                                        -

                                        Row Convolution

                                        -

                                        TODO by Assignees

                                        -
                                        -
                                        -

                                        Beam Search with CTC and LM

                                        -
                                        -
                                        -Figure 2. Algorithm for CTC Beam Search Decoder. -
                                          -
                                        • The beam search decoder for the DS2 CTC-trained network follows an approach similar to [3], as shown in Figure 2, with two important modifications for the ambiguous parts:
                                            1. In the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths.
                                            2. The condition “if l^+ not in A_prev then” after the probability computation is dropped, because it is hard to understand and seems unnecessary.
                                        • An external scorer is passed into the decoder to evaluate a candidate prefix during decoding, whenever a whitespace is appended in English decoding or any character is appended in Mandarin decoding.
                                        • Such an external scorer consists of a language model, a word count, or any other custom scorers.
                                        • The language model is built in Task 5; its parameters should be carefully tuned to achieve minimum WER/CER (cf. Task 7).
                                        • This decoder needs to be highly efficient, both for convenient parameter tuning and for real-world speech recognition.
                                        -
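The accumulation over paths mentioned above can be illustrated with a small CTC prefix beam search. This is a sketch of the general technique only, without the external scorer, and is not the decoder planned in Task 6. It assumes that label 0 is the CTC blank and that label `k > 0` maps to `alphabet[k - 1]`; `PrefixProb` and `CtcBeamSearch` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Probability of a prefix, split by whether the last frame was blank.
struct PrefixProb {
  double p_b = 0.0;   // paths ending in blank
  double p_nb = 0.0;  // paths ending in a non-blank label
  double total() const { return p_b + p_nb; }
};

typedef std::pair<std::string, PrefixProb> Entry;

static bool HigherTotal(const Entry& a, const Entry& b) {
  return a.second.total() > b.second.total();
}

// probs[t][k]: probability of label k at frame t (label 0 is blank).
std::string CtcBeamSearch(const std::vector<std::vector<double> >& probs,
                          const std::string& alphabet,
                          std::size_t beam_size) {
  std::map<std::string, PrefixProb> beam;
  beam[""].p_b = 1.0;  // the empty prefix starts with all the mass

  for (const std::vector<double>& frame : probs) {
    std::map<std::string, PrefixProb> next;
    for (const auto& entry : beam) {
      const std::string& prefix = entry.first;
      const PrefixProb& p = entry.second;
      // A blank leaves the prefix unchanged.
      next[prefix].p_b += p.total() * frame[0];
      for (std::size_t k = 1; k < frame.size(); ++k) {
        const char c = alphabet[k - 1];
        if (!prefix.empty() && prefix.back() == c) {
          // A repeated label extends the prefix only after a blank;
          // otherwise it collapses into the same prefix.
          next[prefix + c].p_nb += p.p_b * frame[k];
          next[prefix].p_nb += p.p_nb * frame[k];
        } else {
          next[prefix + c].p_nb += p.total() * frame[k];
        }
      }
    }
    // Keep only the beam_size most probable prefixes.
    std::vector<Entry> sorted(next.begin(), next.end());
    std::stable_sort(sorted.begin(), sorted.end(), HigherTotal);
    if (sorted.size() > beam_size) sorted.resize(beam_size);
    beam = std::map<std::string, PrefixProb>(sorted.begin(), sorted.end());
  }

  std::string best;
  double best_prob = -1.0;
  for (const auto& entry : beam) {
    if (entry.second.total() > best_prob) {
      best_prob = entry.second.total();
      best = entry.first;
    }
  }
  return best;
}
```

Every update uses `+=` rather than `=`, because several alignments can lead to the same prefix within one frame. An external scorer such as a language model would rescale the probability where a prefix is extended.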
                                        -
                                        -
                                        -

                                        Future Work

                                        -
                                          -
                                        • Efficiency Improvement
                                        • -
                                        • Accuracy Improvement
                                        • -
                                        • Low-latency Inference Library
                                        • -
                                        • Large-scale benchmarking
                                        • -
                                        -
                                        - -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/support_new_device.html b/develop/doc_cn/design/support_new_device.html deleted file mode 100644 index 2d73d81dfb9e5370cbed39b9770b1a429b870fed..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/support_new_device.html +++ /dev/null @@ -1,461 +0,0 @@ - - - - - - - - - - - - - Design Doc: Supporting new Device/Library — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Supporting new Device/Library

                                        -
                                        -

                                        Background

                                        -

                                        Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries in a flexible and efficient manner.

                                        -

                                        On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example, Intel CPUs support the Eigen and MKL computing libraries, while Nvidia GPUs support the Eigen and cuDNN computing libraries. We have to implement operator-specific kernels for each computing library.

                                        -

                                        On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, Layer is exposed in Python, and Operator is exposed in C++. Both Layer and Operator are hardware independent.

                                        -

                                        So, how to support a new Device/Library in Fluid becomes a challenge.

                                        -
                                        -
                                        -

                                        Basic: Integrate A New Device/Library

                                        -

                                        For a general overview of Fluid, please refer to the overview doc.

                                        -

                                        There are mainly three parts that we have to consider while integrating a new device/library:

                                        -
                                          -
                                        • Place and DeviceContext: indicate the device id and manage hardware resources
                                        • -
                                        • Memory and Tensor: malloc/free data on a certain device
                                        • -
                                        • Math Functor and OpKernel: implement computing units on certain devices/libraries
                                        • -
                                        -
                                        -

                                        Place and DeviceContext

                                        -

                                        Please note that devices and computing libraries do not have a one-to-one correspondence. A device can have many computing libraries, and a computing library can support several devices.

                                        -
                                        -

                                        Place

                                        -

                                        Fluid uses class Place to represent the device memory where data is located. If we add another device, we have to add the corresponding DevicePlace.

                                        -
                                                |   CPUPlace
                                        -Place --|   CUDAPlace
                                        -        |   FPGAPlace
                                        -
                                        -
                                        -

                                        And Place is defined as follows:

                                        -
                                        typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place;
                                        -
                                        -
                                        -
                                        -
                                        -

                                        DeviceContext

                                        -

                                        Fluid uses class DeviceContext to manage the resources in different libraries, such as the CUDA stream in CUDADeviceContext. There are also inheritance relationships between different kinds of DeviceContext.

                                        -
                                                        /->  CPUDeviceContext   
                                        -DeviceContext ---->  CUDADeviceContext  
                                        -                \->  FPGADeviceContext
                                        -
                                        -
                                        -

                                        An example of Nvidia GPU is as follows:

                                        -
                                          -
                                        • DeviceContext
                                        • -
                                        -
                                        class DeviceContext {
                                        - public:
                                        -  virtual ~DeviceContext() {}
                                        -  virtual Place GetPlace() const = 0;
                                        -};
                                        -
                                        -
                                        -
                                          -
                                        • CUDADeviceContext
                                        • -
                                        -
                                        class CUDADeviceContext : public DeviceContext {
                                        - public:
                                        -  Place GetPlace() const override { return place_; }
                                        -
                                        - private:
                                        -  CUDAPlace place_;
                                        -  cudaStream_t stream_;
                                        -  cublasHandle_t cublas_handle_;
                                        -  std::unique_ptr<Eigen::GpuDevice> eigen_device_;  // binds with stream_
                                        -};
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Memory and Tensor

                                        -
                                        -

                                        memory module

                                        -

                                        Fluid provides the following memory interfaces:

                                        -
                                        template <typename Place>
                                        -void* Alloc(Place place, size_t size);
                                        -
                                        -template <typename Place>
                                        -void Free(Place place, void* ptr);
                                        -
                                        -template <typename Place>
                                        -size_t Used(Place place);
                                        -
                                        -
                                        -

                                        To implement these interfaces, we have to implement MemoryAllocator for different Devices.
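As a rough illustration of what implementing these interfaces for one device could look like, the sketch below specializes the three templates for a CPU place, using `std::malloc` and a bookkeeping map. The empty `CPUPlace` struct, the `detail::CpuBlocks` helper and the map-based accounting are assumptions made for this example; they are not Fluid's actual MemoryAllocator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

struct CPUPlace {};  // stand-in for the real place type

// The interface from the text above.
template <typename Place>
void* Alloc(Place place, std::size_t size);
template <typename Place>
void Free(Place place, void* ptr);
template <typename Place>
std::size_t Used(Place place);

namespace detail {
// Hypothetical bookkeeping: live CPU blocks and their sizes.
std::map<void*, std::size_t>& CpuBlocks() {
  static std::map<void*, std::size_t> blocks;
  return blocks;
}
}  // namespace detail

template <>
void* Alloc<CPUPlace>(CPUPlace, std::size_t size) {
  void* ptr = std::malloc(size);
  if (ptr != nullptr) detail::CpuBlocks()[ptr] = size;
  return ptr;
}

template <>
void Free<CPUPlace>(CPUPlace, void* ptr) {
  detail::CpuBlocks().erase(ptr);
  std::free(ptr);
}

template <>
std::size_t Used<CPUPlace>(CPUPlace) {
  std::size_t total = 0;
  for (const auto& block : detail::CpuBlocks()) total += block.second;
  return total;
}
```

A new device would add its own place type and its own three specializations, leaving callers of `Alloc`, `Free` and `Used` unchanged.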

                                        -
                                        -
                                        -

                                        Tensor

                                        -

                                        Tensor holds data with some shape in a specific Place.

                                        -
                                        class Tensor {
                                        - public:
                                        -  /*! Return a pointer to mutable memory block. */
                                        -  template <typename T>
                                        -  inline T* data();
                                        -
                                        -  /**
                                        -   * @brief   Return a pointer to mutable memory block.
                                        -   * @note    Allocates the memory block if it does not exist yet.
                                        -   */
                                        -  template <typename T>
                                        -  inline T* mutable_data(platform::Place place);
                                        -
                                        -  /**
                                        -   * @brief     Return a pointer to mutable memory block.
                                        -   *
                                        -   * @param[in] dims    The dimensions of the memory block.
                                        -   * @param[in] place   The place of the memory block.
                                        -   *
                                        -   * @note      Allocates the memory block if it does not exist yet.
                                        -   */
                                        -  template <typename T>
                                        -  inline T* mutable_data(DDim dims, platform::Place place);
                                        -
                                        -  /*! Resize the dimensions of the memory block. */
                                        -  inline Tensor& Resize(const DDim& dims);
                                        -
                                        -  /*! Return the dimensions of the memory block. */
                                        -  inline const DDim& dims() const;
                                        -
                                        - private:
                                        -  /*! holds the memory block if allocated. */
                                        -  std::shared_ptr<Placeholder> holder_;
                                        -
                                        -  /*! points to dimensions of memory block. */
                                        -  DDim dim_;
                                        -};
                                        -
                                        -
                                        -

                                        Placeholder is used to delay memory allocation; that is, we can first define a tensor, use Resize to configure its shape, and then call mutable_data to allocate the actual memory.

                                        -
                                        paddle::framework::Tensor t;
                                        -paddle::platform::CPUPlace place;
                                        -// set size first
                                        -t.Resize({2, 3});
                                        -// allocate memory on CPU later
                                        -t.mutable_data<float>(place);
                                        -
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Math Functor and OpKernel

                                        -

                                        Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. These shared parts are placed in the operators/math directory as basic Functors.

                                        -

                                        Let’s take MaxOutFunctor as an example:

                                        -

                                        The interface is defined in the header file.

                                        -
                                        template <typename DeviceContext, typename T>
                                        -class MaxOutFunctor {
                                        - public:
                                        -  void operator()(const DeviceContext& context, const framework::Tensor& input,
                                        -                  framework::Tensor* output, int groups);
                                        -};
                                        -
                                        -
                                        -

                                        The CPU implementation is in the .cc file:

                                        -
                                        template <typename T>
                                        -class MaxOutFunctor<platform::CPUDeviceContext, T> {
                                        -  public:
                                        -  void operator()(const platform::CPUDeviceContext& context,
                                        -                  const framework::Tensor& input, framework::Tensor* output,
                                        -                  int groups) {
                                        -                  ...
                                        -                  }
                                        -};
                                        -
                                        -
                                        -

                                        The CUDA implementation is in the .cu file:

                                        -
                                        template <typename T>
                                        -class MaxOutFunctor<platform::CUDADeviceContext, T> {
                                        - public:
                                        -  void operator()(const platform::CUDADeviceContext& context,
                                        -                  const framework::Tensor& input, framework::Tensor* output,
                                        -                  int groups) {
                                        -                  ...
                                        -                  }
                                        -};                  
                                        -
                                        -
                                        -

                                        We first obtain the computing handle from a concrete DeviceContext and then compute on tensors.

                                        -

                                        The implementation of an OpKernel is similar to that of a math functor. The only extra step is to register the OpKernel in a global map.

                                        -

                                        Fluid provides different register interfaces in op_registry.h

                                        -

                                        Let’s take Crop operator as an example:

                                        -

                                        In .cc file:

                                        -
                                        REGISTER_OP_CPU_KERNEL(crop, ops::CropKernel<float>);
                                        -REGISTER_OP_CPU_KERNEL(
                                        -    crop_grad, ops::CropGradKernel<paddle::platform::CPUDeviceContext, float>);
                                        -
                                        -
                                        -

                                        In .cu file:

                                        -
                                        REGISTER_OP_CUDA_KERNEL(crop, ops::CropKernel<float>);
                                        -REGISTER_OP_CUDA_KERNEL(
                                        -    crop_grad, ops::CropGradKernel<paddle::platform::CUDADeviceContext, float>);
                                        -
                                        -
                                        -
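The idea of registering kernels in a global map can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KERNEL_REGISTRY`, `register_op_kernel`, `run_op` and the `scale` operator are hypothetical names invented here, not Fluid APIs, and the real registration happens in C++ through the macros in op_registry.h.

```python
# Hypothetical sketch: a global kernel registry keyed by (op type, device).
# None of these names are Fluid APIs; they only illustrate the idea.
KERNEL_REGISTRY = {}


def register_op_kernel(op_type, device):
    """Return a decorator that stores a kernel under (op_type, device)."""
    def decorator(kernel):
        KERNEL_REGISTRY[(op_type, device)] = kernel
        return kernel
    return decorator


@register_op_kernel("scale", "CPU")
def scale_cpu(values, factor):
    """A toy CPU kernel: multiply every element by `factor`."""
    return [v * factor for v in values]


def run_op(op_type, device, *args):
    """Look up the kernel for this op and device, then run it."""
    try:
        kernel = KERNEL_REGISTRY[(op_type, device)]
    except KeyError:
        raise NotImplementedError(
            f"no {device} kernel registered for {op_type}") from None
    return kernel(*args)


scaled = run_op("scale", "CPU", [1, 2], 3)
```

In this sketch, asking for a device that has no registered kernel raises an error, which is the situation the next section on switching between devices addresses.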
                                        -
                                        -
                                        -

                                        Advanced topics: How to switch between different Device/Library

                                        -

                                        Generally, we implement an OpKernel for every Device/Library that an Operator supports, so a model such as a Convolutional Neural Network can easily be trained on a GPU. However, some OpKernels are not suitable for a specific Device. For example, the crf operator can only run on CPU, whereas most other operators can run on GPU. To achieve high performance in such circumstances, we have to switch between different Devices/Libraries.

                                        -

                                        For more details, please refer to the following docs:

                                        -
                                          -
                                        • operator kernel type doc
                                        • switch kernel doc
                                        -
                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/switch.html b/develop/doc_cn/design/switch.html deleted file mode 100644 index 18ebc94adb13a58263bba940b6624fca49aaacca..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/switch.html +++ /dev/null @@ -1,291 +0,0 @@ - - - - - - - - - - - - - Design Doc: Switch — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design Doc: Switch

                                        -
                                        -
                                        -

                                        Background

                                        -

                                        Many programming languages provide switch as a generalization of if-elif-else. We want to add it to Fluid.

                                        -

                                        The following example shows the usage of fluid.switch.

                                        -
                                        a = fluid.Var(10)
                                        -b = fluid.Var(0)
                                        -
                                        -with switch() as switch:
                                        -    with switch.case(fluid.less_equal(a, 10)):
                                        -        fluid.print("Case 1")
                                        -    with switch.case(fluid.larger(a, 0)):
                                        -        fluid.print("Case 2")
                                        -    with switch.default():
                                        -        fluid.print("Case 3")
                                        -
                                        -
                                        -
                                        -
                                        -

                                        The Semantics

                                        -
                                          -
                                        1. A switch control-flow checks cases one-by-one.
                                        2. The condition of each case is a scalar boolean value. This differs from the fluid.if_else control-flow, whose condition can be a vector of boolean values.
                                        3. It runs the first matched case, or the default case if no case matches and a default is provided.
                                        4. Once it matches a case, it runs the corresponding branch and only that branch, as if there were a C break keyword at the end of each case.
                                        -

                                        The above program should print and print only “Case 1”.
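The first-match semantics can be emulated in plain Python. This is a minimal sketch under stated assumptions: `run_switch` is a hypothetical helper written for illustration, conditions are ordinary Python booleans rather than Fluid variables, and it is not the proposed Fluid implementation.

```python
def run_switch(cases, default=None):
    """Run the branch of the first case whose condition is true.

    `cases` is a list of (condition, branch) pairs, where `condition` is a
    plain Python bool and `branch` is a zero-argument callable.
    """
    for condition, branch in cases:
        if condition:
            return branch()  # implicit "break": later cases are skipped
    if default is not None:
        return default()
    return None


a = 10
result = run_switch(
    [(a <= 10, lambda: "Case 1"),
     (a > 0, lambda: "Case 2")],
    default=lambda: "Case 3")
print(result)
```

Both conditions hold for `a = 10`, but only the first branch runs, which mirrors the behavior described for the Fluid example.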

                                        -

                                        The backward pass of the switch control-flow is easier to implement than that of if_else, because switch runs at most one branch, whereas if_else could run more than one branch.

                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/tensor_array.html b/develop/doc_cn/design/tensor_array.html deleted file mode 100644 index 072d92123687fc358103a44ab7be67da0aa1c05b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/tensor_array.html +++ /dev/null @@ -1,511 +0,0 @@ - - - - - - - - - - - - - Design for TensorArray — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Design for TensorArray

                                        -

                                        This design doc explains why a new C++ class, TensorArray, is needed. In addition to the very simple C++ implementation

                                        -
                                        class TensorArray {
                                        - public:
                                        -  explicit TensorArray(const LoDTensor&);
                                        -  explicit TensorArray(size_t size);
                                        -
                                        - private:
                                        -  std::vector<LoDTensor> values_;
                                        -};
                                        -
                                        -
                                        -

                                        We also need to expose it to PaddlePaddle’s Python API, because users will want to use it with our very flexible WhileLoop operator. An example of an RNN based on dynamic operators is

                                        -
                                        input = pd.data(...)
                                        -num_steps = Var(12)
                                        -
                                        -states = TensorArray(size=num_steps)
                                        -step_inputs = TensorArray(unstack_from=input)
                                        -step_outputs = TensorArray(size=num_steps)
                                        -
                                        -W = Tensor(...)
                                        -U = Tensor(...)
                                        -default_state = some_op()
                                        -
                                        -step = Var(1)
                                        -
                                        -wloop = paddle.create_whileloop(loop_vars=[step])
                                        -with wloop.frame():
                                        -    wloop.break_if(pd.equal(step, num_steps))
                                        -    pre_state = states.read(step-1, default_state)
                                        -    step_input = step_inputs.read(step)
                                        -    state = pd.sigmoid(pd.matmul(U, pre_state) + pd.matmul(W, step_input))
                                        -    states.write(step, state)
                                        -    step_outputs.write(step, state) # output state
                                        -    step.update(step+1)
                                        -
                                        -output = step_outputs.stack()
                                        -
                                        -
                                        -
                                        -

                                        Background

                                        -

                                        Steps are one of the core concepts of RNN. In each time step of an RNN, there are several input segments, states, and output segments. All these components act like arrays; for example, states[step_id] returns the state at the step_id-th time step.

                                        -

                                        An RNN can be implemented with the following pseudocode

                                        -
                                        Array states;
                                        -Array input_segments;
                                        -Array output_segments;
                                        -Parameter W, U;
                                        -
                                        -step = 1
                                        -seq_len = 12
                                        -while_loop {
                                        -    if (step == seq_len) break;
                                        -    states[step] = sigmoid(W * states[step-1] + U * input_segments[step]);
                                        -    output_segments[step] = states[step];  // take state as output
                                        -    step++;
                                        -}
                                        -
                                        -
                                        -
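The pseudocode above can be made concrete with a small, runnable Python sketch. To stay dependency-free it assumes scalar states, inputs and parameters (so `W` and `U` become the floats `w` and `u`); `run_rnn` and `sigmoid` are hypothetical helper names, and the loop runs over every input segment rather than stopping at a fixed `seq_len`.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def run_rnn(input_segments, w, u, initial_state=0.0):
    """Scalar RNN: states[t] = sigmoid(w * states[t-1] + u * input[t])."""
    states = [initial_state]  # states[0] holds the initial state
    output_segments = []
    for step in range(1, len(input_segments) + 1):
        x = input_segments[step - 1]
        state = sigmoid(w * states[step - 1] + u * x)
        states.append(state)
        output_segments.append(state)  # take the state as the output
    return output_segments


outputs = run_rnn([0.5, -1.0, 2.0], w=0.8, u=1.2)
```

The three lists in the sketch (`states`, `input_segments`, `output_segments`) play the role of the array-like containers that the rest of this document motivates.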

                                        According to the RNN roadmap, there are several different RNNs that PaddlePaddle will eventually support.

                                        -

                                        Currently, the basic RNN implementation supported by PaddlePaddle is the recurrent_op which takes tensors as input and splits them into input_segments.

                                        -

                                        Since a tensor cannot store variable-length sequences directly, PaddlePaddle implements the tensor with level of details (LoDTensor for short). Segmenting a LoDTensor is much more complicated than splitting a tensor, which makes it necessary to refactor the recurrent_op with LoDTensor segmenting support.

                                        -

                                        As the next step in RNN support, dynamic_recurrent_op should be introduced to handle inputs with variable-length sequences.

                                        -

                                        The implementation is similar to recurrent_op. The key difference is the way the original input LoDTensors and outputs are split to get the input_segments and the output_segments.

                                        -

                                        Though it can’t be built over recurrent_op or dynamic_recurrent_op directly, the logic behind splitting a tensor or a LoD tensor into input_segments remains the same.

                                        -
                                        -
                                        -

                                        Why TensorArray

                                        -

                                        The logic behind splitting the inputs into segments, states and outputs is similar and can be shared in a separate module.

                                        -

                                        The arrays of states, input_segments and output_segments would be exposed to users when writing a dynamic RNN model similar to the above pseudocode.

                                        -

                                        So there should be an array-like container, which can store the segments of a tensor or LoD tensor.

                                        -

                                        This container can store an array of tensors and provides several methods to split a tensor or a LoD tensor. This is where the notion of TensorArray comes from.

                                        -
                                        -
                                        -

                                        Introduce TensorArray to unify all three RNNs

                                        -

                                        TensorArray as a new concept is borrowed from TensorFlow. It is meant to be used with dynamic iteration primitives such as while_loop and map_fn.

                                        -

                                        This concept can be used to support our new design of dynamic operations, and to help refactor some existing layers that deal with variable-length sentences, such as recurrent_op and RecurrentGradientMachine.

                                        -

                                        In our design for dynamic RNN, TensorArray is used to segment inputs and store states in all time steps. By providing some methods similar to a C++ array, the definition of some state-based dynamic models such as RNN can be more natural and highly flexible.

                                        -
                                        -
                                        -

                                        Dynamic-operations on TensorArray

                                        -

                                        TensorArray will be used directly when defining dynamic models, so the operators listed below should be implemented:

                                        -
                                        # several helper operators for TensorArray
                                        -def tensor_array_stack(ta, tensor):
                                        -    '''
                                        -    get a tensor array `ta`, return a packed `tensor`.
                                        -    '''
                                        -    pass
                                        -
                                        -def tensor_array_unstack(tensor, ta):
                                        -    '''
                                        -    get a `tensor`, unstack it and get a tensor array `ta`.
                                        -    '''
                                        -    pass
                                        -
                                        -def tensor_array_write(ta, index, tensor, data_shared):
                                        -    '''
                                        -    get a `tensor` and a scalar tensor `index`, write `tensor` into index-th
                                        -    value of the tensor array `ta`.
                                        -    `data_shared` is an attribute that specifies whether to copy or reference the tensors.
                                        -    '''
                                        -    pass
                                        -
                                        -def tensor_array_read(ta, index, tensor):
                                        -    '''
                                        -    get a tensor array `ta`, a scalar tensor `index`, read the index-th value of
                                        -    `ta` and return as the `tensor`.
                                        -    '''
                                        -    pass
                                        -
                                        -def tensor_array_size(ta, tensor):
                                        -    '''
                                        -    get a tensor array `ta`, return the size of `ta` and return as the scalar `tensor`.
                                        -    '''
                                        -    pass
                                        -
                                        -
                                        -

                                        It is tedious for users to work with so many low-level operators, so the Python wrapper should provide some helper methods to make TensorArray easier to use, for example

                                        -
                                        class TensorArray:
                                        -    def __init__(self, name):
                                        -        self.name = name
                                        -        self.desc = TensorArrayDesc()
                                        -
                                        -    def stack(self, name=None):
                                        -        '''
                                        -        Pack the values in a `TensorArray` into a tensor with rank one higher
                                        -        than each tensor in `values`.
                                        -        `stack` can be used to concatenate all the time steps for RNN or whileloop.
                                        -
                                        -        @name: str
                                        -            the name of the variable to output.
                                        -        '''
                                        -        tensor = Var(name)
                                        -        tensor_array_stack(self.name, tensor)
                                        -        return tensor
                                        -
                                        -    def unstack(self, input):
                                        -        '''
                                        -        Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
                                        -        `unstack` can be used to split a tensor into time steps for RNN or whileloop.
                                        -
                                        -        @input: str
                                        -            the name of input tensor
                                        -        '''
                                        -        tensor_array_unstack(input, self.name)
                                        -
                                        -    def write(self, index, value, data_shared=True):
                                        -        '''
                                        -        Write value into index of the TensorArray.
                                        -        If `data_shared` is set to True, then the index-th value in TensorArray will
                                        -        be shared with the tensor passed in.
                                        -
                                        -        @index: str
                                        -            name of a scalar tensor
                                        -        @value: str
                                        -            name of a tensor
                                        -        @data_shared: bool
                                        -        '''
                                        -        tensor_array_write(self.name, index, value, data_shared)
                                        -
                                        -    def read(self, index, output):
                                        -        '''
                                        -        Read the value at location `index` in the `TensorArray`.
                                        -
                                        -        @index: str
                                        -            name of a scalar tensor
                                        -        @output:
                                        -            name of a output variable
                                        -        '''
                                        -        tensor_array_read(self.name, index, output)
                                        -
                                        -
                                        -    def size(self, output):
                                        -        '''
                                        -        Return the number of values.
                                        -
                                        -        @output: str
                                        -            name of a scalar tensor
                                        -        '''
                                        -        tensor_array_size(self.name, output)
                                        -
                                        -
                                        -
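To make the intended behavior of these helper methods concrete, here is a toy, runnable stand-in. It rests on loud assumptions: `ListTensorArray` is a hypothetical class that stores each step as a plain Python list and returns values directly, whereas the wrapper above works with variable names and emits operators. It is not the PaddlePaddle implementation.

```python
class ListTensorArray:
    """Toy stand-in for TensorArray that stores each step as a Python list."""

    def __init__(self, size=0):
        self.values = [None] * size

    def unstack(self, tensor):
        """Split a rank-2 nested list into one rank-1 list per time step."""
        self.values = [list(row) for row in tensor]

    def stack(self):
        """Pack all stored steps back into one rank-2 nested list."""
        return [list(v) for v in self.values]

    def write(self, index, value):
        """Store `value` at `index`, growing the array if needed."""
        if index >= len(self.values):
            self.values.extend([None] * (index + 1 - len(self.values)))
        self.values[index] = value

    def read(self, index):
        """Return the value stored at `index`."""
        return self.values[index]

    def size(self):
        """Return the number of stored steps."""
        return len(self.values)


# Split an input into steps, transform each step, then pack the results.
step_inputs = ListTensorArray()
step_inputs.unstack([[1, 2], [3, 4], [5, 6]])
step_outputs = ListTensorArray(size=step_inputs.size())
for step in range(step_inputs.size()):
    doubled = [2 * x for x in step_inputs.read(step)]
    step_outputs.write(step, doubled)
output = step_outputs.stack()
```

The usage at the bottom follows the same unstack, per-step read/write, stack pattern as the dynamic RNN example at the top of this design doc.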
                                        - -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/design/var_desc.html b/develop/doc_cn/design/var_desc.html deleted file mode 100644 index 239004a6a15a7faaf6593b6e264acc04780045aa..0000000000000000000000000000000000000000 --- a/develop/doc_cn/design/var_desc.html +++ /dev/null @@ -1,335 +0,0 @@ - - - - - - - - - - - - - Background — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        - - - - -
                                        - - - - - - -
                                        -
                                        - - - - - - -
                                        - -
                                        -
                                        -
                                        -
                                        - -
                                        -

                                        Background

                                        -

                                        PaddlePaddle divides the description of neural network computation into two stages: compile time and runtime. At compile time, the neural network computation is described as a ProgramDesc whereas at runtime an Executor interprets the ProgramDesc to compute the operations.

                                        -

                                        PaddlePaddle uses a proto message to describe the compile-time program because:

                                        -
                                          -
                                        1. The computation program description must be serializable so that it can be saved in a file.
                                        2. During distributed training, the serialized program will be sent to multiple workers. It should also be possible to break the program into different components, each of which can be executed on a different worker.
                                        -

                                        The computation Program consists of nested Blocks. Each Block consists of data (i.e. Variables) and Operations. The concepts used to represent them are listed in the table below.

                                        -

                                        | |compile time|runtime|
                                        |---|---|---|
                                        |Data|VarDesc(proto)|Variable(cpp)|
                                        |Operation|OpDesc(proto)|Operator(cpp)|

                                        -
                                        -
                                        -

                                        Definition of VarType

                                        -

                                        A VarDesc should have a name, a type, and a flag indicating whether it is persistable. Apart from the POD types, PaddlePaddle supports several other kinds of variable types, such as LOD_TENSOR, SELECTED_ROWS, FEED_MINIBATCH, FETCH_LIST, STEP_SCOPES, LOD_RANK_TABLE, LOD_TENSOR_ARRAY, PLACE_LIST, READER and CHANNEL. These are declared inside VarType. A VarDesc then looks as follows:

                                        -
                                        message VarDesc {
                                        -  required string name = 1;
                                        -  required VarType type = 2;
                                        -  optional bool persistable = 3 [ default = false ];
                                        -}
                                        -
                                        -
                                        -
                                        -
                                        -

                                        Definition of TensorDesc

                                        -
                                        message TensorDesc {
                                        -  // Should only be PODType. Is enforced in C++
                                        -  required Type data_type = 1;
                                        -  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
                                        -}
                                        -
                                        -
                                        -
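The comment on `dims` says that an unknown dimension is stored as -1. A minimal sketch of that convention, assuming a hypothetical `UNKNOWN` marker on the Python side (neither helper is a PaddlePaddle function):

```python
UNKNOWN = None  # hypothetical marker for a dimension not known at compile time


def encode_dims(shape):
    """Store unknown dimensions as -1, as in the TensorDesc comment."""
    return [-1 if d is UNKNOWN else int(d) for d in shape]


def is_fully_known(dims):
    """True when no dimension is left to be decided at runtime."""
    return all(d >= 0 for d in dims)


dims = encode_dims([UNKNOWN, 640, 480])
print(dims)  # [-1, 640, 480]
```

Here the unknown entry would typically be the batch size, which is only fixed once data is fed at runtime.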

                                        The Type here comes from the enum defined inside VarType:

                                        -
                                        enum Type {
                                        -  // Pod Types
                                        -  BOOL = 0;
                                        -  INT16 = 1;
                                        -  INT32 = 2;
                                        -  INT64 = 3;
                                        -  FP16 = 4;
                                        -  FP32 = 5;
                                        -  FP64 = 6;
                                        -
                                        -  // Other types that may need additional descriptions
                                        -  LOD_TENSOR = 7;
                                        -  SELECTED_ROWS = 8;
                                        -  FEED_MINIBATCH = 9;
                                        -  FETCH_LIST = 10;
                                        -  STEP_SCOPES = 11;
                                        -  LOD_RANK_TABLE = 12;
                                        -  LOD_TENSOR_ARRAY = 13;
                                        -  PLACE_LIST = 14;
                                        -  READER = 15;
                                        -  CHANNEL = 16;
                                        -}
                                        -
                                        -
                                        -
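The TensorDesc comment states that `data_type` may only be a POD type and that this is enforced in C++. A rough Python equivalent of such a check is sketched below; `VarTypeKind`, `is_pod` and `check_tensor_data_type` are hypothetical names, and the enum values simply mirror the listing above.

```python
from enum import IntEnum


class VarTypeKind(IntEnum):
    """Mirror of the enum above (a Python stand-in, not generated code)."""
    BOOL = 0
    INT16 = 1
    INT32 = 2
    INT64 = 3
    FP16 = 4
    FP32 = 5
    FP64 = 6
    LOD_TENSOR = 7
    SELECTED_ROWS = 8
    FEED_MINIBATCH = 9
    FETCH_LIST = 10
    STEP_SCOPES = 11
    LOD_RANK_TABLE = 12
    LOD_TENSOR_ARRAY = 13
    PLACE_LIST = 14
    READER = 15
    CHANNEL = 16


def is_pod(kind):
    """POD types occupy the contiguous range BOOL..FP64 in the listing."""
    return VarTypeKind.BOOL <= kind <= VarTypeKind.FP64


def check_tensor_data_type(kind):
    """Reject non-POD element types, like the check described for TensorDesc."""
    if not is_pod(kind):
        raise ValueError(f"{kind.name} is not a POD type")
    return kind
```

The range test relies on the POD values being listed first and contiguously, which holds for the enum as shown but would need revisiting if the numbering changed.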

                                        A TensorDesc describes both SelectedRows and LoDTensor. For details of SelectedRows, please refer to the SelectedRows documentation.

                                        -
                                        -
                                        -

                                        Definition of LoDTensorDesc

                                        -
                                        message LoDTensorDesc {
                                        -  required TensorDesc tensor = 1;
                                        -  optional int32 lod_level = 2 [ default = 0 ];
                                        -}
                                        -
                                        -
                                        -

                                        A LoDTensorDesc contains a tensor and a lod_level.
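The nesting of the two messages can be mirrored with plain dataclasses. This is a sketch, not the generated protobuf classes. The field names follow the messages above, while the string type names and the reading of `lod_level=1` as "a batch of variable-length sequences" are my assumptions rather than statements from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TensorDesc:
    data_type: str                                   # e.g. "FP32"; a POD type name
    dims: List[int] = field(default_factory=list)    # -1 marks an unknown dimension


@dataclass
class LoDTensorDesc:
    tensor: TensorDesc
    lod_level: int = 0                               # same default as the proto field


# A dense image batch: the default lod_level of 0 is used.
images = LoDTensorDesc(TensorDesc("FP32", [-1, 640, 480]))

# Assumed usage: one level of sequence nesting.
sentences = LoDTensorDesc(TensorDesc("INT64", [-1, 1]), lod_level=1)
```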

                                        -
                                        -
                                        -

                                        Definition of Variable in Python

                                        -

                                        For Variable in Python, please refer to the Python API.

                                        -
                                        - - -
                                        -
                                        -
                                        - - -
                                        - -
                                        -

                                        - © Copyright 2016, PaddlePaddle developers. - -

                                        -
                                        - Built with Sphinx using a theme provided by Read the Docs. - -
                                        - -
                                        -
                                        - -
                                        - -
                                        - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/dev/contribute_to_paddle_cn.html b/develop/doc_cn/dev/contribute_to_paddle_cn.html index ce34ba56210462f990b38fae54f34522c8403247..fa6fa9db07b5f9a100872cba1fd96b0ed180b509 100644 --- a/develop/doc_cn/dev/contribute_to_paddle_cn.html +++ b/develop/doc_cn/dev/contribute_to_paddle_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                      • Development standards
                                      • FAQ
                                          diff --git a/develop/doc_cn/dev/index_cn.html b/develop/doc_cn/dev/index_cn.html index 510faf9c4c6baab341a2cf405ca50f02a5172607..ef865931a046d328f6465077757ff22af8d05725 100644 --- a/develop/doc_cn/dev/index_cn.html +++ b/develop/doc_cn/dev/index_cn.html @@ -143,6 +143,7 @@ var _hmt = _hmt || [];
                                        • Development standards
                                        • FAQ
                                            @@ -203,6 +204,7 @@ var _hmt = _hmt || []; diff --git a/develop/doc_cn/dev/new_layer_cn.html b/develop/doc_cn/dev/new_layer_cn.html index 67bdecb088d79f26248bddb6dfdf70069f1b6a96..142d71723441ac69f979612bd73ed859dbccb7ba 100644 --- a/develop/doc_cn/dev/new_layer_cn.html +++ b/develop/doc_cn/dev/new_layer_cn.html @@ -10,7 +10,7 @@ - 实现新的网络层 — PaddlePaddle 文档 + 如何实现新的网络层 — PaddlePaddle 文档 @@ -35,7 +35,10 @@ - + + + + - - - - - - - - - -
                                            - - - - -
                                            - - - - - - -
                                            -
                                            - - - - - - -
                                            - -
                                            -
                                            -
                                            -
                                            - -
                                            -

                                            How to write a new Operator

                                            - -
                                            -

                                            Concepts

                                            -

                                            This section briefly introduces the base classes involved; see the design document for details.

                                            -
                                              -
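                                            • framework::OperatorBase: the base class of Operator (abbreviated as Op).
                                            • framework::OpKernel: the base class of an Op's compute function, called a Kernel.
                                            • framework::OperatorWithKernel: inherits from OperatorBase; the Op has a compute function, i.e. it has a Kernel.
                                            • class OpProtoAndCheckerMaker: describes the Op's inputs, outputs, attributes and comments; mainly used to generate the Python API.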
                                            -

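                                            Depending on whether they contain a Kernel, Ops fall into two kinds: Ops with a Kernel, whose definition inherits from OperatorWithKernel, and Ops without a Kernel, which inherit from OperatorBase. This tutorial mainly covers how to write an Op with a Kernel. In summary, such an Op consists of the following parts: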

                                            -

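                                            Content | Where it is defined
                                            -------------- | :----------------------
                                            OpProtoMake definition | .cc file; a Backward Op does not need an OpProtoMake
                                            Op definition | .cc file
                                            Kernel implementation | When CPU and CUDA share a Kernel, it is implemented in the .h file; otherwise the CPU implementation is in the .cc file and the CUDA implementation is in the .cu file.
                                            Op registration | The Op is registered in the .cc file; the CPU Kernel is registered in the .cc file and the CUDA Kernel in the .cu file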

                                            -

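                                            New ops are all added under the directory paddle/operators, in files whose names end with *_op.h (if any), *_op.cc and *_op.cu (if any). The build system uses the file names to build each op and its Python extension automatically.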
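To make the naming convention concrete, the sketch below groups source files by op name using only the `*_op.h`, `*_op.cc` and `*_op.cu` patterns stated above. It does not reproduce the real build logic; `OP_FILE`, `group_op_sources` and `buildable` are hypothetical, as is the assumption that an op needs at least a `.cc` file to be built.

```python
import re

# Matches <name>_op.h, <name>_op.cc and <name>_op.cu; anything else is ignored.
OP_FILE = re.compile(r"^(?P<op>[a-z0-9_]+)_op\.(?P<ext>h|cc|cu)$")


def group_op_sources(filenames):
    """Map each op name to the set of file extensions found for it."""
    ops = {}
    for name in filenames:
        match = OP_FILE.match(name)
        if match is None:
            continue  # shared helper files such as gather.h are not ops
        ops.setdefault(match.group("op"), set()).add(match.group("ext"))
    return ops


def buildable(ops):
    """Assumption: an op can be built only if its .cc file is present."""
    return sorted(op for op, exts in ops.items() if "cc" in exts)


ops = group_op_sources(
    ["mul_op.h", "mul_op.cc", "mul_op.cu", "scale_op.cc", "gather.h"]
)
print(buildable(ops))  # ['mul', 'scale']
```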

                                            -

                                            The following uses matrix multiplication, i.e. MulOp, as an example of how to write an Operator with a Kernel.

                                            -
                                            -
                                            -

                                            Implementing the C++ classes

                                            -
                                            -

                                            Defining the ProtoMaker class

                                            -

                                            The formula for matrix multiplication is $Out = X * Y$, so the computation has two inputs and one output.

                                            -

                                            First, define a ProtoMaker to describe the Op's inputs and outputs, and add a comment:

                                            -
                                            class MulOpMaker : public framework::OpProtoAndCheckerMaker {
                                            - public:
                                            -  MulOpMaker(OpProto *proto, OpAttrChecker *op_checker)
                                            -      : OpProtoAndCheckerMaker(proto, op_checker) {
                                            -    AddInput("X", "(Tensor), 2D tensor of size (M x K)");
                                            -    AddInput("Y", "(Tensor), 2D tensor of size (K x N)");
                                            -    AddOutput("Out", "(Tensor), 2D tensor of size (M x N)");
                                            -    AddComment(R"DOC(
                                            -Two Element Mul Operator.
                                            -The equation is: Out = X * Y
                                            -)DOC");
                                            -  }
                                            -};
                                            -
                                            -
                                            -

                                            MulOpMaker inherits from framework::OpProtoAndCheckerMaker; its constructor takes 2 parameters:

                                            -
                                              -
                                            • framework::OpProto: stores the Op's inputs, outputs and attributes, and is used to generate the Python API.
                                            • framework::OpAttrChecker: checks the validity of the attributes.
                                            -

                                            In the constructor, AddInput adds an input, AddOutput adds an output, and AddComment adds the Op's comment. These functions add the corresponding content to the OpProto.

                                            -

                                            The code above adds two inputs X and Y and one output Out to MulOp and explains what each of them means. Names should follow the naming convention.
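The way a maker fills in a proto that later drives Python API generation can be imitated in Python. Everything in this sketch is hypothetical (`ToyOpProto`, `ToyProtoMaker`, `python_signature`); it mimics the roles of AddInput, AddOutput and AddComment, not the real C++ classes.

```python
class ToyOpProto:
    """Plain record of what an op consumes and produces (hypothetical)."""

    def __init__(self, op_type):
        self.type = op_type
        self.inputs = []    # list of (name, comment)
        self.outputs = []   # list of (name, comment)
        self.comment = ""


class ToyProtoMaker:
    """Each add_* call appends to the proto, like AddInput/AddOutput/AddComment."""

    def __init__(self, proto):
        self.proto = proto

    def add_input(self, name, comment):
        self.proto.inputs.append((name, comment))

    def add_output(self, name, comment):
        self.proto.outputs.append((name, comment))

    def add_comment(self, text):
        self.proto.comment = text.strip()


def make_mul_proto():
    proto = ToyOpProto("mul")
    maker = ToyProtoMaker(proto)
    maker.add_input("X", "(Tensor), 2D tensor of size (M x K)")
    maker.add_input("Y", "(Tensor), 2D tensor of size (K x N)")
    maker.add_output("Out", "(Tensor), 2D tensor of size (M x N)")
    maker.add_comment("The equation is: Out = X * Y")
    return proto


def python_signature(proto):
    """Derive a readable signature from the proto, as an API generator might."""
    args = ", ".join(name for name, _ in proto.inputs)
    outs = ", ".join(name for name, _ in proto.outputs)
    return f"{proto.type}({args}) -> {outs}"


print(python_signature(make_mul_proto()))  # mul(X, Y) -> Out
```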

                                            -

                                            Take ScaleOp as another example:

                                            -
                                            template <typename AttrType>
                                            -class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
                                            - public:
                                            -  ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker)
                                            -      : OpProtoAndCheckerMaker(proto, op_checker) {
                                            -    AddInput("X", "The input tensor of scale operator.").NotInGradient();
                                            -    AddOutput("Out", "The output tensor of scale operator.").NotInGradient();
                                            -    AddComment(R"DOC(Scale operator
                                            -The equation is: Out = scale*X
                                            -)DOC");
                                            -    AddAttr<AttrType>("scale", "scale of scale operator.").SetDefault(1.0);
                                            -  }
                                            -};
                                            -
                                            -
                                            -

                                            This example differs in two places:

                                            -
                                              -
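                                            • AddInput("X","...").NotInGradient(): indicates that the input X does not take part in the computation of ScaleOp's gradient Op. If an input of an Op does not take part in the backward gradient computation, call .NotInGradient() explicitly to mark it.
                                            • AddAttr<AttrType>("scale", "...").SetDefault(1.0);: adds the scale coefficient as an attribute and sets its default value to 1.0.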
                                            -
                                            -
                                            -

                                            Defining the Operator class

                                            -

                                            The following code defines MulOp:

                                            -
                                            class MulOp : public framework::OperatorWithKernel {
                                            - public:
                                            -  using framework::OperatorWithKernel::OperatorWithKernel;
                                            -
                                            - protected:
                                            -  void InferShape(const framework::InferShapeContext &ctx) const override {
                                            -    auto dim0 = ctx.Input<Tensor>("X")->dims();
                                            -    auto dim1 = ctx.Input<Tensor>("Y")->dims();
                                            -    PADDLE_ENFORCE_EQ(dim0.size(), 2,
                                            -                      "input X(%s) should be a tensor with 2 dims, a matrix",
                                            -                      ctx.op_.Input("X"));
                                            -    PADDLE_ENFORCE_EQ(dim1.size(), 2,
                                            -                      "input Y(%s) should be a tensor with 2 dims, a matrix",
                                            -                      ctx.op_.Input("Y"));
                                            -    PADDLE_ENFORCE_EQ(
                                            -        dim0[1], dim1[0],
                                            -        "First matrix's width must be equal with second matrix's height.");
                                            -    ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]});
                                            -  }
                                            -};
                                            -
                                            -
                                            -

                                            MulOp inherits from OperatorWithKernel. Its public member:

                                            -
                                            using framework::OperatorWithKernel::OperatorWithKernel;
                                            -
                                            -
                                            -

                                            This line uses the constructor of the base class OperatorWithKernel; it can also be written as:

                                            -
                                            MulOp(const std::string &type, const framework::VariableNameMap &inputs,
                                            -      const framework::VariableNameMap &outputs,
                                            -      const framework::AttributeMap &attrs)
                                            -  : OperatorWithKernel(type, inputs, outputs, attrs) {}
                                            -
                                            -
                                            -

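                                            You also need to override the InferShape interface. InferShape is a const function and cannot modify the Op's member variables. Its parameter is const framework::InferShapeContext &ctx, through which the inputs, outputs and attributes can be obtained. Its role is to: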

                                            -
                                              -
                                            • 1). Validate and fail early: check that the dimensions, types, etc. of the inputs are valid.
                                            • 2). Set the shape of the output Tensor.
                                            -
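The two roles of InferShape, validating early and setting the output shape, can be sketched in Python for the matrix multiplication case. `infer_mul_shape` is a hypothetical helper that follows the same three checks as the C++ code above; it is not part of PaddlePaddle.

```python
def infer_mul_shape(x_dims, y_dims):
    """Validate the input shapes of Out = X * Y and return the shape of Out."""
    if len(x_dims) != 2:
        raise ValueError("input X should be a tensor with 2 dims, a matrix")
    if len(y_dims) != 2:
        raise ValueError("input Y should be a tensor with 2 dims, a matrix")
    if x_dims[1] != y_dims[0]:
        raise ValueError(
            "First matrix's width must be equal with second matrix's height."
        )
    # (M x K) times (K x N) gives (M x N).
    return (x_dims[0], y_dims[1])


print(infer_mul_shape((32, 84), (84, 100)))  # (32, 100)
```

Because the check only looks at shapes, a mismatch is reported before any data is allocated or any kernel runs.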

                                            The OpProtoMaker and Op class definitions are usually written in the .cc file, together with the registration functions described below.

                                            -
                                            -
                                            -

                                            Defining the OpKernel class

                                            -

                                            MulKernel inherits from framework::OpKernel and takes the following two template parameters:

                                            -
                                              -
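                                            • typename DeviceContext: the device type. This template parameter is needed when different devices (CPU, CUDA) share the same Kernel, and is omitted when they do not; OnehotCrossEntropyOpKernel is an example of a Kernel that is not shared.
                                            • typename T: the data type, such as float or double.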
                                            -

                                            The MulKernel class needs to override the Compute interface.

                                            -
                                              -
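                                            • Compute takes one input parameter: const framework::ExecutionContext& context.
                                            • Compared with InferShapeContext, ExecutionContext adds the device type; it likewise gives access to the inputs, outputs and attributes.
                                            • The Compute function implements the actual computation logic of the OpKernel.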
                                            -

                                            Below is the implementation of Compute in MulKernel:

                                            -
                                            template <typename DeviceContext, typename T>
                                            -class MulKernel : public framework::OpKernel {
                                            -public:
                                            -void Compute(const framework::ExecutionContext& context) const override {
                                            -  auto* X = context.Input<Tensor>("X");
                                            -  auto* Y = context.Input<Tensor>("Y");
                                            -  auto* Z = context.Output<Tensor>("Out");
                                            -  Z->mutable_data<T>(context.GetPlace());
                                            -  auto& device_context = context.template device_context<DeviceContext>();
                                            -  math::matmul<DeviceContext, T>(*X, false, *Y, false, 1, Z, 0, device_context);
                                            -}
                                            -};
                                            -
                                            -
                                            -

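                                            Note: different devices (CPU, CUDA) share a single Op definition; whether they also share the same OpKernel depends on whether the functions called in Compute support different devices.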

                                            -

                                            The CPU and CUDA implementations of MulOp share the same Kernel. For an example where the OpKernel is not shared, see OnehotCrossEntropyOpKernel.

                                            -

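                                            To keep the computation in an OpKernel simple to write and to let CPU and CUDA code be reused, we usually implement the Compute interface with the Eigen unsupported Tensor module. For how to use the Eigen library in PaddlePaddle, see the usage document.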

                                            -

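                                            This completes the implementation of the forward Op. Next, the op and kernel need to be registered in the .cc file. The definitions of the backward Op class and the backward OpKernel are similar to those of the forward Op and are not repeated here. Note, however, that a backward Op has no ProtoMaker.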

                                            -
                                            -
                                            -

                                            Registering the Operator

                                            -
                                              -
                                            • .cc文件中注册前向、反向Op类,注册CPU Kernel。

                                              -
                                              namespace ops = paddle::operators;
                                              -REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad);
                                              -REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUDeviceContext, float>);
                                              -REGISTER_OP_CPU_KERNEL(mul_grad,
                                              -              ops::MulGradKernel<paddle::platform::CPUDeviceContext, float>);
                                              -
                                              -
                                              -

                                              In the code above:

                                              -
                                                -
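                                              • REGISTER_OP: registers the class ops::MulOp under the type name mul, with ops::MulOpMaker as its ProtoMaker, and registers ops::MulOpGrad under the type name mul_grad.
                                              • REGISTER_OP_WITHOUT_GRADIENT: used to register an Op that has no backward Op.
                                              • REGISTER_OP_CPU_KERNEL: registers the class ops::MulKernel, specializing its template parameters to paddle::platform::CPUDeviceContext and float, as in the code above; ops::MulGradKernel is registered in the same way.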
                                              -
                                            • -
                                            -
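Conceptually, registration fills a table that maps an op type and a device to a kernel, and execution looks the kernel up in that table. The sketch below shows that idea in Python. `KERNEL_REGISTRY`, `register_kernel` and `run` are hypothetical and say nothing about how the REGISTER_OP_* macros are actually implemented.

```python
KERNEL_REGISTRY = {}  # (op_type, device) -> callable


def register_kernel(op_type, device):
    """Decorator that records a kernel under (op_type, device)."""
    def decorator(fn):
        key = (op_type, device)
        if key in KERNEL_REGISTRY:
            raise KeyError(f"kernel already registered for {key}")
        KERNEL_REGISTRY[key] = fn
        return fn
    return decorator


@register_kernel("mul", "cpu")
def mul_cpu(x, y):
    """Reference matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def run(op_type, device, *args):
    """Look up the kernel for this op on this device and call it."""
    try:
        kernel = KERNEL_REGISTRY[(op_type, device)]
    except KeyError:
        raise NotImplementedError(f"no {device} kernel registered for {op_type}")
    return kernel(*args)


print(run("mul", "cpu", [[1, 2]], [[3], [4]]))  # [[11]]
```

Keying on the pair rather than on the op type alone is what allows an op to exist with a CPU kernel only, with the missing device reported at lookup time.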
                                              -
                                            • .cu文件中注册CUDA Kernel。

                                              -
                                                -
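                                              • Note that if the CUDA Kernel implementation is based on the Eigen unsupported module, add the macro definition #define EIGEN_USE_GPU at the beginning of the .cu file, as in the following example: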
                                              • -
                                              -
                                              // If the Eigen unsupported module is used, define this before including any header files
                                              -#define EIGEN_USE_GPU
                                              -
                                              -namespace ops = paddle::operators;
                                              -REGISTER_OP_CUDA_KERNEL(mul, ops::MulKernel<paddle::platform::CUDADeviceContext, float>);
                                              -REGISTER_OP_CUDA_KERNEL(mul_grad,
                                              -                       ops::MulGradKernel<paddle::platform::CUDADeviceContext, float>);
                                              -
                                              -
                                              -
                                            • -
                                            -
                                            -
                                            -

                                            Building

                                            -

                                            Run the following command to build:

                                            -
                                            make mul_op
                                            -
                                            -
                                            -
                                            -
                                            -
                                            -

                                            Python binding

                                            -

                                            The system automatically creates Python bindings for newly added ops and links them into the generated library.

                                            -
                                            -
                                            -

                                            Implementing unit tests

                                            -

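                                            The unit tests compare the forward Op implementations on different devices (CPU, CUDA), compare the backward Op implementations on different devices (CPU, CUDA), and test the gradient of the backward Op. The unit tests for MulOp are described below.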

                                            -
                                            -

                                            Forward Operator unit test

                                            -

                                            Op unit tests inherit from OpTest. The more specific tests are carried out in TestMulOp. To test an Operator, you need to:

                                            -
                                              -
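                                            1. Define the inputs, outputs and related attributes in the setUp function.
                                            2. Generate random input data.
                                            3. Implement the same computation logic as the forward operator in the Python script, obtain the output, and compare it with the output of the operator's forward computation.
                                            4. The backward computation is already integrated into the test framework; just call the corresponding interface.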
                                            -
                                            import unittest
                                            -import numpy as np
                                            -from op_test import OpTest
                                            -
                                            -
                                            -class TestMulOp(OpTest):
                                            -    def setUp(self):
                                            -        self.op_type = "mul"
                                            -        self.inputs = {
                                            -            'X': np.random.random((32, 84)).astype("float32"),
                                            -            'Y': np.random.random((84, 100)).astype("float32")
                                            -        }
                                            -        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
                                            -
                                            -    def test_check_output(self):
                                            -        self.check_output()
                                            -
                                            -    def test_check_grad_normal(self):
                                            -        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
                                            -
                                            -    def test_check_grad_ingore_x(self):
                                            -        self.check_grad(
                                            -            ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
                                            -
                                            -    def test_check_grad_ingore_y(self):
                                            -        self.check_grad(
                                            -            ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
                                            -
                                            -
                                            -

                                            The code above first imports the required packages. The important variables set in the setUp function are explained below:

                                            -
                                              -
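                                            • self.op_type = "mul": defines the type, which must match the type registered for the operator.
                                            • self.inputs: defines the inputs, of type numpy.array, and initializes them.
                                            • self.outputs: defines the outputs; the same computation as the operator is carried out in the Python script, and the result computed on the Python side is returned.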
                                            -
                                            -
                                            -

                                            Backward operator unit test

                                            -

                                            In the backward test:

                                            -
                                              -
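                                            • test_check_grad_normal calls check_grad, which uses a numerical method to check the correctness and stability of the gradient.
                                              • The first argument, ["X", "Y"]: specifies that gradient checking is done for the input variables X and Y.
                                              • The second argument, "Out": specifies the final output target variable Out of the forward network.
                                              • The third argument, max_relative_error: specifies the maximum error tolerated when checking the gradient.
                                            • The test_check_grad_ingore_x and test_check_grad_ingore_y branches test the cases where the gradient of only one input needs to be computed.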
                                            -
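The idea behind a numerical gradient check can be sketched without the test framework. The code below compares an analytic gradient of `sum(X * Y)` with respect to X against a central-difference estimate and reports the largest relative error. It illustrates the general technique only: reducing Out to a scalar by summation, the step size `delta`, and all function names are assumptions, and check_grad in OpTest may differ in these details.

```python
def matmul(x, y):
    """Matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def loss(x, y):
    """Scalar target (assumed): the sum of all entries of Out = X * Y."""
    return sum(sum(row) for row in matmul(x, y))


def analytic_grad_x(x, y):
    """d loss / d X[i][k] equals the sum of row k of Y, for every i."""
    row_sums = [sum(row) for row in y]
    return [list(row_sums) for _ in x]


def numeric_grad_x(x, y, delta=1e-4):
    """Central differences: perturb one entry of X at a time."""
    grad = [[0.0] * len(x[0]) for _ in x]
    for i in range(len(x)):
        for k in range(len(x[0])):
            original = x[i][k]
            x[i][k] = original + delta
            up = loss(x, y)
            x[i][k] = original - delta
            down = loss(x, y)
            x[i][k] = original  # restore the entry before moving on
            grad[i][k] = (up - down) / (2 * delta)
    return grad


def max_relative_error(a, b):
    """Largest elementwise relative difference between two gradients."""
    worst = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            denom = max(abs(va), abs(vb), 1e-12)
            worst = max(worst, abs(va - vb) / denom)
    return worst


X = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
Y = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
err = max_relative_error(analytic_grad_x(X, Y), numeric_grad_x(X, Y))
```

A test would then assert that `err` stays below the chosen tolerance, which is the role max_relative_error plays in the calls above.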
                                            -
                                            -

                                            Building and running

                                            -

                                            New test_*.py unit tests added under the python/paddle/v2/framework/tests directory are automatically added to the project and built.

                                            -

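                                            Note that, unlike building and testing a single Op, running the unit tests requires building the whole project with WITH_TESTING turned on, i.e. cmake paddle_dir -DWITH_TESTING=ON. After a successful build, run the unit test with the following command: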

                                            -
                                            make test ARGS="-R test_mul_op -V"
                                            -
                                            -
                                            -

                                            Or:

                                            -
                                            ctest -R test_mul_op
                                            -
                                            -
                                            -
                                            -
                                            -
                                            -

                                            Notes

                                            -
                                              -
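                                            • Create separate *_op.h (if any), *_op.cc and *_op.cu (if any) files for each Op. Putting multiple Ops in one file is not allowed and will cause build errors.
                                            • The type name used when registering an Op must be the same as the Op's name. That is, registering REGISTER_OP(B, ...) in A_op.cc is not allowed and will cause unit test failures.
                                            • If an Op does not implement a CUDA Kernel, do not create an empty *_op.cu; it will cause unit test failures.
                                            • If several Ops depend on some common functions, those functions can be placed in files not named *_op.*, such as gather.h.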
                                            -
                                            -
                                            - - -
                                            -
                                            -
                                            - - -
                                            - -
                                            -

                                            - © Copyright 2016, PaddlePaddle developers. - -

                                            -
                                            - Built with Sphinx using a theme provided by Read the Docs. - -
                                            - -
                                            -
                                            - -
                                            - -
                                            - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/dev/use_eigen_cn.html b/develop/doc_cn/dev/use_eigen_cn.html deleted file mode 100644 index 4397f646227f6f8ee2a60800bdea9ba88236547b..0000000000000000000000000000000000000000 --- a/develop/doc_cn/dev/use_eigen_cn.html +++ /dev/null @@ -1,388 +0,0 @@ - - - - - - - - - - - - - 在Paddle中如何使用Eigen — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                            - - - - -
                                            - - - - - - -
                                            -
                                            - - - - - - -
                                            - -
                                            -
                                            -
                                            -
                                            - -
                                            -

                                            How to Use Eigen in Paddle

                                            -

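                                            A neural network is essentially a computation graph. The data needed for computation is stored in Tensors, and the computation itself is described by Operators. At execution time, an Operator calls the Compute interface of its corresponding OpKernel to operate on the Tensors.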

                                            -
                                            -

                                            The Eigen Tensor Module

                                            -

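                                            The Eigen Tensor module provides strong support for element-wise computation, and the same code can run on both CPU and GPU. However, Eigen Tensor is still under development, so its test coverage may be incomplete and its documentation is sparse.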

                                            -

                                            For a detailed introduction to the Eigen Tensor module, see Document 1 and Document 2.

                                            -
                                            -
                                            -

                                            paddle::framework::Tensor

                                            -

                                            Paddle's Tensor is defined in the framework directory. Its main interface is as follows:

                                            -
                                            class Tensor {
                                            - public:
                                            -  /*! Return a pointer to mutable memory block. */
                                            -  template <typename T>
                                            -  inline T* data();
                                            -  
                                            -  /**
                                            -   * @brief   Return a pointer to mutable memory block.
                                            -   * @note    If not exist, then allocation.
                                            -   */
                                            -  template <typename T>
                                            -  inline T* mutable_data(platform::Place place);
                                            -  
                                            -  /**
                                            -   * @brief     Return a pointer to mutable memory block.
                                            -   *
                                            -   * @param[in] dims    The dimensions of the memory block.
                                            -   * @param[in] place   The place of the memory block.
                                            -   *
                                            -   * @note      If not exist, then allocation.
                                            -   */
                                            -  template <typename T>
                                            -  inline T* mutable_data(DDim dims, platform::Place place);
                                            -  
                                            -  /*! Resize the dimensions of the memory block. */
                                            -  inline Tensor& Resize(const DDim& dims);
                                            -  
                                            -  /*! Return the dimensions of the memory block. */
                                            -  inline const DDim& dims() const;
                                            -
                                            - private:  
                                            -  /*! holds the memory block if allocated. */
                                            -  std::shared_ptr<Placeholder> holder_;
                                            -  
                                            -  /*! points to dimensions of memory block. */
                                            -  DDim dim_;
                                            -};
                                            -
                                            -
                                            -

                                            Placeholder exists to defer memory allocation: we can define a Tensor first, set its size with the Resize interface, and only then call mutable_data to allocate the actual memory.

                                            -
                                            paddle::framework::Tensor t;
                                            -paddle::platform::CPUPlace place;
                                            -// set size first
                                            -t.Resize({2, 3});
                                            -// allocate memory on CPU later
                                            -t.mutable_data<float>(place);
                                            -
                                            -
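The deferred-allocation pattern described above can be modeled in a few lines. This is only a conceptual sketch in Python, not Paddle's C++ API: the LazyTensor class and its method names are hypothetical, and the real Tensor also takes an element type and a place.

```python
class LazyTensor:
    """Toy model of deferred allocation; not Paddle's actual Tensor API."""

    def __init__(self):
        self.dims = ()
        self.buffer = None  # nothing is allocated at construction time

    def resize(self, dims):
        # Only records the shape; no memory is allocated here.
        self.dims = tuple(dims)
        return self

    def numel(self):
        # Number of elements implied by the recorded shape.
        count = 1
        for d in self.dims:
            count *= d
        return count

    def mutable_data(self):
        # Allocate on first use; later calls return the same buffer.
        if self.buffer is None:
            self.buffer = [0.0] * self.numel()
        return self.buffer


t = LazyTensor()
t.resize((2, 3))                       # set size first
allocated_before = t.buffer is not None
data = t.mutable_data()                # allocate memory later
print(allocated_before, len(data))
```

The point of the split is that shape inference can run over a whole graph before any memory is committed.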
                                            -
                                            -
                                            -

                                            paddle::framework::Tensor Usage Example

                                            -

                                            The following uses AddOp as an example to show how Tensor is used:

                                            -
                                              -
                                            • InferShape
                                            • -
                                            -

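                                            When running the computation graph, we first call each Operator's InferShape interface, which sets the size of the output Tensor based on the sizes of the input Tensors; this is where Resize is called.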

                                            -
                                            void InferShape(const framework::InferShapeContext &ctx) const override {
                                            -  PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("X")->dims(),
                                            -                    ctx.Input<Tensor>("Y")->dims(),
                                            -                    "Two input of Add Op's dimension must be same.");
                                            -  ctx.Output<Tensor>("Out")->Resize(ctx.Input<Tensor>("X")->dims());
                                            -}
                                            -
                                            -
                                            -
                                              -
                                            • Run
                                            • -
                                            -

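                                            The Operator's Run interface ultimately calls the Compute interface of the corresponding OpKernel. This is when memory is actually allocated, through a call to mutable_data.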

                                            -
                                            void Compute(const framework::ExecutionContext& context) const override {
                                            -  auto* input0 = context.Input<Tensor>("X");
                                            -  auto* input1 = context.Input<Tensor>("Y");
                                            -  auto* output = context.Output<Tensor>("Out");
                                            -
                                            -  output->mutable_data<T>(context.GetPlace());
                                            -
                                            -  auto x = EigenVector<T>::Flatten(*input0);
                                            -  auto y = EigenVector<T>::Flatten(*input1);
                                            -  auto z = EigenVector<T>::Flatten(*output);
                                            -
                                            -  auto place = context.GetEigenDevice<Place>();
                                            -
                                            -  z.device(place) = x + y;
                                            -}
                                            -
                                            -
                                            -
                                            -
                                            -

                                            Converting paddle::framework::Tensor to EigenTensor

                                            -

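                                            As the previous section shows, the input and output Tensors must first be converted to a format Eigen supports before the actual computation. eigen.h provides global functions that convert a paddle::framework::Tensor to EigenTensor/EigenMatrix/EigenVector/EigenScalar.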

                                            -

                                            Taking EigenTensor as an example:

                                            -
                                            Tensor t;
                                            -float* p = t.mutable_data<float>(make_ddim({1, 2, 3}), platform::CPUPlace());
                                            -for (int i = 0; i < 1 * 2 * 3; i++) {
                                            -  p[i] = static_cast<float>(i);
                                            -}
                                            -
                                            -EigenTensor<float, 3>::Type et = EigenTensor<float, 3>::From(t);
                                            -
                                            -
                                            -

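                                            From is an interface provided by the EigenTensor template that converts a paddle::framework::Tensor to an EigenTensor. Because the rank of the Tensor is a template parameter, it must be specified explicitly during conversion.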

                                            -

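                                            In Eigen, Tensors of different ranks are different types, and a Vector is a Tensor of rank 1. Note that EigenVector::From converts a one-dimensional Paddle Tensor to a one-dimensional Eigen Tensor, represented here as an EigenVector, whereas EigenVector::Flatten reshapes a Paddle Tensor of any rank, flattening it into a one-dimensional Eigen Tensor whose type is still EigenVector.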

                                            -

                                            For more conversion methods, see the unit tests in eigen_test.cc.

                                            -
                                            -
                                            -

                                            Implementing the Computation

                                            -

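                                            To perform a computation, the EigenTensor on the left-hand side of the assignment must call the device interface. Note that operations between EigenTensors only change the data in the underlying Tensors; they do not change the Tensors' shape information.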

                                            -
                                            auto x = EigenVector<T>::Flatten(*input0);
                                            -auto y = EigenVector<T>::Flatten(*input1);
                                            -auto z = EigenVector<T>::Flatten(*output);
                                            -auto place = context.GetEigenDevice<Place>();
                                            -z.device(place) = x + y;
                                            -
                                            -
                                            -

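                                            In this code, input0/input1/output can be Tensors of any dimensionality. We call EigenVector's Flatten interface to convert a Tensor of any dimensionality to a one-dimensional EigenVector. After the computation finishes, the original shape information of input0/input1/output is unchanged. To change a Tensor's shape information, call the Resize interface.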
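The idea that flattening gives a one-dimensional view for computation while the shape metadata stays untouched can be illustrated with a small model. This is a conceptual Python sketch only; ToyTensor and add are hypothetical names and do not correspond to Paddle or Eigen APIs.

```python
class ToyTensor:
    """Toy tensor: a flat buffer plus separate shape metadata (hypothetical)."""

    def __init__(self, dims, values):
        self.dims = tuple(dims)
        self.data = list(values)

    def flatten(self):
        # A 1-D view over the same storage; the shape metadata is untouched.
        return self.data


def add(x, y, out):
    """Element-wise add over the flattened views, writing into out's storage."""
    xs, ys, zs = x.flatten(), y.flatten(), out.flatten()
    for i in range(len(zs)):
        zs[i] = xs[i] + ys[i]


x = ToyTensor((2, 3), [1, 2, 3, 4, 5, 6])
y = ToyTensor((2, 3), [10, 20, 30, 40, 50, 60])
out = ToyTensor((2, 3), [0] * 6)
add(x, y, out)
print(out.dims, out.data)
```

Only the contents of out change; its dims remain (2, 3), which mirrors the statement that the computation does not alter shape information.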

                                            -

                                            Because documentation for the Eigen Tensor module is sparse, the computation code of the relevant OpKernels in TensorFlow's kernels module is a useful reference.

                                            -
                                            -
                                            - - -
                                            -
                                            -
                                            - - -
                                            - -
                                            -


                                            -
                                            - -
                                            -
                                            - -
                                            - -
                                            - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/dev/write_docs_cn.html b/develop/doc_cn/dev/write_docs_cn.html index ec59b7cafad182bae61987f7892e1b317c61cb03..2abea256e4294a8e7328b0af0630de2d94fe4d8e 100644 --- a/develop/doc_cn/dev/write_docs_cn.html +++ b/develop/doc_cn/dev/write_docs_cn.html @@ -37,7 +37,7 @@ - + - - - - - - - - - -
                                            - - - - -
                                            - - - - - - -
                                            -
                                            - - - - - - -
                                            - -
                                            -
                                            -
                                            -
                                            - -

                                            This tutorial introduces how to use Python's cProfile package, the Python library yep, and Google perftools for performance profiling and performance tuning.

                                            -

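                                            Profiling means finding performance bottlenecks. The actual bottlenecks in a system can be very different from what programmers imagine during development. Tuning means eliminating those bottlenecks. Performance optimization is usually a repeated cycle of profiling and tuning.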

                                            -

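                                            PaddlePaddle users generally write deep learning programs by calling the Python API. Most Python API calls go into libpaddle.so, which is written in C++. Profiling and tuning PaddlePaddle therefore has two parts: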

                                            -
                                              -
                                            • Profiling Python code
                                            • -
                                            • Profiling mixed Python and C++ code
                                            • -
                                            -
                                            -

                                            Profiling Python Code

                                            -
                                            -

                                            Generating the Profile File

                                            -

                                            The Python standard library provides a profiling package, cProfile. The command to generate a Python profile is:

                                            -
                                            python -m cProfile -o profile.out main.py
                                            -
                                            -
                                            -

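                                            Here main.py is the program to profile, and -o names the output file that stores the results of this profiling run. If no file is given, cProfile prints the results to standard output.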
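When only part of a program needs profiling, the same kind of output file can also be produced from inside Python. The following is a minimal sketch using the standard cProfile and pstats modules; the work function is a hypothetical stand-in for the code in main.py, and the output path is a temporary location chosen for illustration.

```python
import cProfile
import os
import pstats
import tempfile


def work(n):
    """Hypothetical stand-in for the body of main.py."""
    return sum(i * i for i in range(n))


def profile_to_file(path):
    """Profile work() and write the raw statistics to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    work(100000)
    profiler.disable()
    profiler.dump_stats(path)  # same file format as `python -m cProfile -o`
    return path


out_path = profile_to_file(os.path.join(tempfile.mkdtemp(), "profile.out"))

# The file can be loaded back for inspection with pstats.
stats = pstats.Stats(out_path)
print(stats.total_calls)
```

A file written this way can be opened by the same viewers as one produced with the -o option.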

                                            -
                                            -
                                            -

                                            Viewing the Profile File

                                            -

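                                            cProfile writes profile.out once main.py finishes running. We can use cprofilev to view the results. cprofilev is a third-party Python library; it starts an HTTP service that presents the profiling results as a web page: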

                                            -
                                            cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py
                                            -
                                            -
                                            -

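                                            Here -a specifies the IP address the HTTP service binds to; 0.0.0.0 allows access from external networks. -p specifies the port of the HTTP service, -f specifies the profile result file, and main.py is the source file that was profiled.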

                                            -

                                            Open the corresponding URL in a web browser to see the profiling results:

                                            -
                                               ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                                            -        1    0.284    0.284   29.514   29.514 main.py:1(<module>)
                                            -     4696    0.128    0.000   15.748    0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/executor.py:20(run)
                                            -     4696   12.040    0.003   12.040    0.003 {built-in method run}
                                            -        1    0.144    0.144    6.534    6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14(<module>)
                                            -
                                            -
                                            -

                                            The meaning of each column is:

                                            -

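                                            | Column | Meaning |
                                            -| --- | --- |
                                            -| ncalls | Number of times the function was called |
                                            -| tottime | Total time spent in the function itself, excluding time spent in the functions it calls |
                                            -| percall | Average tottime per call |
                                            -| cumtime | Cumulative time of the function, including time spent in the functions it calls |
                                            -| percall | Average cumtime per call |
                                            -| filename:lineno(function) | File name, line number, and function name |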
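The difference between tottime and cumtime is easiest to see on a function that does little work itself but calls an expensive helper. The sketch below uses only the standard cProfile and pstats modules; inner and outer are hypothetical functions, and reading the per-function tuples from the stats attribute of pstats.Stats relies on its (cc, nc, tt, ct, callers) layout.

```python
import cProfile
import pstats


def inner():
    total = 0
    for i in range(200000):
        total += i
    return total


def outer():
    # outer does almost nothing itself; nearly all of its time is spent in inner()
    return inner()


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

stats = pstats.Stats(profiler)
timings = {}
for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items():
    if name in ("inner", "outer"):
        timings[name] = {"ncalls": nc, "tottime": tt, "cumtime": ct}

for name, row in sorted(timings.items()):
    print(name, row)
```

In the printed rows, outer should show a small tottime but a cumtime that covers the time spent in inner.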

                                            -
                                            -
                                            -

                                            Finding Performance Bottlenecks

                                            -

                                            tottime and cumtime are usually the key metrics for finding bottlenecks, because they show how much time a function actually takes to run.

                                            -

                                            Sorting the profiling results by tottime gives:

                                            -
                                                 4696   12.040    0.003   12.040    0.003 {built-in method run}
                                            -   300005    0.874    0.000    1.681    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader)
                                            -   107991    0.676    0.000    1.519    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:219(__init__)
                                            -     4697    0.626    0.000    2.291    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)
                                            -        1    0.618    0.618    0.618    0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/__init__.py:1(<module>)
                                            -
                                            -
                                            -
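A listing sorted by tottime can also be produced without a web viewer, using the standard pstats module. This is a minimal sketch; cheap, expensive, and main are hypothetical functions standing in for a real workload.

```python
import cProfile
import io
import pstats


def cheap():
    return sum(range(100))


def expensive():
    return sum(i * i for i in range(300000))


def main():
    for _ in range(50):
        cheap()
    expensive()


profiler = cProfile.Profile()
profiler.runcall(main)

# Collect the report in a string buffer instead of printing directly.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Shorten file paths, sort by time spent inside each function, keep the top 5 rows.
stats.strip_dirs().sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Passing "cumtime" to sort_stats instead orders the same data by cumulative time.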

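                                            The most time-consuming function is the C++ run function. Tuning it requires the mixed Python and C++ profiling described in the second part of this tutorial. The sync_with_cpp function also accounts for a large total time (a cumtime of 2.291 seconds over 4697 calls), so we can click through to the details of sync_with_cpp to see its call relationships.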

                                            -
                                            Called By:
                                            -
                                            -   Ordered by: internal time
                                            -   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
                                            -
                                            -Function                                                                                                 was called by...
                                            -                                                                                                             ncalls  tottime  cumtime
                                            -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:428(sync_with_cpp)  <-    4697    0.626    2.291  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)
                                            -/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:562(sync_with_cpp)  <-    4696    0.019    2.316  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:487(clone)
                                            -                                                                                                                  1    0.000    0.001  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/fluid/framework.py:534(append_backward)
                                            -
                                            -
                                            -Called:
                                            -
                                            -   Ordered by: internal time
                                            -   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
                                            -
                                            -
                                            -

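                                            Examining the call relationships between hot functions, together with the corresponding lines of code, usually reveals where the problem is. After making a fix, profile again to check whether the change actually improves performance.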
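Caller information like the listing above can be obtained from pstats with print_callers, restricted to one function name. The sketch below is an illustration only: the functions reuse the names sync_with_cpp, clone, and append_backward from the listing but are hypothetical stand-ins, not Paddle code.

```python
import cProfile
import io
import pstats


def sync_with_cpp():
    # Hypothetical stand-in for the hot function seen in the profile.
    return sum(range(1000))


def clone():
    return sync_with_cpp()


def append_backward():
    return sync_with_cpp()


def main():
    for _ in range(100):
        clone()
    append_backward()


profiler = cProfile.Profile()
profiler.runcall(main)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Restrict the caller report to functions whose name matches "sync_with_cpp".
stats.strip_dirs().sort_stats("tottime").print_callers("sync_with_cpp")
callers_report = buffer.getvalue()
print(callers_report)
```

The report lists each caller with its call count, which makes it clear which call site is responsible for most of the invocations.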

                                            -
                                            -
                                            -
                                            -

                                            Profiling Mixed Python and C++ Code

                                            -
                                            -

                                            Generating the Profile File

                                            -

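                                            There are many profiling tools for C++; common ones include gprof, valgrind, and google-perftools. However, analyzing a dynamic library used from Python is considerably more complex than analyzing a standalone binary. Fortunately, yep, a third-party Python library, provides a convenient way to work with google-perftools, so we use yep here to profile mixed Python and C++ code.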

                                            -

                                            Before using yep, install google-perftools and the yep package. On Ubuntu, the installation commands are:

                                            -
                                            apt update
                                            -apt install libgoogle-perftools-dev
                                            -pip install yep
                                            -
                                            -
                                            -

                                            After installation, we can run

                                            -
                                            python -m yep -v main.py
                                            -
                                            -
                                            -

                                            to generate the profile file. The generated profile file is main.py.prof.

                                            -

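                                            The -v option prints the analysis results on the command line after the profile file is generated, which gives a quick look at the output. Unlike Python, C++ code may be compiled without debug information, and multithreading at run time can make the profile confusing and hard to read. The following measures produce a more readable profile: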

                                            -
                                              -
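                                            1. Compile with -g to generate debug information. With cmake, set CMAKE_BUILD_TYPE to RelWithDebInfo.
                                            2. -
                                            3. Always enable optimization when compiling. A plain Debug build performs very differently from -O2 or -O3, so performance tests in Debug mode are meaningless.
                                            4. -
                                            5. When profiling, start with a single thread, then move to multiple threads, and then to multiple machines; single-threaded runs are easier to debug. Setting the environment variable OMP_NUM_THREADS=1 turns off OpenMP parallelism.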
                                            6. -
                                            -
                                            -
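The single-thread setting from the list above can be applied to one profiling run without changing the shell environment. The following is a minimal sketch using the standard subprocess module; run_single_threaded is a hypothetical helper, and the child command here only echoes the variable so that the effect is visible.

```python
import os
import subprocess
import sys


def run_single_threaded(argv):
    """Run a command with OpenMP limited to one thread; return its stdout."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    result = subprocess.run(argv, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout


# Stand-in child process. In practice argv would be the profiling command,
# for example [sys.executable, "-m", "yep", "-v", "main.py"].
child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
output = run_single_threaded(child)
print(output.strip())
```

Setting the variable per run keeps later multi-threaded measurements unaffected.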
                                            -

                                            Viewing the Profile File

                                            -

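                                            Running the profiler produces a profile result file, which we can display with pprof. Note that we use the version of pprof rewritten in Go, because it provides a web interface and better visualizations.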

                                            -

                                            pprof is installed the same way as any other Go program:

                                            -
                                            go get github.com/google/pprof
                                            -
                                            -
                                            -

                                            We can then start an HTTP service with the following command:

                                            -
                                            pprof -http=0.0.0.0:3213 `which python`  ./main.py.prof
                                            -
                                            -
                                            -

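                                            In this command, -http starts the HTTP service. `which python` expands to the full path of the current Python binary, which tells pprof which executable was profiled. ./main.py.prof is the profile result file used as input.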

                                            -

                                            Open the corresponding URL to view the profiling results, shown in the figure below:

                                            -

                                            result

                                            -
                                            -
                                            -

                                            Finding Performance Bottlenecks

                                            -

                                            As with pure Python code, finding bottlenecks in mixed Python and C++ code means looking at tottime and cumtime. The call graph shown by pprof also helps reveal performance problems.

                                            -

                                            For example, in the figure below,

                                            -

                                            kernel_perf

                                            -

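                                            during one training run, multiplication and its gradient computation take about 2% to 4% of the computation time, while MomentumOp takes about 17%. Clearly, MomentumOp has a performance problem.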

                                            -

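                                            In pprof, the critical path is marked in red. Checking the critical path for performance problems first, and the other parts afterwards, makes the optimization work more orderly.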

                                            -
                                            -
                                            - - -
                                            -
                                            -
                                            - - -
                                            - -
                                            -


                                            -
                                            - -
                                            -
                                            - -
                                            - -
                                            - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/howto/optimization/gpu_profiling_cn.html b/develop/doc_cn/howto/optimization/gpu_profiling_cn.html index 05d2b1aab0db47ac09a9a6b4d835a3a1bb7ba308..c3b8520dba15028308195fc98664c9d41a9d65cd 100644 --- a/develop/doc_cn/howto/optimization/gpu_profiling_cn.html +++ b/develop/doc_cn/howto/optimization/gpu_profiling_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                          • Development Standards
                                          • FAQ
                                              diff --git a/develop/doc_cn/howto/read_source.html b/develop/doc_cn/howto/read_source.html deleted file mode 100644 index a957661475b3763d484c3c41200890cb0aef0b39..0000000000000000000000000000000000000000 --- a/develop/doc_cn/howto/read_source.html +++ /dev/null @@ -1,342 +0,0 @@ - - - - - - - - - - - - - PaddlePaddle Fluid Source Code Overview — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                              - - - - -
                                              - - - - - - -
                                              -
                                              - - - - - - -
                                              - -
                                              -
                                              -
                                              -
                                              - -
                                              -

                                              PaddlePaddle Fluid Source Code Overview

                                              -

                                              Examples: https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/tests/book

                                              -

                                              Core: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework

                                              -

                                              Operator: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators

                                              -

                                              Memory: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory

                                              -

                                              Platform: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform

                                              -
                                              -
                                              -

                                              Compile Time

                                              -

                                              The following defines the NN. The definition goes into this protocol buffer.

                                              -
                                              x = fluid.layers.data(name='x', shape=[13], dtype='float32')
                                              -y = fluid.layers.data(name='y', shape=[1], dtype='float32')
                                              -
                                              -y_predict = fluid.layers.fc(input=x, size=1, act=None)
                                              -cost = fluid.layers.square_error_cost(input=y_predict, label=y)
                                              -avg_cost = fluid.layers.mean(x=cost)
                                              -
                                              -sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
                                              -sgd_optimizer.minimize(avg_cost)
                                              -
                                              -
                                              - -
                                              -
                                              -

                                              Run Time

                                              -

                                              The following evaluates the NN, instantiating all the variables and operators.

                                              -
                                              place = fluid.CPUPlace()
                                              -feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
                                              -exe = fluid.Executor(place)
                                              -
                                              -# Allocate memory. Initialize Parameter.
                                              -exe.run(fluid.default_startup_program())
                                              -
                                              -# Allocate memory. Do computation.
                                              -exe.run(fluid.default_main_program(),
                                              -        feed=feeder.feed(data),
                                              -        fetch_list=[avg_cost])
                                              -
                                              -
                                              - -
                                              - - -
                                              -
                                              -
                                              - - -
                                              - -
                                              -

                                              - © Copyright 2016, PaddlePaddle developers. - -

                                              -
                                              - Built with Sphinx using a theme provided by Read the Docs. - -
                                              - -
                                              -
                                              - -
                                              - -
                                              - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/howto/rnn/hierarchical_layer_cn.html b/develop/doc_cn/howto/rnn/hierarchical_layer_cn.html index a6117bdc10f55f0a8438187aaf836e3a26473bd9..10500f8ade89301d489de0d6b2ccd471557a90a7 100644 --- a/develop/doc_cn/howto/rnn/hierarchical_layer_cn.html +++ b/develop/doc_cn/howto/rnn/hierarchical_layer_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                            • Development standards
                                            • FAQ
                                                diff --git a/develop/doc_cn/howto/rnn/hrnn_rnn_api_compare_cn.html b/develop/doc_cn/howto/rnn/hrnn_rnn_api_compare_cn.html index bf7450d9f7cfaf591da0d8754f50448d3fcf8d14..c395ce4fcbe8d0445f955bb428f6b7521f0d2e04 100644 --- a/develop/doc_cn/howto/rnn/hrnn_rnn_api_compare_cn.html +++ b/develop/doc_cn/howto/rnn/hrnn_rnn_api_compare_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
                                              • Development standards
                                              • FAQ
                                                  @@ -219,10 +220,76 @@ var _hmt = _hmt || [];
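                                                  • The raw data in this example contains 10 samples. Each sample has two parts: a label (always 2 here) and a sentence that has already been word-segmented. The single-layer RNN network uses this data directly.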
                                                  +
                                                  2  	酒店 有 很 舒适 的 床垫 子 , 床上用品 也 应该 是 一人 一 换 , 感觉 很 利落 对 卫生 很 放心 呀 。
                                                  +2  	很 温馨 , 也 挺 干净 的 * 地段 不错 , 出来 就 有 全家 , 离 地铁站 也 近 , 交通 很方便 * 就是 都 不 给 刷牙 的 杯子 啊 , 就 第一天 给 了 一次性杯子 *
                                                  +2  	位置 方便 , 强烈推荐 , 十一 出去玩 的 时候 选 的 , 对面 就是 华润万家 , 周围 吃饭 的 也 不少 。
                                                  +2  	交通便利 , 吃 很 便利 , 乾 浄 、 安静 , 商务 房 有 电脑 、 上网 快 , 价格 可以 , 就 早餐 不 好吃 。 整体 是 不错 的 。 適 合 出差 來 住 。
                                                  +2  	本来 准备 住 两 晚 , 第 2 天 一早 居然 停电 , 且 无 通知 , 只有 口头 道歉 。 总体来说 性价比 尚可 , 房间 较 新 , 还是 推荐 .
                                                  +2  	这个 酒店 去过 很多 次 了 , 选择 的 主要原因 是 离 客户 最 便宜 相对 又 近 的 酒店
                                                  +2  	挺好 的 汉庭 , 前台 服务 很 热情 , 卫生 很 整洁 , 房间 安静 , 水温 适中 , 挺好 !
                                                  +2  	HowardJohnson 的 品质 , 服务 相当 好 的 一 家 五星级 。 房间 不错 、 泳池 不错 、 楼层 安排 很 合理 。 还有 就是 地理位置 , 简直 一 流 。 就 在 天一阁 、 月湖 旁边 , 离 天一广场 也 不远 。 下次 来 宁波 还会 住 。
                                                  +2  	酒店 很干净 , 很安静 , 很 温馨 , 服务员 服务 好 , 各方面 都 不错 *
                                                  +2  	挺好 的 , 就是 没 窗户 , 不过 对 得 起 这 价格
                                                  +
                                                  +
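                                                  • The two-level sequence data contains 4 samples. Samples are separated by blank lines, and the data as a whole is identical to the raw data above. For the two-level sequence LSTM, however, the first sample encodes two sentences into two vectors at once. The numbers of sentences processed together in the four samples are [2, 3, 2, 3].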
                                                  +
                                                  2  	酒店 有 很 舒适 的 床垫 子 , 床上用品 也 应该 是 一人 一 换 , 感觉 很 利落 对 卫生 很 放心 呀 。
                                                  +2  	很 温馨 , 也 挺 干净 的 * 地段 不错 , 出来 就 有 全家 , 离 地铁站 也 近 , 交通 很方便 * 就是 都 不 给 刷牙 的 杯子 啊 , 就 第一天 给 了 一次性杯子 *
                                                  +
                                                  +2  	位置 方便 , 强烈推荐 , 十一 出去玩 的 时候 选 的 , 对面 就是 华润万家 , 周围 吃饭 的 也 不少 。
                                                  +2  	交通便利 , 吃 很 便利 , 乾 浄 、 安静 , 商务 房 有 电脑 、 上网 快 , 价格 可以 , 就 早餐 不 好吃 。 整体 是 不错 的 。 適 合 出差 來 住 。
                                                  +2  	本来 准备 住 两 晚 , 第 2 天 一早 居然 停电 , 且 无 通知 , 只有 口头 道歉 。 总体来说 性价比 尚可 , 房间 较 新 , 还是 推荐 .
                                                  +
                                                  +2  	这个 酒店 去过 很多 次 了 , 选择 的 主要原因 是 离 客户 最 便宜 相对 又 近 的 酒店
                                                  +2  	挺好 的 汉庭 , 前台 服务 很 热情 , 卫生 很 整洁 , 房间 安静 , 水温 适中 , 挺好 !
                                                  +
                                                  +2  	HowardJohnson 的 品质 , 服务 相当 好 的 一 家 五星级 。 房间 不错 、 泳池 不错 、 楼层 安排 很 合理 。 还有 就是 地理位置 , 简直 一 流 。 就 在 天一阁 、 月湖 旁边 , 离 天一广场 也 不远 。 下次 来 宁波 还会 住 。
                                                  +2  	酒店 很干净 , 很安静 , 很 温馨 , 服务员 服务 好 , 各方面 都 不错 *
                                                  +2  	挺好 的 , 就是 没 窗户 , 不过 对 得 起 这 价格
                                                  +
                                                  +
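As a rough illustration of the grouping described above, the following minimal sketch splits lines on blank separators and counts the sentences in each sample. It is plain Python and not part of PaddlePaddle; the function name `count_sentences_per_sample` and the toy lines are assumptions made for this example.

```python
def count_sentences_per_sample(lines):
    """Count the non-blank lines in each blank-line-separated group."""
    counts = []
    current = 0
    for line in lines:
        if line.strip():
            current += 1
        elif current:
            counts.append(current)
            current = 0
    if current:  # flush the last group when there is no trailing blank line
        counts.append(current)
    return counts


# Toy stand-in for the data file: 10 sentences in groups of 2, 3, 2, 3.
toy_lines = [
    "2\ta b", "2\tc d", "",
    "2\te f", "2\tg h", "2\ti j", "",
    "2\tk l", "2\tm n", "",
    "2\to p", "2\tq r", "2\ts t", "",
]
print(count_sentences_per_sample(toy_lines))  # [2, 3, 2, 3]
```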

                                                  Next, for the two different input data types, the corresponding DataProviders compare as follows (sequenceGen.py):

                                                  +
                                                  def hook(settings, dict_file, **kwargs):
                                                  +    settings.word_dict = dict_file
                                                  +    settings.input_types = [
                                                  +        integer_value_sequence(len(settings.word_dict)), integer_value(3)
                                                  +    ]
                                                  +    settings.logger.info('dict len : %d' % (len(settings.word_dict)))
                                                  +
                                                  +
                                                  +@provider(init_hook=hook, should_shuffle=False)
                                                  +def process(settings, file_name):
                                                  +    with open(file_name, 'r') as fdata:
                                                  +        for line in fdata:
                                                  +            label, comment = line.strip().split('\t')
                                                  +            label = int(''.join(label.split()))
                                                  +            words = comment.split()
                                                  +            words = [
                                                  +                settings.word_dict[w] for w in words if w in settings.word_dict
                                                  +            ]
                                                  +            yield words, label
                                                  +
                                                  +
                                                  +
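The per-line parsing in the single-level provider above can be tried out without PaddlePaddle. The following is a minimal sketch, not the provider itself: `parse_line` and the toy dictionary are hypothetical names introduced for illustration, and the real dictionary is whatever is passed to the hook as `dict_file`.

```python
def parse_line(line, word_dict):
    """Turn 'label<TAB>word word ...' into (word_ids, label)."""
    label, comment = line.strip().split('\t')
    label = int(''.join(label.split()))
    # Words that are missing from the dictionary are silently dropped.
    word_ids = [word_dict[w] for w in comment.split() if w in word_dict]
    return word_ids, label


# Hypothetical toy dictionary mapping words to integer ids.
toy_dict = {"hotel": 0, "very": 1, "clean": 2}
print(parse_line("2  \thotel very clean indeed\n", toy_dict))  # ([0, 1, 2], 2)
```

Each parsed line corresponds to one `yield words, label` in the provider: a sequence of word ids paired with a single integer label.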
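                                                  • This is the DataProvider code for an ordinary single-layer time series. Notes:
                                                    • The DataProvider returns two values, words and label; this is line 19 of the code above (the yield words, label statement).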
                                                        @@ -233,6 +300,65 @@ var _hmt = _hmt || [];
                                                    +
                                                    def hook2(settings, dict_file, **kwargs):
                                                    +    settings.word_dict = dict_file
                                                    +    settings.input_types = [
                                                    +        integer_value_sub_sequence(len(settings.word_dict)),
                                                    +        integer_value_sequence(3)
                                                    +    ]
                                                    +    settings.logger.info('dict len : %d' % (len(settings.word_dict)))
                                                    +
                                                    +
                                                    +@provider(init_hook=hook2, should_shuffle=False)
                                                    +def process2(settings, file_name):
                                                    +    with open(file_name) as fdata:
                                                    +        labels = []
                                                    +        sentences = []
                                                    +        for line in fdata:
                                                    +            if (len(line)) > 1:
                                                    +                label, comment = line.strip().split('\t')
                                                    +                label = int(''.join(label.split()))
                                                    +                words = comment.split()
                                                    +                words = [
                                                    +                    settings.word_dict[w] for w in words
                                                    +                    if w in settings.word_dict
                                                    +                ]
                                                    +                labels.append(label)
                                                    +                sentences.append(words)
                                                    +            else:
                                                    +                yield sentences, labels
                                                    +                labels = []
                                                    +                sentences = []
                                                    +
                                                    +
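One detail of the loop above is worth checking on toy data: a group is emitted only when a blank line is read, so a file that does not end with a blank line would lose its last group. The sketch below mirrors that loop in plain Python and compares it with a variant that also flushes a trailing group. The variant is an assumption added for illustration, not part of the original provider, and both function names are hypothetical.

```python
def group_like_process2(lines):
    """Mirror the loop in process2: emit a group only when a blank line is seen."""
    labels, sentences = [], []
    for line in lines:
        if len(line) > 1:
            label, comment = line.strip().split('\t')
            labels.append(int(label))
            sentences.append(comment.split())
        else:
            yield sentences, labels
            labels, sentences = [], []


def group_with_flush(lines):
    """Variant (an assumption, not the original code) that keeps a trailing group."""
    labels, sentences = [], []
    for line in lines:
        if line.strip():
            label, comment = line.strip().split('\t')
            labels.append(int(label))
            sentences.append(comment.split())
        elif sentences:
            yield sentences, labels
            labels, sentences = [], []
    if sentences:
        yield sentences, labels


# Two groups, but no blank line after the last one.
lines = ["2\ta b\n", "2\tc\n", "\n", "2\td e\n"]
print(len(list(group_like_process2(lines))))  # 1
print(len(list(group_with_flush(lines))))     # 2
```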
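                                                    • This is the DataProvider code for the same data treated as a two-level time series. Notes:
                                                      • The DataProvider returns two groups of data, sentences and labels: all the sentences and labels within each group of the two-level raw data.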
                                                      • @@ -245,6 +371,57 @@ var _hmt = _hmt || [];

                                                        Model configuration

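                                                        First, let's look at the single-layer RNN configuration. Lines 9-15 of the code (the highlighted part) are where the single-layer RNN sequence is used. This relies on an RNN-processing function predefined by PaddlePaddle, in which the RNN passes each time step through an LSTM network.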

                                                        +
                                                        data = data_layer(name="word", size=dict_dim)
                                                        +
                                                        +emb = embedding_layer(input=data, size=word_dim)
                                                        +
                                                        +# (lstm_input + lstm) is equal to lstmemory 
                                                        +with mixed_layer(size=hidden_dim * 4) as lstm_input:
                                                        +    lstm_input += full_matrix_projection(input=emb)
                                                        +
                                                        +lstm = lstmemory_group(
                                                        +    input=lstm_input,
                                                        +    size=hidden_dim,
                                                        +    act=TanhActivation(),
                                                        +    gate_act=SigmoidActivation(),
                                                        +    state_act=TanhActivation())
                                                        +
                                                        +lstm_last = last_seq(input=lstm)
                                                        +
                                                        +with mixed_layer(
                                                        +        size=label_dim, act=SoftmaxActivation(), bias_attr=True) as output:
                                                        +    output += full_matrix_projection(input=lstm_last)
                                                        +
                                                        +outputs(
                                                        +    classification_cost(
                                                        +        input=output, label=data_layer(
                                                        +            name="label", size=1)))
                                                        +
                                                        +

                                                        Next, let's look at the semantically equivalent two-level RNN network configuration:

                                                        • Many layers in PaddlePaddle do not care whether their input is a time series, for example embedding_layer. In these layers, every operation is applied per time step.
                                                        • @@ -256,6 +433,61 @@ var _hmt = _hmt || [];
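                                                        • As in the single-layer RNN configuration, we only need the last vector produced by the LSTM encoder, so last_seq is applied to the output of recurrent_group. Unlike the single-layer RNN, however, we take the last element of each sub-sequence, hence agg_level=AggregateLevel.TO_SEQUENCE.
                                                        • At this point, lstm_last produces the same result as lstm_last in the single-layer RNN configuration.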
                                                        +
                                                        data = data_layer(name="word", size=dict_dim)
                                                        +
                                                        +emb_group = embedding_layer(input=data, size=word_dim)
                                                        +
                                                        +
                                                        +# (lstm_input + lstm) is equal to lstmemory 
                                                        +def lstm_group(lstm_group_input):
                                                        +    with mixed_layer(size=hidden_dim * 4) as group_input:
                                                        +        group_input += full_matrix_projection(input=lstm_group_input)
                                                        +
                                                        +    lstm_output = lstmemory_group(
                                                        +        input=group_input,
                                                        +        name="lstm_group",
                                                        +        size=hidden_dim,
                                                        +        act=TanhActivation(),
                                                        +        gate_act=SigmoidActivation(),
                                                        +        state_act=TanhActivation())
                                                        +    return lstm_output
                                                        +
                                                        +
                                                        +lstm_nest_group = recurrent_group(
                                                        +    input=SubsequenceInput(emb_group), step=lstm_group, name="lstm_nest_group")
                                                        +# hasSubseq ->(seqlastins) seq
                                                        +lstm_last = last_seq(
                                                        +    input=lstm_nest_group, agg_level=AggregateLevel.TO_SEQUENCE)
                                                        +
                                                        +# seq ->(expand) hasSubseq
                                                        +
                                                        +
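The difference between taking the last element of a whole sequence and taking the last element of every sub-sequence can be sketched on plain Python lists. This is only an illustration of the idea behind agg_level=AggregateLevel.TO_SEQUENCE, not PaddlePaddle code; both helper names and the placeholder values are assumptions.

```python
def last_of_sequence(seq):
    """Last element of a flat sequence (the single-level case)."""
    return seq[-1]


def last_of_each_subsequence(nested):
    """Last element of every sub-sequence; the result is a flat sequence."""
    return [sub[-1] for sub in nested]


# One two-level sample holding two sub-sequences of hidden states.
nested = [["h1", "h2", "h3"], ["h4", "h5"]]
print(last_of_each_subsequence(nested))  # ['h3', 'h5']
print(last_of_sequence(nested[0]))       # h3
```

The two-level input thus collapses to one vector per sub-sequence, which matches what the single-layer configuration yields for each sentence.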
                                                        @@ -271,6 +503,21 @@ var _hmt = _hmt || [];
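                                                        • Single-layer RNN: the input passes through a very simple recurrent_group. At each time step, the current input y and the previous time step's output rnn_state go through a fully connected layer.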
                                                        +
                                                        def step(y):
                                                        +    mem = memory(name="rnn_state", size=hidden_dim)
                                                        +    out = fc_layer(input=[y, mem],
                                                        +                    size=hidden_dim,
                                                        +                    act=TanhActivation(),
                                                        +                    bias_attr=True,
                                                        +                    name="rnn_state")
                                                        +    return out
                                                        +
                                                        +out = recurrent_group(
                                                        +    name="rnn",
                                                        +    step=step,
                                                        +    input=emb)
                                                        +
                                                        +
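The recurrence in the step function above can be sketched with scalars in place of vectors. This is a minimal sketch under stated assumptions: the weights `w_in`, `w_mem` and `bias` are made-up constants standing in for the fc_layer parameters, and the memory is assumed to start from zero.

```python
import math


def simple_rnn(inputs, w_in=0.5, w_mem=0.8, bias=0.1):
    """out_t = tanh(w_in * y_t + w_mem * out_{t-1} + bias), starting from 0."""
    mem = 0.0  # assumed initial state of the memory
    outputs = []
    for y in inputs:
        # The layer output becomes the memory for the next time step,
        # just as fc_layer and memory share the name "rnn_state".
        mem = math.tanh(w_in * y + w_mem * mem + bias)
        outputs.append(mem)
    return outputs


outs = simple_rnn([1.0, -1.0, 0.5])
print(len(outs))  # 3
```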
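                                                        • Two-level RNN, where the outer memory is a single element:
                                                          • The inner recurrent_group (inner_step) is almost identical to the single-layer one. The exception is boot_layer=outer_mem, which uses the outer outer_mem as the initial state of the inner memory. In the outer outer_step, outer_mem is the last vector of a sub-sentence; that is, the two-level group as a whole uses the last vector of the previous sub-sentence as the initial memory state for the next sub-sentence.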
                                                          • @@ -278,6 +525,36 @@ var _hmt = _hmt || [];
                                                        +
                                                        def outer_step(x):
                                                        +    outer_mem = memory(name="outer_rnn_state", size=hidden_dim)
                                                        +    def inner_step(y):
                                                        +        inner_mem = memory(name="inner_rnn_state",
                                                        +                           size=hidden_dim,
                                                        +                           boot_layer=outer_mem)
                                                        +        out = fc_layer(input=[y, inner_mem],
                                                        +                        size=hidden_dim,
                                                        +                        act=TanhActivation(),
                                                        +                        bias_attr=True,
                                                        +                        name="inner_rnn_state")
                                                        +        return out
                                                        +
                                                        +    inner_rnn_output = recurrent_group(
                                                        +        step=inner_step,
                                                        +        name="inner",
                                                        +        input=x)
                                                        +    last = last_seq(input=inner_rnn_output, name="outer_rnn_state")
                                                        +
                                                        +    # "return last" won't work, because recurrent_group requires the returned
                                                        +    # sequence type to be the same as the input sequence type.
                                                        +    return inner_rnn_output
                                                        +
                                                        +out = recurrent_group(
                                                        +    name="outer",
                                                        +    step=outer_step,
                                                        +    input=SubsequenceInput(emb))
                                                        +
                                                        +
                                                        +
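Why booting the inner memory from the outer memory makes the two configurations equivalent can be checked with a scalar sketch. The code below is plain Python under stated assumptions: the weights are made-up constants shared by both versions, the initial state is taken to be zero, and all function names are hypothetical.

```python
import math


def rnn_step(y, mem, w_in=0.5, w_mem=0.8, bias=0.1):
    """One recurrent step on scalars; the weights are illustrative constants."""
    return math.tanh(w_in * y + w_mem * mem + bias)


def flat_rnn(seq):
    """Single-level RNN over one flat sequence."""
    mem, outs = 0.0, []
    for y in seq:
        mem = rnn_step(y, mem)
        outs.append(mem)
    return outs


def nested_rnn(nested):
    """Two-level RNN: each sub-sequence boots from the previous one's last state."""
    outer_mem = 0.0  # state carried across sub-sequences
    outs = []
    for sub in nested:
        inner_mem = outer_mem  # plays the role of boot_layer=outer_mem
        sub_outs = []
        for y in sub:
            inner_mem = rnn_step(y, inner_mem)
            sub_outs.append(inner_mem)
        outer_mem = sub_outs[-1]  # plays the role of last_seq on the inner output
        outs.append(sub_outs)
    return outs


nested = [[1.0, -1.0], [0.5, 0.2, -0.3]]
flat = [y for sub in nested for y in sub]
flattened = [o for sub in nested_rnn(nested) for o in sub]
print(flattened == flat_rnn(flat))  # True
```

Because the state is handed from the end of one sub-sequence to the start of the next, the nested version performs exactly the same chain of steps as the flat one.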

                                                        Warning

                                                        PaddlePaddle currently supports only the case where the Memory's sequence length is the same at every time step.

                                                        @@ -300,9 +577,127 @@ var _hmt = _hmt || [];
                                                        • Single-layer RNN:
                                                        +
                                                        def step(x1, x2):
                                                        +    def calrnn(y):
                                                        +        mem = memory(name='rnn_state_' + y.name, size=hidden_dim)
                                                        +        out = fc_layer(
                                                        +            input=[y, mem],
                                                        +            size=hidden_dim,
                                                        +            act=TanhActivation(),
                                                        +            bias_attr=True,
                                                        +            name='rnn_state_' + y.name)
                                                        +        return out
                                                        +
                                                        +    encoder1 = calrnn(x1)
                                                        +    encoder2 = calrnn(x2)
                                                        +    return [encoder1, encoder2]
                                                        +
                                                        +
                                                        +encoder1_rep, encoder2_rep = recurrent_group(
                                                        +    name="stepout", step=step, input=[emb1, emb2])
                                                        +
                                                        +
                                                        +
                                                        • Two-level RNN:
                                                        +
                                                        
                                                        +    def inner_step(ipt):
                                                        +        index[0] += 1
                                                        +        i = index[0]
                                                        +        outer_mem = memory(name="outer_rnn_state_%d" % i, size=hidden_dim)
                                                        +
                                                        +        def inner_step_impl(y):
                                                        +            inner_mem = memory(
                                                        +                name="inner_rnn_state_" + y.name,
                                                        +                size=hidden_dim,
                                                        +                boot_layer=outer_mem)
                                                        +            out = fc_layer(
                                                        +                input=[y, inner_mem],
                                                        +                size=hidden_dim,
                                                        +                act=TanhActivation(),
                                                        +                bias_attr=True,
                                                        +                name='inner_rnn_state_' + y.name)
                                                        +            return out
                                                        +
                                                        +        encoder = recurrent_group(
                                                        +            step=inner_step_impl, name='inner_%d' % i, input=ipt)
                                                        +        last = last_seq(name="outer_rnn_state_%d" % i, input=encoder)
                                                        +        return encoder, last
                                                        +
                                                        +    encoder1, sentence_last_state1 = inner_step(ipt=x1)
                                                        +    encoder2, sentence_last_state2 = inner_step(ipt=x2)
                                                        +
                                                        +    encoder1_expand = expand_layer(
                                                        +        input=sentence_last_state1, expand_as=encoder2)
                                                        +
                                                        +    return [encoder1_expand, encoder2]
                                                        +
                                                        +
                                                        +encoder1_rep, encoder2_rep = recurrent_group(
                                                        +    name="outer",
                                                        +    step=outer_step,
                                                        +    input=[SubsequenceInput(emb1), SubsequenceInput(emb2)],
                                                        +    targetInlink=emb2)
                                                        +
                                                        +encoder1_last = last_seq(input=encoder1_rep)
                                                        +
                                                        +

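In the code above, the single-level and two-level sequences are used much as in Example 2; the difference is that two inputs are processed at the same time. For the two-level sequence, the subsequence lengths of the two inputs also differ. However, the targetInlink parameter sets the output format of the outer recurrent_group, so the sequence shape of the outer output matches the sequence shape of emb2.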
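To make the shape rule concrete, here is a minimal plain-Python sketch of what "the output follows emb2" means for nested sequences. It is only an analogy: `last_of_each` and `expand_as` are hypothetical helpers written for this illustration, nested lists stand in for Paddle's two-level sequences, and nothing here calls the PaddlePaddle API.

```python
def last_of_each(nested):
    """Return the last element of every subsequence (an analogue of last_seq)."""
    return [sub[-1] for sub in nested]


def expand_as(values, target_nested):
    """Repeat values[i] once per element of target_nested[i].

    This mimics the effect of expand_layer(input=..., expand_as=...):
    the result takes its subsequence lengths from the target.
    """
    if len(values) != len(target_nested):
        raise ValueError("one value is needed per target subsequence")
    return [[v] * len(sub) for v, sub in zip(values, target_nested)]


# Two nested inputs whose subsequence lengths differ.
emb1 = [["a1", "a2", "a3"], ["b1"]]
emb2 = [["x1", "x2"], ["y1", "y2", "y3", "y4"]]

# The last state of each emb1 subsequence, expanded to emb2's layout.
expanded = expand_as(last_of_each(emb1), emb2)
shape = [len(sub) for sub in expanded]
```

The resulting `shape` equals the subsequence lengths of `emb2`, which is the property the targetInlink setting guarantees for the outer group's output.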

                                                        diff --git a/develop/doc_cn/howto/rnn/index_cn.html b/develop/doc_cn/howto/rnn/index_cn.html index b124a6a09b27d52a7853524a15cd58993aba59da..e373cd28ec5ea0bf5c5397fdba432762815ddbf7 100644 --- a/develop/doc_cn/howto/rnn/index_cn.html +++ b/develop/doc_cn/howto/rnn/index_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
• Development standards
                                                      • FAQ
                                                          diff --git a/develop/doc_cn/howto/rnn/recurrent_group_cn.html b/develop/doc_cn/howto/rnn/recurrent_group_cn.html index f36a19826cfc0be0b6101fcb674e6a2d35f277c7..369a74a3d627d7f894352515e9115444ab21e6e7 100644 --- a/develop/doc_cn/howto/rnn/recurrent_group_cn.html +++ b/develop/doc_cn/howto/rnn/recurrent_group_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
• Development standards
                                                        • FAQ
                                                            diff --git a/develop/doc_cn/howto/rnn/rnn_config_cn.html b/develop/doc_cn/howto/rnn/rnn_config_cn.html index 3e1893b0823047288be6ca140c3374a66d22b079..d321025d3db2e567e40e39fcafb5ccc625fe63bf 100644 --- a/develop/doc_cn/howto/rnn/rnn_config_cn.html +++ b/develop/doc_cn/howto/rnn/rnn_config_cn.html @@ -144,6 +144,7 @@ var _hmt = _hmt || [];
• Development standards
                                                          • FAQ
                                                              diff --git a/develop/doc_cn/index_cn.html b/develop/doc_cn/index_cn.html index ac742b052915f050331415f0a4a8c3cfec4e60c6..312d46458d7a9aef6a0e9693aaa463792bdca906 100644 --- a/develop/doc_cn/index_cn.html +++ b/develop/doc_cn/index_cn.html @@ -142,6 +142,7 @@ var _hmt = _hmt || [];
• Development standards
                                                            • FAQ
diff --git a/develop/doc_cn/mobile/cross_compiling_for_android_cn.html b/develop/doc_cn/mobile/cross_compiling_for_android_cn.html deleted file mode 100644 index ba4ee95e9e9e6205ea74b81302e86d63c8ec9112..0000000000000000000000000000000000000000 --- a/develop/doc_cn/mobile/cross_compiling_for_android_cn.html +++ /dev/null @@ -1,442 +0,0 @@

Android Cross-Compilation Guide

Users can cross-compile the PaddlePaddle library for Android in either of two ways:

• Building inside a Docker container
• Building in a Linux cross-compilation environment
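
Building inside a Docker container

Docker runs on all major operating systems (including Linux, Mac OS X, and Windows), so with the Docker-based approach users can build the PaddlePaddle library for Android on whichever development platform they already know.

Building the PaddlePaddle Android development image

We package PaddlePaddle's cross-compilation environment into an image, called the development image, which contains all the build tools needed to cross-compile the Android version of the PaddlePaddle library.
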
                                                                $ git clone https://github.com/PaddlePaddle/Paddle.git
                                                                -$ cd Paddle
                                                                -$ docker build -t username/paddle-android:dev . -f Dockerfile.android

Users can also use the official development image provided by PaddlePaddle:

                                                                $ docker pull paddlepaddle/paddle:latest-dev-android

For users in mainland China, we provide a mirror for faster access:

                                                                $ docker pull docker.paddlepaddlehub.com/paddle:latest-dev-android

Building the PaddlePaddle C-API library

Once the development image is built, it can be used to compile the Android version of the PaddlePaddle C-API library. The Android Docker development image exposes two configurable parameters:

Argument      Optional Values           Default
ANDROID_ABI   armeabi-v7a, arm64-v8a    armeabi-v7a
ANDROID_API   >= 16                     21

• Build the PaddlePaddle library for armeabi-v7a with Android API 21:

                                                                $ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" username/paddle-android:dev

• Build the PaddlePaddle library for arm64-v8a with Android API 21:

                                                                $ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=arm64-v8a" -e "ANDROID_API=21" username/paddle-android:dev
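
When the docker run commands above are executed, the container runs the paddle/scripts/docker/build_android.sh script by default. The script records the CMake configuration commonly used to cross-compile the Android version of the PaddlePaddle library and, based on ANDROID_ABI and ANDROID_API, automatically builds a standalone toolchain, compiles, and installs. Because the arm64 architecture requires an Android API level of at least 21, when ANDROID_ABI=arm64-v8a and ANDROID_API<21 the Docker container uses the Android API 21 toolchain by default. Users can refer to the section on configuring cross-compilation parameters below to customize the script the Docker container runs. Once compilation and installation finish, the PaddlePaddle C-API library is installed into $PWD/install_android, and the third-party libraries it depends on are installed into $PWD/install_android/third_party.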
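The API-level rule described above can be sketched in a few lines of Python. This is an illustration only, not the logic of build_android.sh itself: `effective_android_api` and `docker_run_command` are hypothetical helper names, and rejecting unsupported ABIs or API levels below 16 is an assumption drawn from the parameter table rather than confirmed script behavior.

```python
SUPPORTED_ABIS = ("armeabi-v7a", "arm64-v8a")
MIN_API = 16          # lower bound listed in the parameter table
ARM64_MIN_API = 21    # arm64 requires Android API >= 21


def effective_android_api(abi="armeabi-v7a", api=21):
    """Return the API level the toolchain would be built for."""
    if abi not in SUPPORTED_ABIS:
        raise ValueError("unsupported ANDROID_ABI: %s" % abi)
    if api < MIN_API:
        raise ValueError("ANDROID_API must be >= %d" % MIN_API)
    # arm64-v8a with a lower API level falls back to API 21.
    if abi == "arm64-v8a" and api < ARM64_MIN_API:
        return ARM64_MIN_API
    return api


def docker_run_command(image, abi, api, source_dir="$PWD"):
    """Assemble the docker run argument list shown in the examples above."""
    return [
        "docker", "run", "-it", "--rm",
        "-v", "%s:/paddle" % source_dir,
        "-e", "ANDROID_ABI=%s" % abi,
        "-e", "ANDROID_API=%d" % api,
        image,
    ]
```

For example, `effective_android_api("arm64-v8a", 16)` yields 21, while the same API level for armeabi-v7a is left unchanged.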

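
Building in a Linux cross-compilation environment

This document uses the Linux x86-64 platform as an example to describe the method and steps for cross-compiling the PaddlePaddle library for Android.

Preparing the cross-compilation environment

To cross-compile PaddlePaddle from source, users must prepare the cross-compilation environment in advance. The C/C++ cross-compilation toolchain used on Android is the Android NDK; users can download a prebuilt version themselves, or fetch it with the following commands:
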
                                                                wget -q https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip
                                                                -unzip -q android-ndk-r14b-linux-x86_64.zip
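
The Android NDK contains the build tools and system libraries needed for every Android API level and every architecture (arm/arm64/x86/mips). Users can build a standalone toolchain for their target architecture and the minimum Android API level they need to support.

• Build a standalone toolchain for armeabi-v7a and Android API 21:
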
                                                                your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
                                                                -        --arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain
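
This command generates a standalone toolchain in your/path/to/arm_standalone_toolchain that targets the 32-bit ARM architecture, supports a minimum Android API level of 21, and provides the compilers arm-linux-androideabi-gcc (GCC) 4.9 and clang 3.8.

• Build a standalone toolchain for arm64-v8a and Android API 21:
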
                                                                your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
                                                                -        --arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain
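
This command generates a standalone toolchain in your/path/to/arm64_standalone_toolchain that targets the 64-bit ARM64 architecture, supports a minimum Android API level of 21, and provides the compilers aarch64-linux-android-gcc (GCC) 4.9 and clang 3.8.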
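The two toolchain commands differ only in the `--arch` value and the install directory, so the mapping from ABI to command can be sketched as below. The helper name `standalone_toolchain_command` and the `ABI_TO_ARCH` table are hypothetical; only the script path and the three flags come from the commands shown above.

```python
# ABI names used by PaddlePaddle mapped to the --arch values in the commands above.
ABI_TO_ARCH = {"armeabi-v7a": "arm", "arm64-v8a": "arm64"}


def standalone_toolchain_command(ndk_root, abi, api, install_dir):
    """Build the make-standalone-toolchain.sh argument list for one ABI."""
    if abi not in ABI_TO_ARCH:
        raise ValueError("unsupported ABI: %s" % abi)
    script = "%s/build/tools/make-standalone-toolchain.sh" % ndk_root
    return [
        script,
        "--arch=%s" % ABI_TO_ARCH[abi],
        "--platform=android-%d" % api,
        "--install-dir=%s" % install_dir,
    ]
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues when the paths contain spaces.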

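
Configuring cross-compilation parameters

CMake supports cross-compilation through toolchain files (see cmake-toolchains). To simplify the CMake configuration, PaddlePaddle provides a toolchain file for cross-compilation, cmake/cross_compiling/android.cmake, which supplies default compilers and compile-related settings. Note that starting with CMake 3.7, CMake itself provides general support for cross-compiling to Android. When PaddlePaddle detects that the CMake version in use is 3.7 or later, it passes the user-supplied configuration parameters through to CMake and lets CMake handle them. See cmake-toolchains for a detailed description of the parameters.

When cross-compiling the Android version of the PaddlePaddle library, some parameters must be set:

• CMAKE_SYSTEM_NAME: the target platform for the CMake build; must be set to Android. Only after CMAKE_SYSTEM_NAME=Android is set does PaddlePaddle's CMake system treat the build as a cross-compilation for Android and automatically build all third-party libraries PaddlePaddle needs. It also forces the values of several PaddlePaddle options (WITH_GPU=OFF, WITH_AVX=OFF, WITH_PYTHON=OFF, WITH_RDMA=OFF, WITH_MKL=OFF, WITH_GOLANG=OFF).
• WITH_C_API: must be set to ON. On Android, only inference through the C-API is supported.
• WITH_SWIG_PY: must be set to OFF. On Android, training or inference through SWIG is not supported.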

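Optional parameters for Android:

• ANDROID_STANDALONE_TOOLCHAIN: the absolute path of the standalone toolchain, or a path relative to the build directory. From this value PaddlePaddle's CMake system automatically derives and sets the cross-compiler, the sysroot, and the Android API level; if it is not set, users must set these values manually when running cmake. No default.
• ANDROID_TOOLCHAIN: the target toolchain. May be gcc or clang; the default is clang.
    • With CMake 3.7 or later, the clang toolchain is always used; with CMake earlier than 3.7, set ANDROID_TOOLCHAIN=gcc to use the gcc toolchain.
    • The clang compiler officially provided by Android requires a host system with GLIBC 2.15 or later.
• ANDROID_ABI: the target architecture ABI. armeabi-v7a and arm64-v8a are currently supported; the default is armeabi-v7a.
• ANDROID_NATIVE_API_LEVEL: the Android API level of the toolchain. If not set explicitly, PaddlePaddle derives it from the value of ANDROID_STANDALONE_TOOLCHAIN.
• ANDROID_ARM_MODE: whether to use ARM mode.
    • When ANDROID_ABI=armeabi-v7a, may be ON or OFF; the default is ON.
    • When ANDROID_ABI=arm64-v8a, no setting is needed.
• ANDROID_ARM_NEON: whether to use NEON instructions.
    • When ANDROID_ABI=armeabi-v7a, may be ON or OFF; the default is ON.
    • When ANDROID_ABI=arm64-v8a, no setting is needed.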

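Other parameters:

• USE_EIGEN_FOR_BLAS: whether to use the Eigen library for matrix computation. May be ON or OFF; the default is OFF.
• HOST_C/CXX_COMPILER: the host C/C++ compiler, needed when building the host protoc executable and the target OpenBLAS library. Defaults to the value of the CC/CXX environment variable; if CC/CXX is not set, the cc/c++ compiler is used.

Common cmake configurations are as follows:
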
                                                                cmake -DCMAKE_SYSTEM_NAME=Android \
                                                                -      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
                                                                -      -DANDROID_ABI=armeabi-v7a \
                                                                -      -DANDROID_ARM_NEON=ON \
                                                                -      -DANDROID_ARM_MODE=ON \
                                                                -      -DUSE_EIGEN_FOR_BLAS=ON \
                                                                -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                                                                -      -DWITH_C_API=ON \
                                                                -      -DWITH_SWIG_PY=OFF \
                                                                -      ..
                                                                -
                                                                -
                                                                -
                                                                cmake -DCMAKE_SYSTEM_NAME=Android \
                                                                -      -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
                                                                -      -DANDROID_ABI=arm64-v8a \
                                                                -      -DUSE_EIGEN_FOR_BLAS=OFF \
                                                                -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                                                                -      -DWITH_C_API=ON \
                                                                -      -DWITH_SWIG_PY=OFF \
                                                                -      ..
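The two cmake invocations above share the same required settings and differ only in the ABI-specific options. As a rough sketch, assuming a hypothetical helper named `android_cmake_args`, the argument list could be assembled and the required values enforced like this; the option names are the ones listed in this guide, everything else is illustrative.

```python
# Settings this guide describes as mandatory for an Android build.
REQUIRED = {
    "CMAKE_SYSTEM_NAME": "Android",
    "WITH_C_API": "ON",
    "WITH_SWIG_PY": "OFF",
}


def android_cmake_args(toolchain_dir, abi="armeabi-v7a",
                       install_prefix=None, extra=None):
    """Return a cmake argument list for cross-compiling to Android."""
    options = dict(REQUIRED)
    options["ANDROID_STANDALONE_TOOLCHAIN"] = toolchain_dir
    options["ANDROID_ABI"] = abi
    if install_prefix is not None:
        options["CMAKE_INSTALL_PREFIX"] = install_prefix
    for key, value in (extra or {}).items():
        # Refuse overrides that contradict a mandatory setting.
        if key in REQUIRED and value != REQUIRED[key]:
            raise ValueError("%s must be %s" % (key, REQUIRED[key]))
        options[key] = value
    flags = ["-D%s=%s" % (k, v) for k, v in options.items()]
    return ["cmake"] + flags + [".."]
```

Calling it with `extra={"ANDROID_ARM_NEON": "ON", "USE_EIGEN_FOR_BLAS": "ON"}` reproduces the shape of the armeabi-v7a command, while passing `WITH_SWIG_PY=ON` raises an error instead of producing an unsupported configuration.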
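
Users can also set other build parameters as needed:

• Set CMAKE_BUILD_TYPE to MinSizeRel to minimize the size of the generated library.
• Set CMAKE_BUILD_TYPE to Release for the fastest execution speed.
• Users can also influence PaddlePaddle's build by setting CMAKE_C/CXX_FLAGS manually.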

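Performance tips. For the fastest computation, the following CMake settings are recommended:

• Set CMAKE_BUILD_TYPE to Release.
• Use the clang toolchain.
• For armeabi-v7a, set USE_EIGEN_FOR_BLAS=ON to use Eigen for matrix computation; for arm64-v8a, set USE_EIGEN_FOR_BLAS=OFF to use OpenBLAS for matrix computation.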
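These recommendations amount to a small lookup keyed on the ABI. A minimal sketch, with `recommended_perf_options` as a hypothetical helper name and the option names taken from this guide:

```python
def recommended_perf_options(abi):
    """Return the CMake options this guide recommends for speed on the given ABI."""
    if abi == "armeabi-v7a":
        use_eigen = "ON"    # Eigen for matrix computation
    elif abi == "arm64-v8a":
        use_eigen = "OFF"   # OpenBLAS for matrix computation
    else:
        raise ValueError("unsupported ABI: %s" % abi)
    return {
        "CMAKE_BUILD_TYPE": "Release",
        "ANDROID_TOOLCHAIN": "clang",
        "USE_EIGEN_FOR_BLAS": use_eigen,
    }
```

The returned dictionary can be turned into `-DKEY=VALUE` flags on the cmake command line.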
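
Building and installing

Once CMake configuration is complete, run the following commands. PaddlePaddle will automatically download and build all third-party dependencies, then build and install the PaddlePaddle inference library.
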
                                                                make
                                                                -make install
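
Note: if you have previously built the PaddlePaddle library for another platform in the same source tree, first remove the third_party and build directories with rm -rf, so that all third-party dependencies and the PaddlePaddle code are rebuilt for the new CMake configuration.

After the install command finishes, your/path/to/install contains the include, lib, and third_party directories: include holds the C-API header files, lib holds the PaddlePaddle libraries for the different Android ABIs, and third_party holds all the third-party libraries PaddlePaddle depends on. PaddlePaddle is now installed, and the files under your/path/to/install can be used in deep-learning Android apps; see the C-API documentation for how to call them.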
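A quick way to confirm that the install step produced the layout described above is to check for the three directories. The sketch below uses only the Python standard library; `missing_install_dirs` is a hypothetical helper name, and it checks directory presence only, not the contents.

```python
from pathlib import Path

# Directories the install step is expected to create, per the text above.
EXPECTED_DIRS = ("include", "lib", "third_party")


def missing_install_dirs(install_prefix):
    """Return the expected directories that are absent under install_prefix."""
    root = Path(install_prefix)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means all three directories exist; any names returned point to a build that did not finish installing.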


© Copyright 2016, PaddlePaddle developers.

Built with Sphinx using a theme provided by Read the Docs.

\ No newline at end of file
diff --git a/develop/doc_cn/mobile/cross_compiling_for_ios_cn.html b/develop/doc_cn/mobile/cross_compiling_for_ios_cn.html deleted file mode 100644 index b3b1f32ae0ba5e720be14f5b1bb4c4b8f4e4fda4..0000000000000000000000000000000000000000 --- a/develop/doc_cn/mobile/cross_compiling_for_ios_cn.html +++ /dev/null @@ -1,377 +0,0 @@

iOS Cross-Compilation Guide

The PaddlePaddle library for iOS must be cross-compiled on macOS. This document describes how to cross-compile the PaddlePaddle library for iOS from source on macOS.

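
Preparing the cross-compilation environment

Apple provides a complete cross-compilation toolchain and integrated development environment for iOS development; users only need to download and install Xcode from the App Store, or from the official Xcode website. After installation, run xcodebuild -version on the command line to check that the installation succeeded.
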
                                                                $ xcodebuild -version
                                                                -Xcode 9.0
                                                                -Build version 9A235
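If the check needs to be automated, the two lines printed by xcodebuild -version can be parsed with a short Python function. This is a sketch that assumes the output format shown above; `parse_xcodebuild_version` is a hypothetical helper, and it works on captured text rather than invoking xcodebuild itself.

```python
import re


def parse_xcodebuild_version(output):
    """Extract (xcode_version, build_version) from `xcodebuild -version` output.

    Returns None when the text does not look like the expected output,
    for example when Xcode is not installed.
    """
    version = re.search(r"^Xcode\s+(\S+)", output, re.MULTILINE)
    build = re.search(r"^Build version\s+(\S+)", output, re.MULTILINE)
    if version is None or build is None:
        return None
    return version.group(1), build.group(1)
```

The text can be captured with `subprocess.run(["xcodebuild", "-version"], capture_output=True, text=True).stdout` on a machine where Xcode is installed.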
                                                                -
                                                                -
                                                                -
                                                                -
                                                                -

                                                                Configuring Cross-Compilation Parameters

                                                                -

                                                                For cross-compilation, PaddlePaddle provides the toolchain configuration file cmake/cross_compiling/ios.cmake, which supplies default compiler and compile-option settings.

                                                                -

                                                                When cross-compiling the iOS version of the PaddlePaddle library, some parameters must be configured:

                                                                -
                                                                  -
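                                                                • CMAKE_SYSTEM_NAME: the target platform for the CMake build; must be set to iOS. Once CMAKE_SYSTEM_NAME=iOS is set, PaddlePaddle's CMake system automatically builds all third-party dependencies and forces the values of several PaddlePaddle parameters (WITH_C_API=ON, WITH_GPU=OFF, WITH_AVX=OFF, WITH_PYTHON=OFF, WITH_RDMA=OFF).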
                                                                • -
                                                                • WITH_C_API: whether to build the C-API inference library; must be set to ON. On iOS, inference is supported only through the C-API.
                                                                • -
                                                                • WITH_SWIG_PY: must be set to OFF. Training or inference through SWIG calls is not supported on iOS.
                                                                • -
                                                                -

                                                                Optional configuration parameters for the iOS platform:

                                                                -
                                                                  -
                                                                • IOS_PLATFORM: can be set to OS (the default) or SIMULATOR.

                                                                  -
                                                                    -
                                                                  • OS: the build targets physical ARM devices such as the iPhone or iPad.
                                                                  • -
                                                                  • SIMULATOR: the build targets the x86 simulator platform.
                                                                  • -
                                                                  -
                                                                • -
                                                                • IOS_ARCH: the target architecture. The architectures available for each IOS_PLATFORM are listed in the table below; by default, all of them are built:

                                                                  - - - - - - - - - - - - - - - - - - - - - -
                                                                  IOS_PLATFORM    IOS_ARCH
                                                                  OS              armv7, armv7s, arm64
                                                                  SIMULATOR       i386, x86_64
                                                                • -
                                                                • IOS_DEPLOYMENT_TARGET: the minimum iOS deployment version; the default is 7.0.

                                                                  -
                                                                • -
                                                                • IOS_ENABLE_BITCODE: whether to enable Bitcode; can be ON or OFF, and the default is ON.

                                                                  -
                                                                • -
                                                                • IOS_USE_VECLIB_FOR_BLAS: whether to use the vecLib framework for BLAS matrix computation; can be ON or OFF, and the default is OFF.

                                                                  -
                                                                • -
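                                                                • IOS_DEVELOPMENT_ROOT: the Developer directory; can be set explicitly to /path/to/platform/Developer. If it is not set explicitly, PaddlePaddle automatically selects the Developer directory of the Xcode platform that corresponds to IOS_PLATFORM.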

                                                                  -
                                                                • -
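                                                                • IOS_SDK_ROOT: the root directory of the SDK to use; can be set explicitly to /path/to/platform/Developer/SDKs/SDK. If it is not set explicitly, PaddlePaddle automatically selects the latest SDK version under IOS_DEVELOPMENT_ROOT.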

                                                                  -
                                                                • -
                                                                -
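The IOS_PLATFORM / IOS_ARCH table can be checked before invoking cmake. The following is a minimal sketch; `valid_ios_arch` and `check_ios_arch_list` are hypothetical helper names introduced here for illustration, not part of PaddlePaddle's build system.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the architecture is listed for the
# platform in the table above, 1 otherwise.
valid_ios_arch() {
    case "$1:$2" in
        OS:armv7|OS:armv7s|OS:arm64) return 0 ;;
        SIMULATOR:i386|SIMULATOR:x86_64) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: check every entry of a semicolon-separated
# IOS_ARCH value such as "armv7;arm64".
check_ios_arch_list() {
    list_platform=$1
    for list_arch in $(printf '%s' "$2" | tr ';' ' '); do
        if ! valid_ios_arch "$list_platform" "$list_arch"; then
            echo "invalid IOS_ARCH for $list_platform: $list_arch" >&2
            return 1
        fi
    done
}

check_ios_arch_list OS "armv7;arm64" && echo "OS arch list is valid"
```

Validating the pair up front gives a clearer error than a failed configure step would.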

                                                                Other configuration parameters:

                                                                -
                                                                  -
                                                                • USE_EIGEN_FOR_BLAS: whether to use the Eigen library for matrix computation; takes effect only when IOS_USE_VECLIB_FOR_BLAS=OFF. Can be ON or OFF, and the default is OFF.
                                                                • -
                                                                • HOST_C/CXX_COMPILER: the C/C++ compiler of the host machine. Defaults to the value of the CC/CXX environment variable; if CC/CXX is not set, the cc/c++ compiler is used.
                                                                • -
                                                                -
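The HOST_C/CXX_COMPILER fallback described above mirrors a common shell idiom. The following is a minimal sketch of that fallback only, not PaddlePaddle's actual CMake logic; `resolve_host_compilers` is a hypothetical name, and the sketch treats an empty CC/CXX the same as an unset one, which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the host compilers that would be chosen,
# taking $CC / $CXX when set and falling back to cc / c++ otherwise.
resolve_host_compilers() {
    echo "HOST_C_COMPILER=${CC:-cc}"
    echo "HOST_CXX_COMPILER=${CXX:-c++}"
}

resolve_host_compilers
```

Running it before cmake shows which host compilers the defaults would resolve to in the current environment.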

                                                                Commonly used cmake configurations are as follows:

                                                                -
                                                                cmake -DCMAKE_SYSTEM_NAME=iOS \
                                                                -      -DIOS_PLATFORM=OS \
                                                                -      -DIOS_ARCH="armv7;arm64" \
                                                                -      -DIOS_ENABLE_BITCODE=ON \
                                                                -      -DIOS_USE_VECLIB_FOR_BLAS=ON \
                                                                -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                                                                -      -DWITH_C_API=ON \
                                                                -      -DWITH_TESTING=OFF \
                                                                -      -DWITH_SWIG_PY=OFF \
                                                                -      ..
                                                                -
                                                                -
                                                                -
                                                                cmake -DCMAKE_SYSTEM_NAME=iOS \
                                                                -      -DIOS_PLATFORM=SIMULATOR \
                                                                -      -DIOS_ARCH="x86_64" \
                                                                -      -DIOS_USE_VECLIB_FOR_BLAS=ON \
                                                                -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                                                                -      -DWITH_C_API=ON \
                                                                -      -DWITH_TESTING=OFF \
                                                                -      -DWITH_SWIG_PY=OFF \
                                                                -      ..
                                                                -
                                                                -
                                                                -

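                                                                Users can also set other build parameters to suit their needs. For example, to minimize the size of the generated library, set CMAKE_BUILD_TYPE to MinSizeRel; for the fastest execution speed, set CMAKE_BUILD_TYPE to Release. CMAKE_C/CXX_FLAGS can also be set manually to influence how PaddlePaddle is compiled.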

                                                                -

                                                                Performance tips: to achieve the fastest computation speed, the following CMake settings are recommended:

                                                                -
                                                                  -
                                                                • Set CMAKE_BUILD_TYPE to Release.
                                                                • -
                                                                • Set IOS_USE_VECLIB_FOR_BLAS=ON to use the BLAS functions provided by the vecLib framework for matrix computation.
                                                                • -
                                                                -
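The two configurations and the performance tips above can be combined into one command generator. The following is a minimal sketch that only prints a cmake command line and runs nothing; `ios_cmake_command` is a hypothetical helper, and the architecture chosen per platform simply copies the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print a cmake command line for the given
# IOS_PLATFORM (OS or SIMULATOR) with the performance-oriented settings.
# Nothing is executed; the output is meant to be inspected or copied.
ios_cmake_command() {
    case "$1" in
        OS)        cmd_arch="armv7;arm64" ;;
        SIMULATOR) cmd_arch="x86_64" ;;
        *) echo "unknown IOS_PLATFORM: $1" >&2; return 1 ;;
    esac
    printf 'cmake -DCMAKE_SYSTEM_NAME=iOS -DIOS_PLATFORM=%s -DIOS_ARCH="%s" ' "$1" "$cmd_arch"
    printf '%s ' '-DCMAKE_BUILD_TYPE=Release' '-DIOS_USE_VECLIB_FOR_BLAS=ON'
    printf '%s ' '-DWITH_C_API=ON' '-DWITH_TESTING=OFF' '-DWITH_SWIG_PY=OFF'
    printf '%s\n' '..'
}

ios_cmake_command OS
```

Add -DCMAKE_INSTALL_PREFIX to the printed command as in the examples above; it is left out here because the install path is site-specific.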
                                                                -
                                                                -

                                                                Build and Install

                                                                -

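                                                                After CMake configuration is complete, run the following commands; PaddlePaddle will automatically download and build all third-party dependencies, then build and install the PaddlePaddle inference library.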

                                                                -
                                                                $ make
                                                                -$ make install
                                                                -
                                                                -
                                                                -

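                                                                Note: if you have previously built the PaddlePaddle library for another platform in the source directory, first delete the third_party and build directories with rm -rf, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.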
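The cleanup can be wrapped in a small guard so that rm -rf never runs against an empty or mistyped path. The following is a minimal sketch; `clean_previous_build` is a hypothetical helper name, and the demo runs in a temporary directory rather than a real source tree.

```shell
#!/bin/sh
# Hypothetical helper: remove stale third_party and build directories
# under the given source directory.
clean_previous_build() {
    clean_src=$1
    # Refuse an empty or missing directory so the rm below can never
    # expand to "/third_party" or "/build".
    if [ -z "$clean_src" ] || [ ! -d "$clean_src" ]; then
        echo "not a directory: $clean_src" >&2
        return 1
    fi
    rm -rf "$clean_src/third_party" "$clean_src/build"
}

# Demo in a throwaway directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/third_party" "$demo_dir/build"
clean_previous_build "$demo_dir" && echo "cleaned stale build outputs"
rmdir "$demo_dir"
```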

                                                                -

                                                                After the install command finishes, the your/path/to/install directory contains the following:

                                                                -
                                                                  -
                                                                • The include directory, which contains all C-API header files.
                                                                • -
                                                                • The lib directory, which contains the PaddlePaddle C-API static library.
                                                                • -
                                                                • The third_party directory, which contains all third-party libraries that PaddlePaddle depends on.
                                                                • -
                                                                -
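The expected install layout can be verified with a short script. The following is a minimal sketch, assuming only the three directory names listed above; `check_install_layout` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected subdirectories are
# missing from the install prefix; returns 1 if any is missing.
check_install_layout() {
    layout_missing=0
    for layout_sub in include lib third_party; do
        if [ ! -d "$1/$layout_sub" ]; then
            echo "missing: $1/$layout_sub" >&2
            layout_missing=1
        fi
    done
    return "$layout_missing"
}
```

A typical call is `check_install_layout your/path/to/install`, run after make install.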

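                                                                Note: if the PaddlePaddle library must support both physical devices and the simulator, build the device and simulator versions separately and then merge them into a fat library with the lipo tool.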
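The merge step can be sketched as follows. The source does not give library file names or build directories, so libpaddle_capi.a, build_os, and build_sim below are placeholders, and `merge_fat_library` is a hypothetical wrapper. With DRY_RUN=1 it only prints the lipo command, which is useful on machines without Xcode.

```shell
#!/bin/sh
# Hypothetical wrapper: merge a device library and a simulator library
# into one fat library. With DRY_RUN=1 the command is printed instead
# of executed.
merge_fat_library() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "lipo -create $1 $2 -output $3"
    else
        lipo -create "$1" "$2" -output "$3"
    fi
}

# Placeholder paths, shown as a dry run.
DRY_RUN=1
merge_fat_library build_os/lib/libpaddle_capi.a \
                  build_sim/lib/libpaddle_capi.a \
                  libpaddle_capi_fat.a
```

The architectures contained in the merged file can then be listed with `lipo -info`.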

                                                                -

                                                                At this point, the PaddlePaddle library is installed. The merged fat library can be used in deep-learning iOS apps; see the C-API documentation for how to call it.

                                                                -
                                                                -
                                                                - - -
                                                                -
                                                                -
                                                                - - -
                                                                - -
                                                                -

                                                                - © Copyright 2016, PaddlePaddle developers. - -

                                                                -
                                                                - Built with Sphinx using a theme provided by Read the Docs. - -
                                                                - -
                                                                -
                                                                - -
                                                                - -
                                                                - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/mobile/cross_compiling_for_raspberry_cn.html b/develop/doc_cn/mobile/cross_compiling_for_raspberry_cn.html deleted file mode 100644 index 140699b551967e9a33f8089b7513ca3f75488989..0000000000000000000000000000000000000000 --- a/develop/doc_cn/mobile/cross_compiling_for_raspberry_cn.html +++ /dev/null @@ -1,311 +0,0 @@ - - - - - - - - - - - - - Raspberry Pi平台编译指南 — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                                                - - - - -
                                                                - - - - - - -
                                                                -
                                                                - - - - - - -
                                                                - -
                                                                -
                                                                -
                                                                -
                                                                - -
                                                                -

                                                                Build Guide for the Raspberry Pi Platform

                                                                -

                                                                There are generally two ways to build for the Raspberry Pi:

                                                                -
                                                                  -
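                                                                1. Log in to the Raspberry Pi system, for example over ssh, and build there. For the required development tools and third-party libraries, refer to /Dockerfile.
                                                                2. Cross-compile. This document describes the method and steps for cross-compiling PaddlePaddle for the Raspberry Pi on Linux/x64.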
                                                                -
                                                                -

                                                                Installing the Cross-Compiler

                                                                -

                                                                Clone the following GitHub repo:

                                                                -
                                                                git clone https://github.com/raspberrypi/tools.git
                                                                -
                                                                -
                                                                -

                                                                The cross-compiler arm-linux-gnueabihf-gcc 4.8.3 can then be found in the ./tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 directory. Running this toolchain requires a Linux x64 machine with glibc 2.14 or later.
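The glibc requirement can be checked on the host before building. The following is a minimal sketch, assuming the host provides `getconf GNU_LIBC_VERSION` (available on glibc systems, where it prints a line such as "glibc 2.31"); `version_ge` is a hypothetical helper that compares MAJOR.MINOR versions only.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if version $1 is at least version $2.
# Both arguments must have the form MAJOR.MINOR (extra fields ignored).
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# getconf fails on systems that do not use glibc, hence the else branch.
if host_glibc=$(getconf GNU_LIBC_VERSION 2>/dev/null); then
    host_glibc=${host_glibc#glibc }
    if version_ge "$host_glibc" 2.14; then
        echo "glibc $host_glibc is new enough for the toolchain"
    else
        echo "glibc $host_glibc is older than 2.14" >&2
    fi
else
    echo "could not determine the glibc version" >&2
fi
```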

                                                                -
                                                                -
                                                                -

                                                                Configuring Cross-Compilation Parameters

                                                                -

                                                                CMake supports cross-compilation. The configuration for PaddlePaddle for Raspberry Pi is in cmake/cross_compiling/raspberry_pi.cmake.

                                                                -

                                                                When cross-compiling the Raspberry Pi version of the PaddlePaddle library, some parameters must be configured:

                                                                -
                                                                  -
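                                                                • CMAKE_SYSTEM_NAME: the target platform for the CMake build; must be set to RPi. Only after CMAKE_SYSTEM_NAME=RPi is set does PaddlePaddle's CMake system treat the build as a cross-compilation for the Raspberry Pi and automatically build the host version of the protoc executable, the target version of the protobuf library, and the target version of the OpenBLAS library.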
                                                                • -
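                                                                • RPI_TOOLCHAIN: the absolute path of the toolchain, or its path relative to the build directory. PaddlePaddle's CMake system uses this value to set the cross-compilers automatically; otherwise, the user must set them manually when running cmake. There is no default value.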
                                                                • -
                                                                • RPI_ARM_NEON: whether to use NEON instructions. Currently this must be set to ON; the default is ON.
                                                                • -
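                                                                • HOST_C/CXX_COMPILER: the C/C++ compiler of the host machine, needed to build the host version of the protoc executable and the target version of the OpenBLAS library. Defaults to the value of the CC environment variable; if CC is not set, the cc compiler is used.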
                                                                • -
                                                                -
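Because RPI_TOOLCHAIN has no default, it helps to confirm that the path really contains the cross-compiler before running cmake. The following is a minimal sketch; it assumes the compiler sits in a bin/ subdirectory of the toolchain directory, which the source does not state, and `find_rpi_gcc` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the cross-compiler path inside the given
# toolchain directory, or fail if it is missing or not executable.
# ASSUMPTION: the compiler lives in <toolchain>/bin/.
find_rpi_gcc() {
    rpi_gcc="$1/bin/arm-linux-gnueabihf-gcc"
    if [ -x "$rpi_gcc" ]; then
        echo "$rpi_gcc"
    else
        echo "cross-compiler not found: $rpi_gcc" >&2
        return 1
    fi
}
```

A typical call passes the same value that will be given to -DRPI_TOOLCHAIN.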

                                                                A commonly used CMake configuration is as follows:

                                                                -
                                                                cmake -DCMAKE_SYSTEM_NAME=RPi \
                                                                -      -DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \
                                                                -      -DRPI_ARM_NEON=ON \
                                                                -      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
                                                                -      -DWITH_GPU=OFF \
                                                                -      -DWITH_C_API=ON \
                                                                -      -DWITH_PYTHON=OFF \
                                                                -      -DWITH_SWIG_PY=OFF \
                                                                -      ..
                                                                -
                                                                -
                                                                -

                                                                Here WITH_C_API=ON indicates that the inference library should be built.

                                                                -

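                                                                Users can also set other build parameters to suit their needs. For example, to minimize the size of the generated library, set CMAKE_BUILD_TYPE to MinSizeRel; for the fastest execution speed, set CMAKE_BUILD_TYPE to Release.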

                                                                -
                                                                -
                                                                -

                                                                Build and Install

                                                                -

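                                                                After CMake configuration is complete, run the following commands; PaddlePaddle will automatically download and build all third-party dependencies, then build and install PaddlePaddle.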

                                                                -
                                                                make
                                                                -make install
                                                                -
                                                                -
                                                                -

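                                                                Note: if you have previously built the PaddlePaddle library for another platform in the source directory, first delete the third_party and build directories with rm -rf, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.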

                                                                -

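                                                                After the install command finishes, the your/path/to/install directory contains the include and lib directories: include holds the C-API header files, and lib holds a Raspberry Pi build of the library.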

                                                                -
                                                                -
                                                                - - -
                                                                -
                                                                -
                                                                - - -
                                                                - -
                                                                -

                                                                - © Copyright 2016, PaddlePaddle developers. - -

                                                                -
                                                                - Built with Sphinx using a theme provided by Read the Docs. - -
                                                                - -
                                                                -
                                                                - -
                                                                - -
                                                                - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/develop/doc_cn/objects.inv b/develop/doc_cn/objects.inv index 38801ebd29679cbfe21f2216798072176425a20d..5e8feb4e2833a4216fb1cc0a7c21cd313aeec694 100644 Binary files a/develop/doc_cn/objects.inv and b/develop/doc_cn/objects.inv differ diff --git a/develop/doc_cn/search.html b/develop/doc_cn/search.html index f14eb5e530bcb5f5af6e474c402cd300acf78ab2..04c7dcfb445aebaf80d7a50b7fc481fcd68aebf5 100644 --- a/develop/doc_cn/search.html +++ b/develop/doc_cn/search.html @@ -141,6 +141,7 @@ var _hmt = _hmt || [];
                                                              • Development Standards
                                                              • FAQ
                                                                  diff --git a/develop/doc_cn/searchindex.js b/develop/doc_cn/searchindex.js index db1564bd9bbdd6b63512c87959c5fb900017e5bf..0ef8401c804211dc27e0dad8079a50989fbd4a2e 100644 --- a/develop/doc_cn/searchindex.js +++ b/develop/doc_cn/searchindex.js @@ -1 +1 @@ -Search.setIndex({docnames:["build_and_install/build_from_source_cn","build_and_install/docker_install_cn","build_and_install/index_cn","build_and_install/pip_install_cn","design/api","design/auto_gradient_check","design/backward","design/block","design/build_system/README","design/cluster_train/README","design/cluster_train/checkpointing","design/cluster_train/data_dispatch","design/cluster_train/large_model_dist_train","design/cluster_train/master_server","design/cluster_train/pserver_client","design/cluster_train/remote_parameter_updater","design/cluster_train/save_model","design/cluster_train/submit-job","design/concurrent_programming","design/cpp_data_feeding","design/csp","design/dist_refactor/distributed_architecture","design/dist_refactor/multi_cpu","design/dist_refactor/parameter_server","design/error_clip","design/evaluator","design/executor","design/file_manager/README","design/file_manager/pfs/pfsclient","design/float16","design/fluid","design/fluid_compiler","design/functions_operators_layers","design/gan_api","design/graph","design/graph_survey","design/if_else_op","design/infer_var_type","design/kernel_hint_design","design/kernel_selection","design/memory_optimization","design/mkl/mkl_packed","design/mkl/mkldnn","design/mkl/mkldnn_fluid","design/model_format","design/multi_language_interface/00.why_plain_c","design/multi_language_interface/01.inference_implementation","design/operator_kernel_type","design/ops/rnn","design/ops/sequence_decoder","design/optimizer","design/paddle_nccl","design/parallel_do","design/parameter_average","design/parameters_in_cpp","design/profiler","design/program","design/prune","design/python_api","design/rea
der/README","design/refactorization","design/register_grad_op","design/regularization","design/releasing_process","design/scope","design/selected_rows","design/simple_op_design","design/speech/deep_speech_2","design/support_new_device","design/switch","design/tensor_array","design/var_desc","dev/contribute_to_paddle_cn","dev/index_cn","dev/new_layer_cn","dev/new_op_cn","dev/use_eigen_cn","dev/write_docs_cn","faq/build_and_install/index_cn","faq/cluster/index_cn","faq/index_cn","faq/local/index_cn","faq/model/index_cn","faq/parameter/index_cn","getstarted/concepts/use_concepts_cn","getstarted/index_cn","getstarted/quickstart_cn","howto/capi/compile_paddle_lib_cn","howto/capi/index_cn","howto/capi/organization_of_the_inputs_cn","howto/capi/workflow_of_capi_cn","howto/cluster/cmd_argument_cn","howto/cluster/index_cn","howto/cluster/multi_cluster/fabric_cn","howto/cluster/multi_cluster/index_cn","howto/cluster/multi_cluster/k8s_aws_cn","howto/cluster/multi_cluster/k8s_cn","howto/cluster/multi_cluster/k8s_distributed_cn","howto/cluster/multi_cluster/openmpi_cn","howto/cluster/multi_cluster/src/k8s_data/README","howto/cluster/multi_cluster/src/k8s_train/README","howto/cluster/preparations_cn","howto/cmd_parameter/arguments_cn","howto/cmd_parameter/detail_introduction_cn","howto/cmd_parameter/index_cn","howto/cmd_parameter/use_case_cn","howto/index_cn","howto/optimization/cpu_profiling_cn","howto/optimization/gpu_profiling_cn","howto/read_source","howto/rnn/hierarchical_layer_cn","howto/rnn/hrnn_rnn_api_compare_cn","howto/rnn/index_cn","howto/rnn/recurrent_group_cn","howto/rnn/rnn_config_cn","index_cn","mobile/cross_compiling_for_android_cn","mobile/cross_compiling_for_ios_cn","mobile/cross_compiling_for_raspberry_cn","survey/cluster_bootstrapping_tools"],envversion:50,filenames:["build_and_install/build_from_source_cn.rst","build_and_install/docker_install_cn.rst","build_and_install/index_cn.rst","build_and_install/pip_install_cn.rst","design/api.md","design/auto_gradient
_check.md","design/backward.md","design/block.md","design/build_system/README.md","design/cluster_train/README.md","design/cluster_train/checkpointing.md","design/cluster_train/data_dispatch.md","design/cluster_train/large_model_dist_train.md","design/cluster_train/master_server.md","design/cluster_train/pserver_client.md","design/cluster_train/remote_parameter_updater.md","design/cluster_train/save_model.md","design/cluster_train/submit-job.md","design/concurrent_programming.md","design/cpp_data_feeding.md","design/csp.md","design/dist_refactor/distributed_architecture.md","design/dist_refactor/multi_cpu.md","design/dist_refactor/parameter_server.md","design/error_clip.md","design/evaluator.md","design/executor.md","design/file_manager/README.md","design/file_manager/pfs/pfsclient.md","design/float16.md","design/fluid.md","design/fluid_compiler.md","design/functions_operators_layers.md","design/gan_api.md","design/graph.md","design/graph_survey.md","design/if_else_op.md","design/infer_var_type.md","design/kernel_hint_design.md","design/kernel_selection.md","design/memory_optimization.md","design/mkl/mkl_packed.md","design/mkl/mkldnn.md","design/mkl/mkldnn_fluid.md","design/model_format.md","design/multi_language_interface/00.why_plain_c.md","design/multi_language_interface/01.inference_implementation.md","design/operator_kernel_type.md","design/ops/rnn.md","design/ops/sequence_decoder.md","design/optimizer.md","design/paddle_nccl.md","design/parallel_do.md","design/parameter_average.md","design/parameters_in_cpp.md","design/profiler.md","design/program.md","design/prune.md","design/python_api.md","design/reader/README.md","design/refactorization.md","design/register_grad_op.md","design/regularization.md","design/releasing_process.md","design/scope.md","design/selected_rows.md","design/simple_op_design.md","design/speech/deep_speech_2.md","design/support_new_device.md","design/switch.md","design/tensor_array.md","design/var_desc.md","dev/contribute_to_paddle_cn.md",
"dev/index_cn.rst","dev/new_layer_cn.rst","dev/new_op_cn.md","dev/use_eigen_cn.md","dev/write_docs_cn.rst","faq/build_and_install/index_cn.rst","faq/cluster/index_cn.rst","faq/index_cn.rst","faq/local/index_cn.rst","faq/model/index_cn.rst","faq/parameter/index_cn.rst","getstarted/concepts/use_concepts_cn.rst","getstarted/index_cn.rst","getstarted/quickstart_cn.rst","howto/capi/compile_paddle_lib_cn.md","howto/capi/index_cn.rst","howto/capi/organization_of_the_inputs_cn.md","howto/capi/workflow_of_capi_cn.md","howto/cluster/cmd_argument_cn.md","howto/cluster/index_cn.rst","howto/cluster/multi_cluster/fabric_cn.md","howto/cluster/multi_cluster/index_cn.rst","howto/cluster/multi_cluster/k8s_aws_cn.md","howto/cluster/multi_cluster/k8s_cn.md","howto/cluster/multi_cluster/k8s_distributed_cn.md","howto/cluster/multi_cluster/openmpi_cn.md","howto/cluster/multi_cluster/src/k8s_data/README.md","howto/cluster/multi_cluster/src/k8s_train/README.md","howto/cluster/preparations_cn.md","howto/cmd_parameter/arguments_cn.md","howto/cmd_parameter/detail_introduction_cn.md","howto/cmd_parameter/index_cn.rst","howto/cmd_parameter/use_case_cn.md","howto/index_cn.rst","howto/optimization/cpu_profiling_cn.md","howto/optimization/gpu_profiling_cn.rst","howto/read_source.md","howto/rnn/hierarchical_layer_cn.rst","howto/rnn/hrnn_rnn_api_compare_cn.rst","howto/rnn/index_cn.rst","howto/rnn/recurrent_group_cn.md","howto/rnn/rnn_config_cn.rst","index_cn.rst","mobile/cross_compiling_for_android_cn.md","mobile/cross_compiling_for_ios_cn.md","mobile/cross_compiling_for_raspberry_cn.md","survey/cluster_bootstrapping_tools.md"],objects:{},objnames:{},objtypes:{},terms:{"00m":108,"01org":78,"03m":108,"0424m":108,"04\u4ee5\u4e0a":3,"04\u4ee5\u53camaco":86,"055ee37d":95,"0630u":108,"06u":108,"0810u":108,"0957m":108,"0\u53f7\u8bad\u7ec3\u8282\u70b9\u662f\u4e3b\u8bad\u7ec3\u8282\u70b9":103,"0\u5c42\u5e8f\u5217":110,"0_cudnn5":0,"0_cudnn5_avx_mkl":[1,3],"0_cudnn7_avx_mkl":3,"0rc1":63,"0rc2":63,"0x10f256d50
":35,"0x7ffe4de00110":35,"100gi":95,"100m":81,"10g":17,"1150u":108,"11\u5b9e\u73b0\u4e86c":46,"11e6":96,"124n":108,"12\u4ee5\u4e0a":3,"12\u64cd\u4f5c\u7cfb\u7edf":78,"12gb":40,"13m":96,"1490u":108,"14\u7248\u672c\u4ee5\u4e0a\u7684":118,"14\u8fd9\u79cd\u5199\u6cd5\u5c06\u4f1a\u6d4b\u8bd5\u6a21\u578b":105,"1550u":108,"15\u884c":111,"16\u5b57\u8282\u8868\u793a\u4fdd\u5b58\u7684\u53c2\u6570\u603b\u4e2a\u6570":83,"16u":108,"173n":108,"1770u":108,"18ad":95,"18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750":96,"197u":108,"1\u4e4b\u540e\u7684\u4efb\u4f55\u4e00\u4e2a\u7248\u672c\u6765\u7f16\u8bd1\u8fd0\u884c":0,"1\u7684\u5c42\u4e4b\u5916":105,"1\u7a00\u758f\u6570\u636e":74,"1\u8f6e\u5b58\u50a8\u7684\u6240\u6709\u6a21\u578b":105,"210u":108,"215n":108,"228u":108,"2520u":108,"2680u":108,"26\u884c":111,"279n":108,"27m":108,"285m":108,"2863m":108,"28m":108,"2977m":108,"2\u4e09\u7c7b\u7684\u6bd4\u4f8b\u4e3a":83,"2\u4e2a\u5b50\u5e8f\u5217":89,"2\u5206\u522b\u4ee3\u88683\u4e2a\u8282\u70b9\u7684trainer":97,"2\u610f\u5473\u77400\u53f7\u548c1\u53f7gpu\u5c06\u4f1a\u4f7f\u7528\u6570\u636e\u5e76\u884c\u6765\u8ba1\u7b97fc1\u548cfc2\u5c42":105,"2\u8fd9\u51e0\u4e2a\u76ee\u5f55\u8868\u793apaddlepaddle\u8282\u70b9\u4e0etrain":97,"2cbf7385":95,"302n":108,"30u":108,"328n":108,"32u":108,"331n":108,"3320u":108,"365e":95,"36u":108,"3710m":108,"3768m":108,"387u":108,"38u":108,"3920u":108,"39u":108,"3\u4ee5\u4e0a\u7684\u7b26\u53f7":3,"3\u53f7gpu":81,"4035m":108,"4090u":108,"4096mb":103,"4279m":108,"43u":108,"448a5b355b84":96,"4560u":108,"4563m":108,"45u":108,"4650u":108,"4726m":108,"473m":96,"4\u4e2a\u5e8f\u5217\u7684\u957f\u5ea6\u5206\u522b\u4e3a":89,"4\u5b57\u8282\u8868\u793apaddlepaddle\u7248\u672c\u4fe1\u606f":83,"4gb":103,"500m":81,"50bd":95,"50gi":95,"514u":108,"525n":108,"526u":108,"536u":108,"5460u":108,"5470u":108,"54u":108,"5690m":108,"573u":108,"578n":108,"5798m":108,"586u":108,"58s":96,"5969m":108,"5\u4f5c\u4e3a\u7f16\u8bd1\u73af\u5883":3,"5\u5373\u5c06\u505c\u6b62\u7ef4\u
62a4":3,"5_cudnn5_avx_mkl":3,"5_cudnn5_avx_openbla":[3,86],"6080u":108,"6140u":108,"6305m":108,"639u":108,"64\u5e73\u53f0\u4e3a\u4f8b":116,"64m":44,"655u":108,"6780u":108,"6810u":108,"682u":108,"6970u":108,"6\u4e07\u4ebf\u6b21\u6d6e\u70b9\u8fd0\u7b97\u6bcf\u79d2":108,"6\u4ee5\u4e0a":[3,86],"6\u4f5c\u4e3a\u6807\u51c6\u7f16\u8bd1\u73af\u5883":3,"6ce9":95,"704u":108,"7090u":108,"72u":108,"73u":108,"75u":108,"760u":108,"767u":108,"783n":108,"784u":108,"78m":108,"7\u4ee5\u4e0a":116,"7\u4ee5\u4e0a\u7684\u7b26\u53f7":3,"7\u4ee5\u4e0b":116,"7\u548cpip":78,"7\u7248\u672c\u5f00\u59cb":116,"7\u7cfb\u5217":3,"7kb":96,"8000\u5c31\u53ef\u4ee5\u5728\u7f51\u9875\u4e0a\u751f\u6210\u9700\u8981\u7684\u6587\u6863":77,"8250u":108,"8300u":108,"830n":108,"849m":108,"861u":108,"8661m":108,"892m":108,"8\u5b57\u8282\u8868\u793a\u6bcf\u4e2a\u53c2\u6570\u5360\u7528\u7684\u5b57\u8282\u6570":83,"901n":108,"90u":108,"918u":108,"9247m":108,"924n":108,"9261m":108,"9330m":108,"94u":108,"9530m":108,"983m":108,"988u":108,"997u":108,"99u":108,"9a235":117,"9f18":96,"\u4e00\u4e2a":89,"\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a0\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u6269\u5c55\u6210\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u4e0d\u5171\u4eab\u7684\u4f8b\u5b50\u662f":75,"\u4e00\u4e2a\u5178\u578b\u7684chunk\u5982\u4e0b\u6240\u793a":27,"\u4e00\u4e2a\u5206\u5e03\u5f0fpaddlepaddle\u8bad\u7ec3\u4efb\u52a1\u4e2d":96,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u6216\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u6269\u5c55\u6210\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u8fdb\u5165":113,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u
7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u8fdb\u5165":113,"\u4e00\u4e2a\u53cc\u5c42rnn\u7531\u591a\u4e2a\u5355\u5c42rnn\u7ec4\u6210":113,"\u4e00\u4e2a\u53ef\u8c03\u7528\u7684\u51fd\u6570":113,"\u4e00\u4e2a\u5e38\u7528\u7684cmake\u914d\u7f6e\u5982\u4e0b":118,"\u4e00\u4e2a\u6570\u636e\u96c6\u5927\u90e8\u5206\u5e8f\u5217\u957f\u5ea6\u662f100":81,"\u4e00\u4e2a\u662f\u6d6e\u70b9\u8ba1\u7b97\u91cf":108,"\u4e00\u4e2a\u72ec\u7acb\u7684\u5143\u7d20":110,"\u4e00\u4e2a\u72ec\u7acb\u7684\u8bcd\u8bed":110,"\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u7684\u6a21\u578b\u7531\u5927\u91cf\u7684\u53c2\u6570\u7ec4\u6210":10,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684\u524d\u5411\u4f20\u64ad\u90e8\u5206\u628a\u8f93\u5165\u8f6c\u5316\u4e3a\u76f8\u5e94\u7684\u8f93\u51fa":74,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684\u53c2\u6570\u662f\u5728":74,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684c":74,"\u4e00\u4e2a\u8f93\u51fa\u6570\u636e\u540c\u6837\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":89,"\u4e00\u4e2a\u8f93\u51fa\u7ec4\u6210":75,"\u4e00\u4e2a\u91cd\u8981\u7684\u95ee\u9898\u662f\u9009\u62e9\u6b63\u786e\u7684learning_r":83,"\u4e00\u4e2achunk\u7531\u6240\u5728\u7684\u6587\u4ef6\u504f\u79fb":27,"\u4e00\u4e2agpu\u8bbe\u5907\u4e0a\u4e0d\u5141\u8bb8\u914d\u7f6e\u591a\u4e2a\u6a21\u578b":103,"\u4e00\u4e2agradientmachine\u7c7b\u7684\u5bf9\u8c61\u7ba1\u7406\u7740\u4e00\u7ec4\u8ba1\u7b97\u5c42":90,"\u4e00\u4e2alabel":111,"\u4e00\u4e2amemory\u5305\u542b":114,"\u4e00\u4e2aposix\u517c\u5bb9\u7684\u6587\u4ef6\u7cfb\u7edf":27,"\u4e00\u4e9b\u60c5\u51b5\u4e3a\u4e86\u4fbf\u4e8e\u53d1\u5e03":90,"\u4e00\u53e5\u8bdd\u662f\u7531\u8bcd\u8bed\u6784\u6210\u7684\u5e8f\u5217":113,"\u4e00\u53f0\u7535\u8111":0,"\u4e00\u662fbatch":81,"\u4e00\u6837\u7684\u65b9\u5f0f":0,"\u4e00\u7ef4\u6570\u7ec4":[89,90],"\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":89,"\u4e00\u81f4":[110,111],"\u4e00\u822c\u4e0d\u5141\u8bb8\u518d\u4ece":63,"\u4e00\u822c\u4ece":72,"\u4e00\u822c\u5728paddlepaddle\u4e2d":1
11,"\u4e00\u822c\u662f\u7531\u4e8e\u76f4\u63a5\u4f20\u9012\u5927\u5b57\u5178\u5bfc\u81f4\u7684":83,"\u4e00\u822c\u6765\u8bf4":114,"\u4e00\u822c\u7531mkl":42,"\u4e00\u822c\u8868\u793a":111,"\u4e00\u822c\u8bbe\u7f6e":83,"\u4e00\u8282":90,"\u4e09\u79cd\u5e8f\u5217\u6a21\u5f0f":84,"\u4e0a":72,"\u4e0a\u4ea4\u53c9\u7f16\u8bd1raspberri":118,"\u4e0a\u4f20\u5230cloud\u6216\u8005\u4e0b\u8f7d\u5230\u672c\u5730\u7684\u65f6\u95f4\u53ef\u80fd\u6bd4\u8f83\u957f":27,"\u4e0a\u4f20\u65b9\u6cd5":63,"\u4e0a\u4f20\u8ba1\u7b97\u5f97\u51fa\u7684\u68af\u5ea6":92,"\u4e0a\u56fe\u4e2d\u7684":89,"\u4e0a\u56fe\u4e2d\u865a\u7ebf\u7684\u8fde\u63a5":111,"\u4e0a\u56fe\u63cf\u8ff0\u4e86\u4e00\u4e2a3\u8282\u70b9\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3\u573a\u666f":97,"\u4e0a\u6ce8\u518c\u4e00\u4e0b":27,"\u4e0a\u7f16\u8bd1\u5f88\u6162":0,"\u4e0a\u8fd0\u884c":116,"\u4e0a\u8ff0\u4ee3\u7801\u5c06bias\u5168\u90e8\u521d\u59cb\u5316\u4e3a1":83,"\u4e0a\u8ff0\u547d\u4ee4\u4e2d":1,"\u4e0a\u8ff0\u547d\u4ee4\u628a\u5f53\u524d\u76ee\u5f55":0,"\u4e0a\u8ff0\u7684":82,"\u4e0a\u8ff0\u7684\u4ee3\u7801\u7247\u6bb5\u5305\u542b\u4e86\u4e24\u79cd\u65b9\u6cd5":108,"\u4e0a\u8ff0\u7b2c4\u6b65":0,"\u4e0a\u8ff0paddlepaddl":63,"\u4e0a\u9762\u7684\u4ee3\u7801\u5728":75,"\u4e0a\u9762\u7684\u4ee3\u7801\u9996\u5148\u5bfc\u5165\u4f9d\u8d56\u7684\u5305":75,"\u4e0b":[75,77],"\u4e0b\u4e00\u4e2awheel\u5305\u9700\u8981\u66f4\u65b0\u7248\u672c\u53f7\u624d\u53ef\u4ee5\u4e0a\u4f20":63,"\u4e0b\u4e00\u6b65\u5c31\u662f\u7528\u6a21\u578b\u6765\u505a\u9884\u6d4b":88,"\u4e0b\u4f1a\u770b\u5230\u5982\u4e0b\u76ee\u5f55\u7ed3\u6784":87,"\u4e0b\u540c":83,"\u4e0b\u56fe\u4e2d\u5c31\u5c55\u793a\u4e86\u4e00\u4e9b\u5173\u4e8e\u5185\u5b58\u6570\u636e\u8fc1\u5f99\u548c\u8ba1\u7b97\u8d44\u6e90\u5229\u7528\u7387\u7684\u5efa\u8bae":108,"\u4e0b\u56fe\u662f\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u7684\u793a\u610f\u56fe":74,"\u4e0b\u56fe\u662fcsr\u5b58\u50a8\u7a00\u758f\u77e9\u9635\u7684\u793a\u610f\u56fe":89,"\u4e0b\u5b58\u653e\u516c\u5171\u6570\u636e\u96c6\u5408":11,"\
u4e0b\u627e\u5230":87,"\u4e0b\u62c9\u6846\u4e2d\u627e\u5230\u751f\u6210\u76843\u4e2a\u4e8c\u8fdb\u5236\u6587\u4ef6":63,"\u4e0b\u6587\u4ee5nlp\u4efb\u52a1\u4e3a\u4f8b":113,"\u4e0b\u6587\u4f1a\u8be6\u7ec6\u8fdb\u884c\u4ecb\u7ecd":89,"\u4e0b\u6587\u4f7f\u7528":97,"\u4e0b\u6587\u5c31\u662f\u7528job\u7c7b\u578b\u7684\u8d44\u6e90\u6765\u8fdb\u884c\u8bad\u7ec3":96,"\u4e0b\u6587\u8be6\u7ec6\u89e3\u91ca":89,"\u4e0b\u7684":[90,97],"\u4e0b\u8868\u5217\u51fa\u4e86python\u7aef\u8bad\u7ec3\u63a5\u53e3\u66b4\u9732\u7684\u6570\u636e\u7c7b\u578b":89,"\u4e0b\u8f7d":27,"\u4e0b\u8f7d\u5230\u672c\u5730":27,"\u4e0b\u8f7d\u5b8c\u6570\u636e\u540e":96,"\u4e0b\u8f7d\u5f97\u5230":63,"\u4e0b\u8f7d\u6307\u5b9a\u7248\u672c\u7684docker\u955c\u50cf":1,"\u4e0b\u8f7dgpu\u7248\u672c":1,"\u4e0b\u9762":90,"\u4e0b\u9762\u4e3e\u4e2a\u7b80\u5355\u7684\u4f8b\u5b50":108,"\u4e0b\u9762\u4ecb\u7ecd\u4ecb\u7ecd":75,"\u4e0b\u9762\u4ee5":91,"\u4e0b\u9762\u4ee5\u77e9\u9635\u4e58\u64cd\u4f5c":75,"\u4e0b\u9762\u4ee5addop\u4e3a\u4f8b\u8bf4\u660etensor\u7684\u4f7f\u7528\u8fc7\u7a0b":76,"\u4e0b\u9762\u5206\u522b\u4ecb\u7ecd\u67d0\u4e00\u7c7b\u6587\u4ef6\u7684\u5b9e\u73b0\u65b9\u5f0f":46,"\u4e0b\u9762\u5217\u51fa\u4e86":114,"\u4e0b\u9762\u5217\u51fa\u4e86\u5168\u8fde\u63a5\u5c42\u7684\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5":74,"\u4e0b\u9762\u5c31\u6839\u636e\u8fd9\u51e0\u4e2a\u6b65\u9aa4\u5206\u522b\u4ecb\u7ecd":97,"\u4e0b\u9762\u6211\u4eec\u4f7f\u7528\u8fd9\u4e2a\u955c\u50cf\u6765\u4e0b\u8f7d\u6570\u636e\u5230docker":96,"\u4e0b\u9762\u662f":75,"\u4e0b\u9762\u662f\u5bf9":75,"\u4e0b\u9762\u662fc":90,"\u4e0b\u9762\u7684\u4ee3\u7801\u5c06\u968f\u673a\u751f\u6210\u7684\u77e9\u9635\u8f6c\u5316\u4e3a\u53ef\u4ee5\u88abpaddlepaddle\u52a0\u8f7d\u7684\u6a21\u578b\u53c2\u6570":83,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u4ecegithub\u62c9\u53d6\u6700\u65b0\u4ee3\u7801":87,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5efa\u4e86\u4e00\u4e2a\u9ad8\u5ea6\u4e3a1":89,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5e
fa\u4e86\u4e00\u4e2acpu\u4e0a\u7684\u4e8c\u503c\u7a00\u758f\u77e9\u9635":89,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5efa\u4e86\u542b\u6709\u4e09\u4e2a\u5143\u7d20":89,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u5728\u521b\u5efa\u4e86\u4e00\u4e2acpu\u4e0a\u7684\u5e26\u5143\u7d20\u503c\u7684\u7a00\u758f\u77e9\u9635":89,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u5b9e\u73b0\u4e86":74,"\u4e0b\u9762\u7684\u70b9\u5b9e\u73b0\u4e86mulop\u7684\u5b9a\u4e49":75,"\u4e0b\u9762\u7ed9\u51fa\u4e86\u4e00\u4e2a\u4f8b\u5b50":74,"\u4e0b\u9762\u7ed9\u51fa\u5728\u4e09\u7ef4\u7a7a\u95f4\u4e2d\u4f7f\u7528\u7ebf\u6027\u56de\u5f52\u62df\u5408\u4e00\u6761\u76f4\u7ebf\u7684\u4f8b\u5b50":84,"\u4e0b\u9762\u8be6\u7ec6\u89e3\u91ca\u4ec0\u4e48\u662f":89,"\u4e0b\u9762\u8fd9\u4e9blayer\u80fd\u591f\u63a5\u53d7\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165":110,"\u4e0d\u4e00\u81f4\u7684\u7531pfsclient\u4e0b\u8f7d\u6216\u8005\u4f20\u8f93chunk\u5b8c\u6210":27,"\u4e0d\u4ec5\u8981\u63d0\u4f9b\u6bcf\u4e00\u4e2a\u5916\u5c42\u5e8f\u5217\u5728\u6574\u4e2a":89,"\u4e0d\u4f1a\u4fdd\u7559\u5728\u78c1\u76d8\u4e0a":0,"\u4e0d\u4f1a\u518d\u4ece":81,"\u4e0d\u4f1a\u865a\u62df\u4efb\u4f55\u786c\u4ef6":0,"\u4e0d\u4f7f\u7528\u9759\u6001\u5e93":45,"\u4e0d\u4f7f\u7528\u989d\u5916\u7a7a\u95f4":74,"\u4e0d\u4f7f\u7528c":45,"\u4e0d\u4f7f\u7528swig":45,"\u4e0d\u5141\u8bb8\u4e00\u4e2a\u6587\u4ef6\u4e2d\u5305\u542b\u591a\u4e2aop":75,"\u4e0d\u5171\u4eab\u5219\u4e0d\u52a0":75,"\u4e0d\u5171\u4eab\u7684\u4f8b\u5b50\u53ef\u4ee5\u53c2\u8003":75,"\u4e0d\u53ef\u4ee5\u66f4\u6539":63,"\u4e0d\u53ef\u518d\u8fdb\u884c\u62c6\u5206":89,"\u4e0d\u540c":42,"\u4e0d\u540c\u4e8e\u4e0a\u8ff0\u4ecb\u7ecd\u7684recurr":82,"\u4e0d\u540c\u4e8eop\u7684\u7f16\u8bd1\u6d4b\u8bd5":75,"\u4e0d\u540c\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u6570\u636e\u5927\u5c0f\u7684\u6700\u5927\u503c\u4e0e\u6700\u5c0f\u503c\u7684\u6bd4\u7387":103,"\u4e0d\u540c\u5e8f\u5217\u53ef\u80fd\u4f1a\u542b\u6709\u4e0d\u540c\u6570\u76ee\u4e2a\u5143\u7d20":89,"\u4e0d\u540c\u65f6\u9
5f4\u6b65\u7684\u8f93\u5165\u662f\u4e0d\u540c\u7684":114,"\u4e0d\u540c\u7248\u672c\u7684\u7f16\u8bd1\u5668\u4e4b\u95f4":45,"\u4e0d\u540c\u7684\u4f18\u5316\u7b97\u6cd5\u9700\u8981\u4f7f\u7528\u4e0d\u540c\u5927\u5c0f\u7684\u5185\u5b58":81,"\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf":97,"\u4e0d\u540c\u7684\u6570\u636e\u7c7b\u578b\u548c\u5e8f\u5217\u6a21\u5f0f\u8fd4\u56de\u7684\u683c\u5f0f\u4e0d\u540c":84,"\u4e0d\u540c\u8ba1\u7b97\u5c42\u5bf9\u7a7a\u8f93\u5165\u7684\u5904\u7406\u7b56\u7565\u6709\u53ef\u80fd\u4e0d\u540c":89,"\u4e0d\u540c\u8bbe\u5907":75,"\u4e0d\u540c\u8bed\u8a00\u7684\u63a5\u53e3\u9002\u5e94\u4e0d\u540c\u8bed\u8a00\u7684\u7279\u6027":45,"\u4e0d\u540c\u8f93\u5165\u542b\u6709\u7684\u5b50\u53e5":113,"\u4e0d\u540c\u8f93\u5165\u5e8f\u5217\u542b\u6709\u7684\u8bcd\u8bed\u6570\u5fc5\u987b\u4e25\u683c\u76f8\u7b49":113,"\u4e0d\u540cdataprovider\u5bf9\u6bd4\u5982\u4e0b":111,"\u4e0d\u540crank\u7684tensor\u662f\u4e0d\u540c\u7c7b\u578b":76,"\u4e0d\u5728":46,"\u4e0d\u5bb9\u6613\u51fa\u9519":27,"\u4e0d\u5d4c\u5165\u5176\u4ed6\u8bed\u8a00\u89e3\u91ca\u5668":45,"\u4e0d\u5d4c\u5165python\u89e3\u91ca\u5668":45,"\u4e0d\u5e94\u8be5\u88ab\u62c6\u89e3":113,"\u4e0d\u6307\u5b9a\u65f6":113,"\u4e0d\u652f\u6301":89,"\u4e0d\u652f\u6301\u5e8f\u5217\u957f\u5ea6\u4e3a":89,"\u4e0d\u662f\u4e00\u6761\u5e8f\u5217":84,"\u4e0d\u662f\u771f\u6b63\u7684layer":82,"\u4e0d\u662f\u901a\u8fc7\u4e00\u822c\u7684\u65b9\u5f0f\u6765\u5b9e\u73b0\u5bf9\u8f93\u51fa\u7684\u6fc0\u6d3b":82,"\u4e0d\u663e\u793a\u7684\u5199\u6bcf\u4e2a\u7c7b\u5177\u4f53\u5305\u542b\u4ec0\u4e48":45,"\u4e0d\u6ee1\u8db3\u94a9\u5b50\u7684":72,"\u4e0d\u7528mount\u7684\u65b9\u5f0f\u6765\u8bbf\u95ee\u6570\u636e":11,"\u4e0d\u80fd\u4fee\u6539op\u7684\u6210\u5458\u53d8\u91cf":75,"\u4e0d\u80fd\u592a\u968f\u610f":72,"\u4e0d\u80fd\u88ab\u63d0\u4ea4\u5230":72,"\u4e0d\u8981\u5728\u6ce8\u91cd\u6027\u80fd\u7684\u8bad\u7ec3\u573a\u666f\u4e0b\u4f7f\u7528":81,"\u4e0d\u8bba\u5e8f\u5217\u4e2d\u7684\u5143\u7d20\u5728\u5185\u5b58\u4e2d\
u5360\u7528\u591a\u5c11\u5b9e\u9645\u5b58\u50a8\u7a7a\u95f4":89,"\u4e0d\u8bba\u6570\u636e\u57df\u662f":89,"\u4e0d\u8bba\u662f\u4e00\u7ef4\u6574\u578b\u6570\u7ec4\u8fd8\u662f\u4e8c\u7ef4\u6d6e\u70b9\u6570\u77e9\u9635":89,"\u4e0d\u8bba\u662f\u5355\u5c42\u5e8f\u5217\u8fd8\u662f\u53cc\u5c42\u5e8f\u5217\u7684\u5e8f\u5217\u4fe1\u606f":89,"\u4e0d\u8fc7\u5b9e\u9645\u4e0a\u662f\u8fd0\u884c\u5728\u4e00\u4e2a":0,"\u4e0d\u9700\u5728\u4f7f\u7528c":90,"\u4e0d\u9700\u8981\u4f9d\u8d56\u5176\u4ed6\u4efb\u4f55\u8f6f\u4ef6\u4e86":0,"\u4e0d\u9700\u8981\u63d0\u4f9b\u5143\u7d20\u503c":89,"\u4e0d\u9700\u8981\u8bbe\u7f6e":116,"\u4e0e":[42,75,97,107],"\u4e0e\u4e4b\u76f8\u5bf9\u7684\u662flocal":27,"\u4e0e\u5176\u4ed6\u7b2c\u4e09\u65b9\u5e93\u4e00\u6837":42,"\u4e0e\u5176\u5b83":90,"\u4e0e\u529f\u80fd\u5206\u652f\u4e0d\u540c\u7684\u662f":63,"\u4e0e\u5355\u5c42rnn\u7684\u914d\u7f6e\u7c7b\u4f3c":111,"\u4e0e\u53ef\u80fd\u6709\u7684":63,"\u4e0e\u540c\u6b65sgd\u76f8\u6bd4":92,"\u4e0e\u5bfb\u627epython\u4ee3\u7801\u7684\u6027\u80fd\u74f6\u9888\u7c7b\u4f3c":107,"\u4e0e\u5f53\u524d\u7684\u8870\u51cf\u56e0\u5b50\u7684\u4e58\u79ef":83,"\u4e0e\u672c\u5730\u8bad\u7ec3\u76f8\u540c":93,"\u4e0e\u6b64\u4e0d\u540c\u7684\u662f":97,"\u4e0e\u8c03\u4f18":107,"\u4e0e\u8f93\u5165\u4e0d\u540c\u7684\u662f":90,"\u4e0e\u8fd9\u4e2a\u8bad\u7ec3\u6570\u636e\u4ea4\u4e92\u7684layer":81,"\u4e0ebatch":41,"\u4e0ejob":97,"\u4e0eoperator\u524d\u5411\u8ba1\u7b97\u7684\u8f93\u51fa\u8fdb\u884c\u5bf9\u6bd4":75,"\u4e0eoperator\u6ce8\u518c\u65f6\u6ce8\u518c\u7684\u7c7b\u578b\u4e00\u81f4":75,"\u4e0epython\u4e0d\u540c":107,"\u4e14\u4e0d\u6392\u9664commit\u4e4b\u95f4\u7684\u4fee\u6539\u5b58\u5728\u76f8\u4e92\u8986\u76d6\u7684\u60c5\u51b5":72,"\u4e14\u4f7f\u7528":87,"\u4e14\u589e\u52a0\u4e00\u4e2a\u7b2c\u4e09\u65b9\u8bed\u8a00":45,"\u4e14\u5c55\u793a\u6548\u679c\u66f4\u597d":107,"\u4e14\u5e8f\u5217\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20\u8fd8\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":84,"\u4e14\u6bcf\u4e2a\u53e5\u5b50\u8868\u793a\u4e3a\u5bf9
\u5e94\u7684\u8bcd\u8868\u7d22\u5f15\u6570\u7ec4":111,"\u4e14\u8c03\u7528\u65f6\u4e0d\u80fd\u629b\u51fa\u5f02\u5e38\u6216\u51fa\u73b0\u8fd0\u884c\u65f6\u9519\u8bef":46,"\u4e14c99\u652f\u6301bool\u7c7b\u578b\u548c\u5b9a\u957f\u6574\u6570":45,"\u4e14c99\u76f8\u5bf9\u4e8ec11\u4f7f\u7528\u66f4\u52a0\u5e7f\u6cdb":45,"\u4e24\u4e2a\u5b50\u76ee\u5f55\u4e0b":77,"\u4e24\u4e2a\u5d4c\u5957\u7684":113,"\u4e24\u4e2a\u64cd\u4f5c":108,"\u4e24\u4e2a\u8f93\u5165\u7684\u5b50\u5e8f\u5217\u957f\u5ea6\u4e5f\u5e76\u4e0d\u76f8\u540c":111,"\u4e24\u4e2a\u90e8\u5206":77,"\u4e24\u4e2a\u9690\u5c42\u7684\u7b80\u5355\u5168\u8fde\u63a5\u7f51\u7edc":90,"\u4e24\u6b21":89,"\u4e24\u79cd\u5e38\u7528\u7684\u6a21\u578b\u52a0\u8f7d\u65b9\u5f0f":90,"\u4e24\u79cd\u65b9\u6cd5\u7684\u533a\u522b":81,"\u4e24\u79cdblas\u5e93":0,"\u4e24\u8005\u90fd\u662f\u5bf9\u68af\u5ea6\u7684\u622a\u65ad":81,"\u4e25\u683c\u7684\u547d\u540d\u89c4\u8303pep":63,"\u4e2a\u5185\u5b58\u6c60\u5b9e\u9645\u4e0a\u51b3\u5b9a\u4e86shuffle\u7684\u7c92\u5ea6":81,"\u4e2a\u6027\u5316\u63a8\u8350":63,"\u4e2a\u6279\u6b21\u7684\u53c2\u6570\u5e73\u5747\u503c\u8fdb\u884c\u6d4b\u8bd5":103,"\u4e2a\u6a21\u578b\u6d4b\u8bd5\u6570\u636e":103,"\u4e2d":[41,42,45,46,74,75,76,81,89,97,107],"\u4e2d\u4e0d\u8981\u6dfb\u52a0\u5927\u6587\u4ef6\u7b49":72,"\u4e2d\u4f1a\u4f7f\u7528\u5230\u7684\u5b57\u5178\u6570\u636e\u6587\u4ef6":91,"\u4e2d\u4f1a\u63d0\u4f9b\u4e00\u4e9b\u5fc5\u8981\u7684\u63a5\u53e3\u548c\u51fd\u6570":42,"\u4e2d\u4f20\u5165\u53c2\u6570":91,"\u4e2d\u4f20\u5165\u7684\u53c2\u6570":91,"\u4e2d\u5143\u7d20\u4e2a\u6570\u603b\u662f\u7b49\u4e8e\u884c\u6570":89,"\u4e2d\u5143\u7d20\u7684\u4e2a\u6570\u7b49\u4e8e\u7f51\u7edc\u4e2d\u8f93\u51fa\u5c42\u7684\u4e2a\u6570":81,"\u4e2d\u5173\u4e8e\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc\u7684\u4ecb\u7ecd":111,"\u4e2d\u5199\u5165json\u5185\u5bb9":10,"\u4e2d\u5305\u542b\u4e00\u4e2araspberri":118,"\u4e2d\u5305\u542b\u6240\u4f9d\u8d56\u7684\u6240\u6709\u7b2c\u4e09\u65b9\u5e93":116,"\u4e2d\u5305\u542b\u82e5\u5e72\u4e2
a\u4e0d\u540candroid":116,"\u4e2d\u5305\u542bc":[116,118],"\u4e2d\u5355\u5143\u6d4b\u8bd5\u7684\u4e00\u90e8\u5206":72,"\u4e2d\u5355\u5143\u6d4b\u8bd5\u80fd\u987a\u5229\u901a\u8fc7":72,"\u4e2d\u542b\u6709\u591a\u4e2a\u5e8f\u5217":89,"\u4e2d\u5b8c\u5168\u4e00\u81f4":45,"\u4e2d\u5b9a\u4e49":114,"\u4e2d\u5b9a\u4e49\u548c\u4f7f\u7528":113,"\u4e2d\u5b9e\u73b0\u4e86\u4e00\u4e2amerge\u7684\u65b9\u6cd5":42,"\u4e2d\u5b9e\u73b0\u7684\u7ed3\u6784\u4f53":46,"\u4e2d\u5bf9\u5e94\u7684layer\u5904":41,"\u4e2d\u5f15\u5165\u7684":41,"\u4e2d\u6253\u5370\u5176\u503c":81,"\u4e2d\u6307\u5b9a":103,"\u4e2d\u6307\u5b9a\u7684\u540d\u5b57":105,"\u4e2d\u63d0\u4f9b\u4e00\u4e2a\u4e0emkl\u6709\u5173\u7684\u603b\u5f00\u5173":42,"\u4e2d\u63d0\u4f9b\u4e86\u4e00\u4e9b\u5168\u5c40\u51fd\u6570\u7528\u6765\u5b9e\u73b0paddl":76,"\u4e2d\u641c\u7d22\u8fd9\u51e0\u4e2a\u5e93":0,"\u4e2d\u64cd\u4f5c":89,"\u4e2d\u6587\u6587\u6863":77,"\u4e2d\u6587\u6587\u6863\u76ee\u5f55":77,"\u4e2d\u6587\u7ef4\u57fa\u767e\u79d1\u9875\u9762":111,"\u4e2d\u6839\u636e":41,"\u4e2d\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2alayer\u7684\u8f93\u51fa\u7ed3\u679c\u77e9\u9635":81,"\u4e2d\u6bcf\u4e2apod\u7684ip\u5730\u5740":97,"\u4e2d\u6bcf\u5c42\u7684\u6570\u503c\u7edf\u8ba1":103,"\u4e2d\u6dfb\u52a0":41,"\u4e2d\u6dfb\u52a0\u4e00\u4e2a":42,"\u4e2d\u6dfb\u52a0\u4e24\u4e2a\u8f93\u5165":75,"\u4e2d\u7528\u4e8e\u5b58\u50a8\u6570\u636e\u7684":90,"\u4e2d\u7684":[76,90],"\u4e2d\u7684\u4e00\u884c":72,"\u4e2d\u7684\u4ee3\u7801\u4f5c\u4e3a\u5b9e\u4f8b":91,"\u4e2d\u7684\u504f\u79fb":89,"\u4e2d\u7684\u5bf9\u5e94\u5206\u652f\u5373\u53ef":72,"\u4e2d\u7684\u7248\u672c\u4fe1\u606f":63,"\u4e2d\u7684\u76f8\u5173\u811a\u672c":90,"\u4e2d\u7684\u8d77\u59cb\u504f\u79fb":89,"\u4e2d\u83b7\u53d6":97,"\u4e2d\u8bbe\u7f6e\u7684\u6240\u6709\u8282\u70b9":93,"\u4e2d\u8be6\u7ec6\u4ecb\u7ecd":74,"\u4e2d\u8c03\u7528":75,"\u4e2d\u8fd0\u884c\u4efb\u52a1\u7684\u89d2\u5ea6":11,"\u4e2d\u914d\u7f6e\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"\u4e34\u65f6\u53d8\u91cf\u7b49\u7b49":81,"
\u4e3a":[41,42,75,89,114,116,117,118],"\u4e3a\u4e86\u4f7f":75,"\u4e3a\u4e86\u4f7f\u8bc4\u5ba1\u4eba\u5728\u8bc4\u5ba1\u4ee3\u7801\u65f6\u66f4\u597d\u5730\u4e13\u6ce8\u4e8e\u4ee3\u7801\u672c\u8eab":72,"\u4e3a\u4e86\u4fdd\u8bc1\u6548\u7387":74,"\u4e3a\u4e86\u4fdd\u8bc1gpu\u9a71\u52a8\u80fd\u591f\u5728\u955c\u50cf\u91cc\u9762\u6b63\u5e38\u8fd0\u884c":1,"\u4e3a\u4e86\u51cf\u5c11\u751f\u6210\u94fe\u63a5\u5e93\u7684\u5927\u5c0f\u628a":87,"\u4e3a\u4e86\u548c\u7528\u6237\u7cfb\u7edf\u517c\u5bb9":88,"\u4e3a\u4e86\u5c01\u88c5\u80fd\u591f\u6b63\u786e\u5de5\u4f5c":74,"\u4e3a\u4e86\u5c3d\u53ef\u80fd\u5c11\u7684\u5728\u7236\u7c7blayer\u4e2d\u6dfb\u52a0\u53d8\u91cf\u6216\u8005\u51fd\u6570":42,"\u4e3a\u4e86\u5e94\u5bf9\u4ee5\u4e0a\u7684\u95ee\u9898":27,"\u4e3a\u4e86\u63cf\u8ff0\u65b9\u4fbf":113,"\u4e3a\u4e86\u65b9\u4fbf\u5927\u5bb6":72,"\u4e3a\u4e86\u65b9\u4fbf\u5927\u5bb6\u7684\u90e8\u7f72":94,"\u4e3a\u4e86\u66b4\u9732\u7684\u63a5\u53e3\u5c3d\u91cf\u7b80\u5355":46,"\u4e3a\u4e86\u66f4\u597d\u7684\u7b26\u5408paddlepaddle\u7684\u4ee3\u7801\u98ce\u683c":42,"\u4e3a\u4e86\u6700\u5927\u7a0b\u5ea6\u51cf\u5c11\u591a\u6b21\u8c03\u7528":41,"\u4e3a\u4e86\u751f\u6210\u66f4\u53ef\u8bfb\u7684\u6027\u80fd\u5206\u6790\u7ed3\u679c":107,"\u4e3a\u4e86\u7b80\u5316cmake\u914d\u7f6e":116,"\u4e3a\u4e86\u7f16\u8bd1paddlepaddl":0,"\u4e3a\u4e86\u8fbe\u5230\u6027\u80fd\u6700\u4f18":108,"\u4e3a\u4e86\u8fbe\u5230\u6700\u5feb\u7684\u8ba1\u7b97\u901f\u5ea6":[116,117],"\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347paddlepaddle\u5728\u57fa\u672c\u6570\u5b66\u8fd0\u7b97\u7684\u8ba1\u7b97\u901f\u5ea6":42,"\u4e3a\u4ec0\u4e48\u7528":0,"\u4e3a\u4f7f\u7528c":90,"\u4e3a\u4f8b":[75,82],"\u4e3a\u4f8b\u6765\u4ecb\u7ecd\u5982\u4f55\u5199\u5e26kernel\u7684oper":75,"\u4e3a\u53c2\u6570\u77e9\u9635\u7684\u5bbd\u5ea6":83,"\u4e3a\u5b83\u4eec\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f\u5c06\u53d8\u6210\u5e8f\u5217\u8f93\u5165":89,"\u4e3a\u5bb9\u5668\u5185\u6267\u884c\u7684\u547d\u4ee4":1,"\u4e3a\u60a8\u505a\u6027\u80fd\u8c03\u4f18\u63d0
\u4f9b\u4e86\u65b9\u5411":108,"\u4e3a\u65b9\u4fbf\u4f5c\u4e1a\u542f\u52a8\u63d0\u4f9b\u4e86\u4e24\u4e2a\u72ec\u7279\u7684\u547d\u4ee4\u9009\u9879":93,"\u4e3a\u6b64":96,"\u4e3a\u6bcf\u4e00\u4e2a":[89,90],"\u4e3a\u6bcf\u4e00\u4e2a\u8f93\u5165":[89,90],"\u4e3a\u6bcf\u4e2aop\u521b\u5efa\u5355\u72ec\u7684":75,"\u4e3a\u8f93\u51fa\u5206\u914d\u5185\u5b58":74,"\u4e3aconst\u51fd\u6570":75,"\u4e3aoutput_\u7533\u8bf7\u5185\u5b58":74,"\u4e3b\u8981\u4e3a\u5f00\u53d1\u8005\u4f7f\u7528":103,"\u4e3b\u8981\u529f\u80fd\u5305\u62ec":27,"\u4e3b\u8981\u5305\u62ec":42,"\u4e3b\u8981\u5305\u62ec\u4e86\u6df1\u5ea6\u5b66\u4e60\u76f8\u5173\u7684\u6570\u5b66\u539f\u8bed\u4e0e\u64cd\u4f5c":42,"\u4e3b\u8981\u5305\u62ec\u56db\u79cd\u7c7b\u578b":84,"\u4e3b\u8981\u539f\u56e0\u5305\u62ec\u4e24\u4e2a\u65b9\u9762":81,"\u4e3b\u8981\u7528\u4e8epython":75,"\u4e3b\u8981\u9488\u5bf9paddlepaddle\u5728\u91cd\u6784\u4e4b\u524d\u7684\u4ee3\u7801\u6846\u67b6\u4ee5\u53cav1\u7684api":42,"\u4e3e\u4e00\u4e2a\u4f8b\u5b50":83,"\u4e3e\u4f8b":81,"\u4e3e\u4f8b\u8bf4\u660e":111,"\u4e4b\u524d":72,"\u4e4b\u540e":[74,84,91],"\u4e4b\u540e\u4f7f\u7528":74,"\u4e4b\u540e\u4f7f\u7528\u77e9\u9635\u8fd0\u7b97\u51fd\u6570\u6765\u8ba1\u7b97":74,"\u4e4b\u540e\u518d\u7528\u7f51\u9875\u8fde\u5230http":77,"\u4e4b\u540e\u521d\u59cb\u5316\u6240\u6709\u7684\u6743\u91cd\u77e9\u9635":74,"\u4e4b\u540e\u624d\u80fd\u5f00\u59cb\u7f16\u8bd1\u7684\u6b65\u9aa4":0,"\u4e4b\u5916\u7684\u6240\u6709\u5934\u6587\u4ef6":46,"\u4e4b\u7c7b\u7684\u7a0b\u5e8f\u6765\u7f16\u8bd1\u6e90\u7801":0,"\u4e4b\u95f4\u7684\u8fd0\u7b97\u662f\u72ec\u7acb\u7684":113,"\u4e58\u4e0a\u8f93\u51fa\u7684\u68af\u5ea6":74,"\u4e58\u6cd5\u548c\u4e58\u6cd5\u68af\u5ea6\u7684\u8ba1\u7b97\u5360\u75282":107,"\u4e58\u9664\u7b49\u65f6\u5019":81,"\u4e5f\u4e0d\u4f7f\u7528\u5176\u4ed6\u52a8\u6001\u5e93":45,"\u4e5f\u4e0d\u5b58\u5728\u4e00\u4e2asubseq\u76f4\u63a5\u751f\u6210\u4e0b\u4e00\u4e2asubseq\u7684\u60c5\u51b5":113,"\u4e5f\u4e0d\u5e94\u8be5\u62a5\u9519":46,"\u4e5f\u4e0d\u751f\u6210":46,"\u4e
5f\u4e0d\u80fd\u63a5\u6536\u5e8f\u5217\u6570\u636e\u4f5c\u4e3a\u8f93\u5165":82,"\u4e5f\u4f1a\u5360\u7528\u78c1\u76d8":0,"\u4e5f\u53ef\u4ee5\u4f7f\u7528":72,"\u4e5f\u53ef\u4ee5\u4f7f\u7528\u8fd9\u4e9b\u955c\u50cf":63,"\u4e5f\u53ef\u4ee5\u5229\u7528paddlepaddl":77,"\u4e5f\u53ef\u4ee5\u662f\u4e00\u4e2a\u8bcd\u8bed":113,"\u4e5f\u53ef\u4ee5\u662f\u5728\u4efb\u52a1\u542f\u52a8\u524d\u4e0b\u8f7d\u5230\u672c\u5730\u7684":91,"\u4e5f\u53ef\u4ee5\u76f4\u63a5\u5728\u7f51\u9875\u9884\u89c8\u6587\u6863":77,"\u4e5f\u53ef\u4ee5\u8bf4\u662f\u67d0\u4e9b\u7279\u5b9a\u6307\u4ee4\u7684\u4f7f\u7528\u60c5\u51b5":108,"\u4e5f\u53ef\u4ee5\u901a\u8fc7\u4fee\u6539":97,"\u4e5f\u53ef\u5199\u6210":75,"\u4e5f\u53ef\u81ea\u884c\u524d\u5f80\u5b98\u7f51\u4e0b\u8f7d":117,"\u4e5f\u53ef\u901a\u8fc7\u4ee5\u4e0b\u547d\u4ee4\u83b7\u53d6":116,"\u4e5f\u5c31\u662f":72,"\u4e5f\u5c31\u662f\u7a7a\u8f93\u5165":89,"\u4e5f\u5c31\u662f\u81ea\u5df1\u7528\u6237\u540d\u4e0b\u7684":72,"\u4e5f\u5c31\u662f\u8bf4":[89,103,105],"\u4e5f\u5c31\u662f\u8bf4\u8f93\u51fa\u7684\u7ed3\u679c\u4e0d\u4f1a\u5728\u539f\u6765\u7684\u6570\u636e\u4e0a\u7d2f\u52a0":42,"\u4e5f\u5c31\u662fpaddlepaddle\u4e2d\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":89,"\u4e5f\u63cf\u8ff0\u4e86\u5bb9\u5668\u9700\u8981\u4f7f\u7528\u7684\u5b58\u50a8\u5377\u6302\u8f7d\u7684\u60c5\u51b5":97,"\u4e5f\u652f\u6301cpu\u7684\u6027\u80fd\u5206\u6790":108,"\u4e5f\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":111,"\u4e5f\u662fdecoder\u5faa\u73af\u5c55\u5f00\u7684\u4f9d\u636e":113,"\u4e5f\u6ca1\u7528":78,"\u4e66\u5199":45,"\u4e86":0,"\u4e86\u89e3\u5176\u8c03\u7528\u5173\u7cfb":107,"\u4e86\u89e3\u60a8\u7684\u786c\u4ef6":108,"\u4e86\u89e3\u66f4\u591a\u7ec6\u8282":114,"\u4e86\u89e3\u66f4\u591a\u8be6\u7ec6\u4fe1\u606f":114,"\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":89,"\u4e8c\u7ef4\u6d6e\u70b9\u6570\u77e9\u9635":89,"\u4e8c\u7ef4\u77e9\u9635":90,"\u4e8c\u7ef4\u77e9\u9635\u53ef\u4ee5\u8868\u793a\u884c\u5411\u91cf\u548c\u5217\u5411\u91cf":89,"\u4e8c\u8005\u8bed\u610f\u4e0a\u5b8c\u
5168\u4e00\u81f4":111,"\u4e8e\u662f":89,"\u4e8e\u662f\u6211\u4eec\u53ef\u4ee5\u70b9\u51fb":107,"\u4e8e\u662f\u8fd9\u91cc\u4f7f\u7528":107,"\u4ea4\u4e92\u7684\u65b9\u6cd5":107,"\u4ea4\u53c9\u7f16\u8bd1\u5de5\u5177\u94fe\u4e3a":116,"\u4ea4\u53c9\u7f16\u8bd1android\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddlepaddle\u5e93":116,"\u4ea4\u53c9\u7f16\u8bd1android\u7248\u672c\u7684paddlepaddle\u5e93\u65f6":116,"\u4ea4\u53c9\u7f16\u8bd1ios\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddlepaddle\u5e93":117,"\u4ea4\u53c9\u7f16\u8bd1ios\u7248\u672c\u7684paddlepaddle\u5e93\u65f6":117,"\u4ea4\u53c9\u7f16\u8bd1raspberri":118,"\u4ea4\u7531cmake\u7cfb\u7edf\u672c\u8eab\u6765\u5904\u7406":116,"\u4ea6\u53ef\u4ee5\u901a\u8fc7\u624b\u52a8\u8bbe\u7f6e":117,"\u4eab\u53d7\u60a8\u7684\u65c5\u7a0b":1,"\u4eba\u8138\u8bc6\u522b":11,"\u4ec0\u4e48\u662f":0,"\u4ec5\u4ec5\u4f7f\u7528":45,"\u4ec5\u4f1a\u5728\u652f\u6301avx2\u6307\u4ee4\u96c6\u53ca\u4ee5\u4e0a\u7684\u673a\u5668\u624d\u4f7f\u7528mkl":42,"\u4ec5\u5728\u8fdc\u7a0b\u7a00\u758f\u8bad\u7ec3\u65f6\u6709\u6548":74,"\u4ec5\u5bf9\u7a00\u758f\u6570\u636e\u6709\u6548":74,"\u4ec5\u652f\u6301\u6574\u578b\u503c":89,"\u4ec5\u7528\u4e8e\u5b58\u50a8\u6574\u578b\u503c":90,"\u4ecb\u7ecd\u4e86\u4e00\u79cd\u901a\u8fc7ssh\u8fdc\u7a0b\u5206\u53d1\u4efb\u52a1":97,"\u4ecb\u7ecd\u4ea4\u53c9\u7f16\u8bd1android\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddlepaddle\u5e93\u7684\u65b9\u6cd5\u548c\u6b65\u9aa4":116,"\u4ecb\u7ecd\u4f7f\u7528paddlepaddl":91,"\u4ece":[63,79,108],"\u4ece0\u5230num":103,"\u4ece0\u5f00\u59cb\u7684\u6574\u6570":91,"\u4ece\u4e00\u4e2aword\u751f\u6210\u4e0b\u4e00\u4e2aword":113,"\u4ece\u5185\u6838\u51fd\u6570\u7684\u89d2\u5ea6":108,"\u4ece\u6a21\u578b\u6587\u4ef6\u5c06\u9884\u8bad\u7ec3\u53c2\u6570\u8f7d\u5165":83,"\u4ece\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u6765\u770b":111,"\u4ece\u6e90\u7801\u4e2d\u6784\u5efa\u7528\u4e8e\u7f16\u8bd1paddlepaddle\u7684docker\u955c\u50cf":0,"\u4ece\u6e90\u7801\u4ea4\u53c9\u7f16\u8bd1ios\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddl
epaddle\u5e93":117,"\u4ece\u6e90\u7801\u4ea4\u53c9\u7f16\u8bd1paddlepaddl":116,"\u4ece\u6e90\u7801\u7f16\u8bd1":2,"\u4ece\u78c1\u76d8\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":90,"\u4ece\u78c1\u76d8\u6587\u4ef6\u4e2d\u52a0\u8f7duuid\u6587\u4ef6\u540d\u7684\u68c0\u67e5\u70b9\u5feb\u7167\u6587\u4ef6":10,"\u4ece\u800c\u53ef\u4ee5\u505a\u4e00\u4e9b\u4e0e\u8ba1\u7b97\u91cd\u53e0\u7684\u5de5\u4f5c":74,"\u4ece\u800c\u5f15\u53d1\u5176\u4ed6\u8282\u70b9\u65e0\u6cd5\u8fde\u63a5\u5bfc\u81f4":79,"\u4ece\u800c\u907f\u514d\u4e86packing\u5197\u4f59":41,"\u4ece\u8bed\u4e49\u4e0a\u770b":113,"\u4ece\u8d77\u59cb\u7aef\u53e3\u76d1\u542c\u591a\u4e2a\u7aef\u53e3\u7528\u4e8e\u901a\u4fe1":91,"\u4ece\u8f93\u5165\u6570\u636e\u4e0a\u770b":111,"\u4ececmake":116,"\u4eceetcd\u4e2d\u8bfb\u53d6\u8282\u70b9":10,"\u4ecestart":103,"\u4ed3\u5e93\u7684\u8fdc\u7a0b\u4e3b\u673a":72,"\u4ed6\u4e3b\u8981\u5305\u542b\u4e86\u5b9e\u9645\u66b4\u9732\u7684\u7c7b\u578b\u7ed3\u6784":46,"\u4ed6\u4eec\u5206\u522b\u662f":111,"\u4ed6\u4eec\u5728\u81ea\u5df1\u7684":0,"\u4ed6\u4eec\u5728paddle\u7684\u6587\u6863\u548capi\u4e2d\u662f\u4e00\u4e2a\u6982\u5ff5":111,"\u4ed6\u662f\u5c06":46,"\u4ed6\u7684\u76ee\u6807\u662f\u4f7f\u7528c":45,"\u4ee3\u66ff":97,"\u4ee3\u7801\u4e2d9":111,"\u4ee3\u7801\u53c2\u8003":91,"\u4ee3\u7801\u5982\u4e0b":[81,82,83,114],"\u4ee3\u7801\u6ce8\u91ca\u8bf7\u9075\u5b88":72,"\u4ee3\u7801\u7247\u6bb5\u5982\u4e0b":89,"\u4ee3\u7801\u751f\u6210\u7684\u7b26\u53f7\u53ef\u80fd\u4e0d\u4e00\u81f4":45,"\u4ee3\u7801\u7684\u6027\u80fd\u5206\u6790":107,"\u4ee3\u7801\u793a\u4f8b\u5982\u4e0b":[75,90],"\u4ee3\u8868\u5bbf\u4e3b\u673a\u76ee\u5f55":97,"\u4ee3\u8868\u8fd9\u4e2alayer\u662f\u7528\u4e8e\u8dd1\u5728mkl":42,"\u4ee3\u8868\u8fd9\u4e2ashard\u7684\u6700\u5927index":11,"\u4ee3\u8868shard\u7684index":11,"\u4ee5":82,"\u4ee5\u4e0a":[72,116],"\u4ee5\u4e0a\u4e24\u79cd\u65b9\u5f0f\u53ea\u9700\u9009\u62e9\u5176\u4e00\u5373\u53ef":90,"\u4ee5\u4e0a\u4ee3\u7801\u7684reader\u8f93\u51fa\u7684data":11,"\u4ee5\u4e0a\u547d\u4ee4\u4f1
a\u5728\u5f53\u524d\u76ee\u5f55\u4e0b\u751f\u6210100\u4e2a\u6587\u4ef6":11,"\u4ee5\u4e0b":11,"\u4ee5\u4e0b\u4ee3\u7801\u7247\u6bb5\u5b9a\u4e49":114,"\u4ee5\u4e0b\u5c06\u4e00\u4e00\u4ecb\u7ecd":94,"\u4ee5\u4e0b\u6307\u4ee4\u80fd\u68c0\u67e5linux\u7535\u8111\u662f\u5426\u652f\u6301avx":1,"\u4ee5\u4e0b\u6307\u5357\u4ecb\u7ecd\u4e86\u5982\u4f55\u4f7f\u7528openmpi\u6765\u642d\u5efapaddlepaddle\u7684\u96c6\u7fa4\u8bad\u7ec3\u4efb\u52a1":94,"\u4ee5\u4e0b\u6307\u5357\u5c55\u793a\u4e86paddlepaddle\u5bf9kubernetes\u7684\u652f\u6301":94,"\u4ee5\u4e0b\u64cd\u4f5c\u5747\u5728head\u8282\u70b9\u4e2d\u6267\u884c":98,"\u4ee5\u4e0b\u6559\u7a0b\u5c06\u6307\u5bfc\u60a8\u63d0\u4ea4\u4ee3\u7801":72,"\u4ee5\u4e0b\u7b80\u79f0rnn":41,"\u4ee5\u4ea4\u4e92\u5f0f\u7684\u65b9\u5f0f\u6267\u884c\u6216\u8c03\u8bd5\u60a8\u7684\u4ee3\u7801":1,"\u4ee5\u4f7f\u7528":116,"\u4ee5\u4f7f\u7528adam\u7b97\u6cd5\u4e3a\u4f8b":83,"\u4ee5\u4fbf\u6211\u4eec\u53ef\u4ee5\u628a\u66f4\u591a\u7684\u7cbe\u529b\u653e\u5230\u903b\u8f91\u672c\u8eab\u4e0a":27,"\u4ee5\u4fbf\u83b7\u5f97\u8bad\u7ec3\u6570\u636e\u7684\u4f4d\u7f6e\u548c\u83b7\u53d6\u73af\u5883\u53d8\u91cf\u914d\u7f6e":91,"\u4ee5\u4fdd\u8bc1\u68af\u5ea6\u7684\u6b63\u786e\u8ba1\u7b97":74,"\u4ee5\u4fdd\u8bc1\u68af\u5ea6\u8ba1\u7b97\u7684\u6b63\u786e\u6027":74,"\u4ee5\u4fdd\u8bc1\u7f16\u8bd1\u9ad8\u6548":0,"\u4ee5\u51cf\u5c0fsdk\u7684\u4f53\u79ef":88,"\u4ee5\u53ca":[41,74,89],"\u4ee5\u53ca\u4f7f\u7528\u5b50\u5e8f\u5217\u6765\u5b9a\u4e49\u5206\u7ea7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":114,"\u4ee5\u53ca\u5207\u6362\u673a\u5668\u65f6\u9700\u8981\u65b0\u5b89\u88c5\u7684\u8f9b\u82e6":0,"\u4ee5\u53ca\u53cc\u5c42\u5e8f\u5217":110,"\u4ee5\u53ca\u5982\u4f55\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u524d\u5411\u8ba1\u7b97\u7684\u8f93\u51fa\u7ed3\u679c":89,"\u4ee5\u53ca\u76ee\u6807\u673a\u7248openblas\u5e93":118,"\u4ee5\u53ca\u76f8\u5173\u7684\u5c5e\u6027\u53c2\u6570":75,"\u4ee5\u53ca\u7b2c\u4e09\u65b9\u4f9d\u8d56\u94fe\u63a5\u5e93\u548c\u5934\u6587\u4ef6":87,"\u4ee5\u
53ca\u8ba1\u7b97\u903b\u8f91\u5728\u5e8f\u5217\u4e0a\u7684\u5faa\u73af\u5c55\u5f00":113,"\u4ee5\u53ca\u8f93\u5165\u7684\u68af\u5ea6":74,"\u4ee5\u53caandroid":116,"\u4ee5\u53canumpi":11,"\u4ee5\u53carelu":74,"\u4ee5\u5b9e\u73b0\u5bf9\u6a21\u578b\u8bad\u7ec3\u6216\u9884\u6d4b\u6d41\u7a0b\u7684\u63a7\u5236":104,"\u4ee5\u63d0\u4f9b\u4e00\u4e9b\u9ed8\u8ba4\u7684\u7f16\u8bd1\u5668\u548c\u7f16\u8bd1\u53c2\u6570\u76f8\u5173\u914d\u7f6e":116,"\u4ee5\u63d0\u4f9b\u4e00\u4e9b\u9ed8\u8ba4\u7684\u7f16\u8bd1\u5668\u548c\u7f16\u8bd1\u53c2\u6570\u914d\u7f6e":117,"\u4ee5\u6b64\u8fbe\u5230\u6700\u597d\u7684\u6027\u80fd":42,"\u4ee5\u786e\u4fdd\u6240\u6709\u7684\u7b2c\u4e09\u65b9\u4f9d\u8d56\u5e93\u548cpaddlepaddle\u4ee3\u7801\u90fd\u662f\u9488\u5bf9\u65b0\u7684cmake\u914d\u7f6e\u91cd\u65b0\u7f16\u8bd1\u7684":[116,117,118],"\u4ee5\u793a\u533a\u5206":[41,42],"\u4ee5\u8f93\u51fa":81,"\u4ee5\u9017\u53f7\u95f4\u9694":103,"\u4ee5\u907f\u514d\u94fe\u63a5\u4e0d\u5fc5\u8981\u7684\u5e93":87,"\u4ee5eigentensor\u4e3a\u4f8b":76,"\u4ee5embedding\u5c42\u4e3a\u4f8b":83,"\u4ee5lstm\u4e3a\u4f8b":82,"\u4efb\u4f55\u65f6\u5019\u5982\u679c\u9700\u8981\u6d6e\u70b9\u578b\u6570\u7ec4":89,"\u4efb\u52a1\u6765\u7ec8\u6b62\u96c6\u7fa4\u4f5c\u4e1a":93,"\u4efb\u610f\u5c06\u4e00\u4e9b\u6570\u636e\u7ec4\u5408\u6210\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":111,"\u4efb\u610f\u65f6\u523b\u53ea\u53ef\u80fd\u540c\u65f6\u6709\u4e00\u53f0\u670d\u52a1\u5668\u6545\u969c":10,"\u4f18\u5316\u524d":41,"\u4f18\u5316\u540e":41,"\u4f18\u5316\u5668\u5219\u7528\u94fe\u5f0f\u6cd5\u5219\u6765\u5bf9\u6bcf\u4e2a\u53c2\u6570\u8ba1\u7b97\u635f\u5931\u51fd\u6570\u7684\u68af\u5ea6":74,"\u4f1a\u4ea7\u751f\u5f53\u524dpython\u4e8c\u8fdb\u5236\u7684\u5b8c\u6574\u8def\u5f84":107,"\u4f1a\u4ee5":[41,42],"\u4f1a\u4f7f\u7528":90,"\u4f1a\u4f7f\u7528\u76f8\u540c\u7684\u539f\u6570\u636e":41,"\u4f1a\u5148\u4e34\u65f6\u4fdd\u5b58\u5728":42,"\u4f1a\u5148\u8fdb\u884c\u53c2\u6570\u7684\u521d\u59cb\u5316\u4e0e\u89e3\u6790":97,"\u4f1a\u5171\u4eab\u53c2\u6570":83,"\u
4f1a\u5173\u8054\u53c2\u6570":82,"\u4f1a\u52a0\u8f7d\u4e0a\u4e00\u8f6e\u7684\u53c2\u6570":103,"\u4f1a\u53d8\u6210\u8bcd\u8868\u4e2d\u7684\u4f4d\u7f6e":111,"\u4f1a\u542f\u52a8pserver\u4e0etrainer\u8fdb\u7a0b":97,"\u4f1a\u5728":[42,77],"\u4f1a\u5728\u5f53\u524d\u76ee\u5f55\u751f\u6210\u4e24\u4e2a\u5b50\u76ee\u5f55":77,"\u4f1a\u5728\u7f16\u8bd1paddlepaddle\u7684\u65f6\u5019\u4e0b\u8f7d\u5e76\u7f16\u8bd1mkl":42,"\u4f1a\u5927\u4e0d\u76f8\u540c":91,"\u4f1a\u5bf9\u6bcf\u4e00\u4e2a\u6fc0\u6d3b\u6682\u5b58\u4e00\u4e9b\u6570\u636e":81,"\u4f1a\u5bf9\u8bad\u7ec3\u6027\u80fd\u9020\u6210\u5f71\u54cd":81,"\u4f1a\u5bf9\u8fd9\u7c7b\u8f93\u5165\u8fdb\u884c\u62c6\u89e3":113,"\u4f1a\u5bfc\u81f4\u4e0d\u540c\u7248\u672cpython\u5728\u4e00\u4e2a\u8fdb\u7a0b\u91cc\u7684bug":45,"\u4f1a\u5c06\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u62fc\u63a5":113,"\u4f1a\u5c06\u7b2c\u4e00\u4e2a":81,"\u4f1a\u5f15\u5165":42,"\u4f1a\u6210\u4e3astep\u51fd\u6570\u7684\u8f93\u5165":113,"\u4f1a\u6253\u5370\u5230\u6807\u51c6\u8f93\u51fa":107,"\u4f1a\u6267\u884c":0,"\u4f1a\u628a\u8bad\u7ec3\u96c6\u548c\u6d4b\u8bd5\u96c6\u5206\u522b\u5206\u5272\u6210\u591a\u4e2a\u6587\u4ef6":91,"\u4f1a\u628acpu\u7684buffer\u5bf9\u9f50\u4e3a4096":42,"\u4f1a\u62a5\u5982\u4e0b\u7684\u9519\u8bef":81,"\u4f1a\u62a5\u9519":113,"\u4f1a\u6dfb\u52a0\u76f8\u5e94\u7684\u811a\u672c\u5728":42,"\u4f1a\u6dfb\u52a0\u76f8\u5e94\u7684\u811a\u672c\u7528\u4e8e\u6d4b\u8bd5\u548c\u5bf9\u6bd4\u5728\u4f7f\u7528mkl":41,"\u4f1a\u72ec\u7acb\u62e5\u6709\u4e00\u4efd\u8bad\u7ec3\u597d\u7684\u6a21\u578b":90,"\u4f1a\u751f\u6210\u6027\u80fd\u5206\u6790\u7ed3\u679c\u6587\u4ef6":107,"\u4f1a\u76f4\u63a5\u62a5\u9519\u9000\u51fa":45,"\u4f1a\u76f8\u5e94\u5730\u6539\u53d8\u8f93\u51fa\u7684\u5c3a\u5bf8":74,"\u4f1a\u81ea\u52a8\u4f7f\u7528mklml\u5e93\u4f5c\u4e3apaddlepaddle\u7684cblas\u548clapack\u5e93":42,"\u4f1a\u81ea\u52a8\u5173\u95ed\u5bf9\u5e94\u7684issu":72,"\u4f1a\u81ea\u52a8\u5728\u7f16\u8bd1\u65f6\u4e0b\u8f7d":0,"\u4f1a\u81ea\u52a8\u6839\u636e\u786c\u4ef6\u914
d\u7f6e":42,"\u4f1a\u83b7\u53d6\u5f53\u524dnamespace\u4e0b\u7684\u6240\u6709pod":97,"\u4f1a\u88ab":91,"\u4f1a\u88ab\u62c6\u89e3\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":113,"\u4f1a\u88ab\u62c6\u89e3\u4e3a\u975e\u5e8f\u5217":113,"\u4f1a\u88abpickle\u5e8f\u5217\u5316\u6210\u5b57\u7b26\u4e32":11,"\u4f1a\u901a\u8fc7\u5224\u6570\u636e\u662f\u5426\u9644\u5e26\u6709\u5e8f\u5217\u4fe1\u606f\u6765\u5224\u65ad\u4e00\u4e2a\u5411\u91cf":89,"\u4f1a\u9020\u6210\u90ae\u4ef6\u707e\u96be":72,"\u4f20\u5165":11,"\u4f20\u7ed9dataprovider\u7684\u67d0\u4e00\u4e2aargs\u8fc7\u5927":83,"\u4f20\u9012\u7ed9\u914d\u7f6e\u6587\u4ef6\u7684\u53c2\u6570":103,"\u4f46":46,"\u4f46\u4e0d\u66b4\u9732":46,"\u4f46\u4e0d\u7528\u4e8e\u8ba1\u7b97\u68af\u5ea6":74,"\u4f46\u4e0d\u9700\u8981\u63d0\u524d\u521b\u5efa":103,"\u4f46\u4e8e\u53cc\u5c42\u5e8f\u5217\u7684lstm\u6765\u8bf4":111,"\u4f46\u53ef\u4ee5\u83b7\u53d6":81,"\u4f46\u548c\u5355\u5c42rnn\u4e0d\u540c":111,"\u4f46\u5b50\u53e5\u542b\u6709\u7684\u8bcd\u8bed\u6570\u53ef\u4ee5\u4e0d\u76f8\u7b49":113,"\u4f46\u5c3d\u91cf\u8bf7\u4fdd\u6301\u7f16\u8bd1\u548c\u8fd0\u884c\u4f7f\u7528\u7684cudnn\u662f\u540c\u4e00\u4e2a\u7248\u672c":0,"\u4f46\u5e76\u6ca1\u6709\u7ecf\u8fc7\u56de\u5f52\u6d4b\u8bd5":63,"\u4f46\u5e8f\u5217\u8f93\u51fa\u65f6":111,"\u4f46\u622a\u65ad\u65f6\u673a\u4e0d\u540c":81,"\u4f46\u6240\u6709fork\u7684\u7248\u672c\u5e93\u7684\u6240\u6709\u5206\u652f\u90fd\u76f8\u5f53\u4e8e\u7279\u6027\u5206\u652f":63,"\u4f46\u662f":[81,111],"\u4f46\u662f\u53c8\u8fc7\u4e8e\u7410\u788e":46,"\u4f46\u662f\u5728mkl":42,"\u4f46\u662f\u5728paddlepaddle\u4e2d":42,"\u4f46\u662f\u5927\u90e8\u5206\u53c2\u6570\u662f\u4e3a\u5f00\u53d1\u8005\u63d0\u4f9b\u7684":102,"\u4f46\u662f\u5b50\u5e8f\u5217\u7684\u6570\u76ee\u5fc5\u987b\u4e00\u6837":111,"\u4f46\u662f\u5e76\u4e0d\u80fd\u4fdd\u8bc1\u53c2\u6570\u540c\u6b65\u66f4\u65b0":92,"\u4f46\u662f\u652f\u6301avx\u6307\u4ee4\u96c6":72,"\u4f46\u662f\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4e0d\u9700\u8981\u4efb\u4f55\u8f6c\u6362":42,"\u
4f46\u662f\u6bcf\u4e2a\u6837\u672c\u4ec5\u5305\u542b\u51e0\u4e2a\u8bcd":105,"\u4f46\u662f\u6ce8\u610f\u7684\u662f":42,"\u4f46\u662f\u7a81\u7136\u6709\u4e00\u4e2a10000\u957f\u7684\u5e8f\u5217":81,"\u4f46\u662f\u865a\u62df\u7684\u4e0d\u4ec5\u4ec5\u662f":0,"\u4f46\u662f\u89e3\u91ca\u6027\u8bed\u8a00":45,"\u4f46\u662f\u8c03\u8bd5python\u4e2d\u4f7f\u7528\u7684\u52a8\u6001\u94fe\u63a5\u5e93\u4e0e\u76f4\u63a5\u8c03\u8bd5\u539f\u59cb\u4e8c\u8fdb\u5236\u76f8\u6bd4\u589e\u52a0\u4e86\u5f88\u591a\u590d\u6742\u5ea6":107,"\u4f46\u662fbatch":81,"\u4f46\u6709\u503c\u7684\u5730\u65b9\u5fc5\u987b\u4e3a1":84,"\u4f46\u6709\u503c\u7684\u90e8\u5206\u53ef\u4ee5\u662f\u4efb\u4f55\u6d6e\u70b9\u6570":84,"\u4f46\u7531\u4e8ecuda\u5e93\u901a\u5e38\u9700\u8981cento":3,"\u4f46\u9700\u6ce8\u610f\u53cd\u5411op\u6ca1\u6709":75,"\u4f46eigen":76,"\u4f5c\u4e3a\u4e0b\u4e00\u4e2a\u5b50\u53e5memory\u7684\u521d\u59cb\u72b6\u6001":111,"\u4f5c\u4e3a\u4f8b\u5b50\u6f14\u793a\u5982\u4f55\u914d\u7f6e\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b":114,"\u4f5c\u4e3a\u53c2\u6570\u5c5e\u6027":75,"\u4f5c\u4e3a\u53c2\u6570\u7684id":83,"\u4f5c\u4e3a\u53e6\u4e00\u4e2a\u7b2c\u4e09\u65b9\u5e93\u96c6\u6210\u8fdbpaddlepaddl":42,"\u4f5c\u4e3a\u5b58\u50a8\u7cfb\u7edf":11,"\u4f5c\u4e3a\u5f53\u524d\u65f6\u523b\u8f93\u5165":113,"\u4f5c\u4e3a\u7c7b\u53e5\u67c4":45,"\u4f5c\u4e3a\u7edf\u8ba1\u7684\u57fa\u672c\u5355\u4f4d":89,"\u4f5c\u4e3a\u8c03\u7528":90,"\u4f5c\u4e3a\u8f93\u5165":89,"\u4f5c\u4e3a\u8f93\u51fa":114,"\u4f5c\u4e3aboot_layer\u4f20\u7ed9\u4e0b\u4e00\u4e2a\u5b50\u53e5\u7684memori":111,"\u4f5c\u7528":110,"\u4f60\u53ef\u4ee5\u5c06\u7f51\u7edc\u914d\u7f6e\u6210\u67d0\u4e9b\u5c42\u4f7f\u7528gpu\u8ba1\u7b97":105,"\u4f60\u8fd8\u53ef\u4ee5\u901a\u8fc7\u8fd0\u884cdjango\u6846\u67b6\u76f4\u63a5\u6fc0\u6d3b\u5de5\u5177\u7684\u670d\u52a1\u5668":77,"\u4f60\u9700\u8981\u4e00\u4e9b\u66f4\u590d\u6742\u7684\u5355\u5143\u6d4b\u8bd5\u6765\u4fdd\u8bc1\u4f60\u5b9e\u73b0\u7684\u7f51\u7edc\u5c42\u662f\u6b63\u786e\u7684":74
,"\u4f60\u9700\u8981\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u6307\u5b9a\u8bbe\u5907\u7684id\u53f7":105,"\u4f60\u9700\u8981\u5728\u914d\u7f6ecmake\u65f6\u5c06":74,"\u4f60\u9700\u8981\u628a\u8be5\u6587\u4ef6\u52a0\u5165":74,"\u4f7f\u4e4b\u53d8\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u8f93\u5165":89,"\u4f7f\u4e4b\u53d8\u4e3a\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u8f93\u5165":89,"\u4f7f\u5f97\u5355\u5143\u6d4b\u8bd5\u6709\u4e00\u4e2a\u5e72\u51c0\u7684\u73af\u5883":78,"\u4f7f\u5f97\u642d\u6a21\u578b\u65f6\u66f4\u65b9\u4fbf":74,"\u4f7f\u68af\u5ea6\u7684\u63d0\u4ea4\u548c\u53c2\u6570\u7684\u66f4\u65b0\u6309\u7167\u987a\u5e8f\u65b9\u5f0f\u6267\u884c":92,"\u4f7f\u7528":[42,46,63,74,81,82,83,87,89,90,103,107,108,111,113,114,116],"\u4f7f\u75280\u53f7\u548c1\u53f7gpu\u8ba1\u7b97fc2\u5c42":105,"\u4f7f\u75280\u53f7gpu\u8ba1\u7b97fc2\u5c42":105,"\u4f7f\u75281\u53f7gpu\u8ba1\u7b97fc3\u5c42":105,"\u4f7f\u75282\u53f7\u548c3\u53f7gpu\u8ba1\u7b97fc3\u5c42":105,"\u4f7f\u7528\u4e00\u4e2a\u5c3a\u5ea6\u4e3a":74,"\u4f7f\u7528\u4e00\u4e2a\u8bcd\u524d\u4e24\u4e2a\u8bcd\u548c\u540e\u4e24\u4e2a\u8bcd":81,"\u4f7f\u7528\u4e0a\u6587\u521b\u5efa\u7684yaml\u6587\u4ef6\u521b\u5efakubernet":96,"\u4f7f\u7528\u4e0b\u9762\u547d\u4ee4":11,"\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u6765\u8fd0\u884c\u5b83":77,"\u4f7f\u7528\u4e86\u540c\u6837\u7684parameter\u548cbia":83,"\u4f7f\u7528\u4ee5\u4e0a\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u8fdb\u884c\u9884\u6d4b":84,"\u4f7f\u7528\u52a8\u6001\u5e93":45,"\u4f7f\u7528\u53c2\u6570":[0,91],"\u4f7f\u7528\u540c\u6837\u7684\u8bad\u7ec3\u6570\u636eblock":10,"\u4f7f\u7528\u57fa\u4e8edocker\u5bb9\u5668\u7684\u7f16\u8bd1\u65b9\u5f0f":116,"\u4f7f\u7528\u591a\u5757\u663e\u5361\u8bad\u7ec3":81,"\u4f7f\u7528\u591a\u7ebf\u7a0b\u8bad\u7ec3":81,"\u4f7f\u7528\u5b66\u4e60\u5b8c\u6210\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u751f\u6210\u5e8f\u5217":114,"\u4f7f\u7528\u5b83\u4f1a\u5f00\u542f\u4e00\u4e2ahttp\u670d\u52a1":107,"\u4f7f\u7528\u5bb9\u5668\u65b9\u5f0f\u8fd0\u884c\u8ba
d\u7ec3\u4efb\u52a1\u7684kubernet":97,"\u4f7f\u7528\u6211\u4eec\u4e4b\u524d\u6784\u9020\u7684\u955c\u50cf":96,"\u4f7f\u7528\u6570\u503c\u6cd5\u68c0\u6d4b\u68af\u5ea6\u6b63\u786e\u6027\u548c\u7a33\u5b9a\u6027":75,"\u4f7f\u7528\u6587\u6863":75,"\u4f7f\u7528\u663e\u5361\u8bad\u7ec3":81,"\u4f7f\u7528\u667a\u80fd\u6307\u9488\u7684\u539f\u56e0\u662f":46,"\u4f7f\u7528\u6848\u4f8b":104,"\u4f7f\u7528\u73af\u5883\u53d8\u91cf":91,"\u4f7f\u7528\u7684\u53c2\u6570\u4e0epaddlepaddle\u7533\u8bf7\u7684buffer\u5171\u7528\u4e00\u5757\u5185\u5b58":42,"\u4f7f\u7528\u76f8\u5bf9\u8def\u5f84\u7684\u5f15\u7528\u65b9\u5f0f":46,"\u4f7f\u7528\u8005\u4e0d\u9700\u8981\u5173\u5fc3":103,"\u4f7f\u7528\u8005\u53ea\u9700\u8981\u5173\u6ce8\u4e8e\u8bbe\u8ba1rnn\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u5b8c\u6210\u7684\u8ba1\u7b97":113,"\u4f7f\u7528\u8005\u65e0\u9700\u5173\u5fc3\u8fd9\u4e2a\u53c2\u6570":103,"\u4f7f\u7528\u8005\u901a\u5e38\u65e0\u9700\u5173\u5fc3":103,"\u4f7f\u7528\u8be5learning_rate_schedule\u65f6":83,"\u4f7f\u7528\u8fd9\u4e2a\u795e\u7ecf\u7f51\u7edc\u53ef\u4ee5\u5b8c\u6210\u5bf9\u65b0\u6570\u636e\u7684\u9884\u6d4b":10,"\u4f7f\u7528\u8fd9\u79cd\u65b9\u5f0f":[90,111],"\u4f7f\u7528\u8fdc\u7a0b\u7a00\u758f\u65b9\u5f0f\u8bad\u7ec3\u65f6":74,"\u4f7f\u7528\u9759\u6001\u5e93\u548c\u52a8\u6001\u5e93\u96be\u5ea6\u5dee\u4e0d\u591a":45,"\u4f7f\u7528c":[46,87],"\u4f7f\u7528c99\u505a\u63a5\u53e3":45,"\u4f7f\u7528c99\u800c\u4e0d\u4f7f\u7528c11\u7684\u539f\u56e0\u662f":45,"\u4f7f\u7528c99\u800c\u4e0d\u4f7f\u7528c89":45,"\u4f7f\u7528checkgrad\u6a21\u5f0f\u65f6\u7684\u53c2\u6570\u53d8\u5316\u5927\u5c0f":103,"\u4f7f\u7528cmake\u7684\u8bdd":107,"\u4f7f\u7528cpu\u4e24\u7ebf\u7a0b\u8ba1\u7b97fc4\u5c42":105,"\u4f7f\u7528cpu\u8ba1\u7b97fc4\u5c42":105,"\u4f7f\u7528docker":1,"\u4f7f\u7528docker\u5b89\u88c5\u548c\u8fd0\u884cpaddlepaddle\u53ef\u4ee5\u65e0\u9700\u8003\u8651":1,"\u4f7f\u7528docker\u5b89\u88c5\u8fd0\u884c":2,"\u4f7f\u7528docker\u5c31\u4e0d\u7528\u914d\u7f6e\u4ea4\u53c9\u7f16\u8bd1\u73af\u588
3\u4e86":0,"\u4f7f\u7528docker\u6784\u5efapaddlepaddle\u7684\u6587\u6863":77,"\u4f7f\u7528eigen\u8fdb\u884c\u77e9\u9635\u8ba1\u7b97":116,"\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3":94,"\u4f7f\u7528init":105,"\u4f7f\u7528lstm\u4f5c\u4e3aencod":111,"\u4f7f\u7528memory\u7684rnn\u5b9e\u73b0\u4fbf\u5982\u4e0b\u56fe\u6240\u793a":111,"\u4f7f\u7528model":105,"\u4f7f\u7528openblas\u7684\u955c\u50cf":1,"\u4f7f\u7528openblas\u8fdb\u884c\u77e9\u9635\u8ba1\u7b97":116,"\u4f7f\u7528paddlepaddl":[87,90],"\u4f7f\u7528pip\u5b89\u88c5":2,"\u4f7f\u7528rdma\u8fd8\u662ftcp\u4f20\u8f93\u534f\u8bae":103,"\u4f7f\u7528regress":63,"\u4f7f\u7528swig\u53ea\u652f\u6301cpython\u89e3\u91ca\u5668":45,"\u4f7f\u7528swig\u9700\u8981\u591a\u8bed\u8a00\u7ed1\u5b9a\u7684\u5f00\u53d1\u4eba\u5458\u719f\u7ec3\u638c\u63e1swig\u914d\u7f6e":45,"\u4f7f\u7528void":45,"\u4f7f\u8be5\u5c42\u7684\u53c2\u6570\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4fdd\u6301\u4e0d\u53d8":83,"\u4f8b\u5982":[0,11,45,46,63,74,81,82,84,89,91,97,102,103,105,108,111,114],"\u4f8b\u5982\u4e0b\u56fe\u4e2d":107,"\u4f8b\u5982\u4e0b\u9762\u4ee3\u7801":81,"\u4f8b\u5982\u4e5f\u53ef\u5728\u7a0b\u5e8f\u8fd0\u884c\u8fc7\u7a0b\u4e2d\u518d\u52a0\u8f7d\u53e6\u5916\u4e00\u4e2a\u6a21\u578b":90,"\u4f8b\u5982\u4f7f\u7528":81,"\u4f8b\u5982\u542b\u6709\u591a\u4e2a\u901a\u9053\u7684\u56fe\u7247":89,"\u4f8b\u5982\u5728deepspeech2":41,"\u4f8b\u5982\u5bf9\u4e8ejava\u6216\u8005python":45,"\u4f8b\u5982\u5bf9\u4e8ejava\u6765\u8bf4":45,"\u4f8b\u5982\u5bf9\u4e8epython":45,"\u4f8b\u5982\u5c06\u7b2c\u4e00\u6761\u6570\u636e\u8f6c\u5316\u4e3a":111,"\u4f8b\u5982\u6587\u672c\u5206\u7c7b\u4e2d":111,"\u4f8b\u5982\u672c\u4f8b\u4e2d\u7684\u4e24\u4e2a\u7279\u5f81":111,"\u4f8b\u5982\u673a\u5668\u4e0a\u67094\u5757gpu":81,"\u4f8b\u5982c":45,"\u4f8b\u5982java\u4e0epython\u7684\u9519\u8bef\u5904\u7406\u662f\u76f4\u63a5\u6254\u51fa\u6765except":45,"\u4f8b\u5982output\u76ee\u5f55\u4e0b\u5c31\u5b58\u653e\u4e86\u8f93\u51fa\u7ed3\u679c":97,"\u4f8b\u5982python\u53ef\u4ee5\u4f7f\
u7528":45,"\u4f8b\u5982python\u7684":45,"\u4f8b\u5982rnn":41,"\u4f8b\u5982sigmoid":74,"\u4f8b\u5b50\u4e2d\u4e3a3\u4e2a":91,"\u4f8b\u5b50\u4e2d\u662f":74,"\u4f8b\u5b50\u4e2d\u662f0":74,"\u4f8b\u5b50\u4e2d\u662f100":74,"\u4f8b\u5b50\u4e2d\u662f4096":74,"\u4f8b\u5b50\u4e2d\u662f8192":74,"\u4f8b\u5b50\u4e2d\u662ffc":74,"\u4f8b\u5b50\u4e2d\u662fsoftmax":74,"\u4f9bpaddlepaddle\u52a0\u8f7d":103,"\u4f9d\u636e\u662f\u5426\u5305\u542bkernel":75,"\u4f9d\u6b21\u7c7b\u63a8":63,"\u4f9d\u8d56":[0,3],"\u4f9d\u8d56\u73af\u5883\u5373\u53ef\u8fd0\u884c":1,"\u4f9d\u8d56libpython2":0,"\u4fbf\u548c\u5355\u5c42rnn\u914d\u7f6e\u4e2d\u7684":111,"\u4fbf\u662f\u5c06\u9759\u6001\u5e93\u52a0\u5165jvm\u4e2d":45,"\u4fdd\u5b58\u6a21\u578b\u53c2\u6570\u7684\u76ee\u5f55":103,"\u4fdd\u5b58\u7684\u53c2\u6570\u4e5f\u662ffloat\u7c7b\u578b":83,"\u4fdd\u5b58\u7f51\u7edc\u5c42\u8f93\u51fa\u7ed3\u679c\u7684\u76ee\u5f55":103,"\u4fdd\u5b58\u9884\u6d4b\u7ed3\u679c\u7684\u6587\u4ef6\u540d":103,"\u4fdd\u6301\u5c3d\u91cf\u5c11\u7684commit":72,"\u4fdd\u8bc1\u4f7f\u7528gpu\u8bad\u7ec3\u65f6\u4e5f\u53ef\u4ee5\u83b7\u5f97":81,"\u4fe1\u53f7\u6765\u81ea\u52a8\u7ec8\u6b62\u5b83\u542f\u52a8\u7684\u6240\u6709\u8fdb\u7a0b":93,"\u4fe1\u606f":89,"\u4fee\u590d\u6240\u6709bug\u540e":63,"\u4fee\u590ddocker\u7f16\u8bd1\u955c\u50cf\u95ee\u9898":63,"\u4fee\u6539":[42,63,96],"\u4fee\u6539\u542f\u52a8\u811a\u672c\u540e":96,"\u4fee\u6539\u6210":63,"\u4fee\u6539\u6210\u66f4\u5feb\u7684\u7248\u672c":108,"\u503c\u5f97\u6ce8\u610f\u7684\u662f":[72,111],"\u503c\u5f97\u6df1\u5165\u5206\u6790":108,"\u503c\u7c7b\u578b":105,"\u5047\u5982\u6211\u4eec\u662f\u4e09\u5206\u7c7b\u95ee\u9898":83,"\u5047\u8bbe":74,"\u5047\u8bbe\u60a8\u5df2\u7ecf\u5728\u5f53\u524d\u76ee\u5f55":1,"\u5047\u8bbe\u635f\u5931\u51fd\u6570\u662f":74,"\u5047\u8bbe\u7b2c\u4e00\u4e2alayer\u7684\u8f93\u51faa\u662f\u4e00\u4e2a":81,"\u504f\u7f6e\u53c2\u6570\u7684\u5927\u5c0f":74,"\u505a\u4e00\u4e2a\u4ecb\u7ecd":76,"\u505a\u53ea\u8bfb\u6302\u8f7d":11,"\u505a\u5982\u4e0b\u51e0\u4e2a
\u64cd\u4f5c":63,"\u505a\u63a5\u53e3":45,"\u505a\u68af\u5ea6\u68c0\u6d4b":75,"\u505a\u68c0\u67e5":75,"\u505c\u6b62\u4fdd\u5b58\u68c0\u67e5\u70b9\u7684\u7ebf\u7a0b":10,"\u505c\u6b62\u52a0\u8f7d\u6570\u636e":103,"\u5141\u8bb8\u5916\u7f51\u8bbf\u95ee\u8fd9\u4e2ahttp\u670d\u52a1":107,"\u5143\u7d20":110,"\u5143\u7d20\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u5e8f\u5217\u6240\u643a\u5e26\u7684\u91cd\u8981\u4fe1\u606f":89,"\u5143\u7d20\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u91cd\u8981\u7684\u8f93\u5165\u4fe1\u606f":110,"\u5145\u5206\u53d1\u6325\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u4f18\u52bf":41,"\u5145\u5206\u5c55\u73b0\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u4f18\u52bf":42,"\u5148\u4ece\u5355\u7ebf\u7a0b\u5f00\u59cb":107,"\u5148\u5378\u8f7d\u4e4b\u524d\u7684\u7248\u672c":0,"\u5148\u5b8c\u6210\u5bf9\u6743\u91cd\u7684packing\u64cd\u4f5c":41,"\u5148\u5b9e\u73b0\u6a21\u578b\u63a8\u65ad\u7684api":46,"\u5148\u627e\u51fa\u53c2\u6570":82,"\u5148\u67e5\u770b\u4e00\u4e0b\u662f\u5426\u66fe\u7ecf\u5b89\u88c5\u8fc7paddl":78,"\u5148\u68c0\u67e5\u5173\u952e\u8def\u5f84\u7684\u6027\u80fd\u95ee\u9898":107,"\u514b\u9686\u4e0b\u9762":118,"\u5168\u8fde\u63a5\u5c42\u4ee5\u4e00\u4e2a\u7ef4\u5ea6\u4e3a":74,"\u5168\u8fde\u63a5\u5c42\u6ca1\u6709\u7f51\u7edc\u5c42\u914d\u7f6e\u7684\u8d85\u53c2\u6570":74,"\u5168\u8fde\u63a5\u5c42\u7684\u5b9e\u73b0\u4f4d\u4e8e":74,"\u5168\u8fde\u63a5\u5c42\u7684\u6bcf\u4e2a\u8f93\u51fa\u90fd\u8fde\u63a5\u5230\u4e0a\u4e00\u5c42\u7684\u6240\u6709\u7684\u795e\u7ecf\u5143\u4e0a":74,"\u5168\u8fde\u63a5\u5c42python\u5c01\u88c5\u7684\u4f8b\u5b50\u4e2d\u5305\u542b\u4e0b\u9762\u51e0\u6b65":74,"\u516c\u5f0f":1,"\u5171\u4eab\u4e00\u4e2aop\u5b9a\u4e49":75,"\u5171\u4eab\u5185\u5b58":42,"\u5171\u4eab\u540c\u4e00\u4e2a\u6743\u91cd":41,"\u5171\u4eab\u540c\u4e00\u4e2akernel\u65f6":75,"\u5171\u4eab\u5b58\u50a8\u6302\u5728\u7684\u8def\u5f84":97,"\u5173\u4e8e\u4ec0\u4e48\u662f":89,"\u5173\u4e8e\u5728paddlepaddle\u4e2d\u5982\u4f55\u4f7f\u7528eigen\u5e93":75,"\u5173\u4e8e\u65f6\u95f4\u5e8f\u5217"
:111,"\u5173\u4e8e\u6784\u5efa\u548c\u6d4b\u8bd5\u7684\u66f4\u591a\u4fe1\u606f":72,"\u5173\u4e8eavx":1,"\u5173\u4e8ec":88,"\u5173\u4e8eeigen":76,"\u5173\u4e8elstm":82,"\u5173\u4e8epaddlepaddle\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3":97,"\u5173\u4e8epaddlepaddle\u7684\u66f4\u591a\u4f7f\u7528\u65b9\u6cd5\u8bf7\u53c2\u8003":84,"\u5173\u4e8eunbound":113,"\u5173\u952e\u8bcd\u5305\u62ec":72,"\u5176\u4e2d":[45,63,74,81,83,84,107,114,116,118],"\u5176\u4e2d\u5305\u542b\u4e86\u7528\u6237\u7684\u8bad\u7ec3\u7a0b\u5e8f":91,"\u5176\u4e2d\u5305\u542b\u6240\u4f9d\u8d56\u7684\u6240\u6709\u7b2c\u4e09\u65b9\u5e93":117,"\u5176\u4e2d\u5305\u542b\u6240\u6709c":117,"\u5176\u4e2d\u5305\u542bpaddlepaddle\u7684c":117,"\u5176\u4e2d\u6bcf\u4e2a\u5143\u7d20\u662f\u53cc\u5c42\u5e8f\u5217\u4e2d\u6bcf\u4e2asubseq\u6700\u540e\u4e00\u4e2a":110,"\u5176\u4e2dcheckgrad\u4e3b\u8981\u4e3a\u5f00\u53d1\u8005\u4f7f\u7528":103,"\u5176\u4e2dmean\u548cstd\u662f\u8bad\u7ec3\u914d\u7f6e\u4e2d\u7684\u53c2\u6570":103,"\u5176\u4e2dx\u8868\u793a\u8f93\u5165\u6570\u636e\u662f\u4e00\u4e2a\u7ef4\u5ea6\u4e3a2\u7684\u7a20\u5bc6\u5411\u91cf":84,"\u5176\u4e3b\u8981\u63a5\u53e3\u5982\u4e0b":76,"\u5176\u4ed6\u4eba\u53ef\u4ee5\u590d\u73b0\u95ee\u9898\u4ee5\u4fbf\u5e2e\u52a9":0,"\u5176\u4ed6\u5185\u5b58\u6742\u9879":81,"\u5176\u4ed6\u5185\u5b58\u6742\u9879\u662f\u6307paddlepaddle\u672c\u8eab\u6240\u7528\u7684\u4e00\u4e9b\u5185\u5b58":81,"\u5176\u4ed6\u51fd\u6570\u5747\u8fd4\u56de":46,"\u5176\u4ed6\u6240\u6709\u5c42\u90fd\u4f1a\u4f7f\u7528gpu\u8ba1\u7b97":105,"\u5176\u4ed6\u7528\u6237\u7684fork\u7248\u672c\u5e93\u5e76\u4e0d\u9700\u8981\u4e25\u683c\u9075\u5b88":63,"\u5176\u4ed6\u7684\u4f9d\u8d56\u8f6f\u4ef6":0,"\u5176\u4ed6\u914d\u7f6e\u53c2\u6570":[116,117],"\u5176\u4ed6\u9ad8\u7ea7\u529f\u80fd\u5305\u62ec\u5b9a\u4e49\u591a\u4e2amemori":114,"\u5176\u4f1a\u81ea\u52a8\u88ab\u52a0\u5165\u7f16\u8bd1\u5217\u8868":74,"\u5176\u547d\u4ee4\u5982\u4e0b":107,"\u5176\u5b83\u53ef\u9009\u7f16\u8bd1\u9009\u9879\u6309\u9700\u8fdb\u884c\u8bbe\u5b
9a":87,"\u5176\u5b83layer\u7684\u8f93\u51fa":113,"\u5176\u5b9e\u4e5f\u662f\u548c\u6bcf\u4e2amini":81,"\u5176\u6b21":111,"\u5176\u8bf4\u660e\u5982\u4e0b":111,"\u5176\u8f6c\u6362\u6b21\u6570\u51cf\u5c11\u81f3":41,"\u5176\u8f93\u51fa\u88ab\u7528\u4f5cmemory\u7684\u521d\u59cb\u503c":114,"\u5176name\u7531\u53c2\u6570":82,"\u5177\u4f53\u4f7f\u7528\u65b9\u6cd5\u4e3a":[46,81],"\u5177\u4f53\u505a\u6cd5\u8bf7\u53c2\u8003":0,"\u5177\u4f53\u539f\u56e0\u53c2\u8003":46,"\u5177\u4f53\u53ef\u4ee5\u53c2\u8003":[74,81],"\u5177\u4f53\u53ef\u4ee5\u53c2\u8003mkl":42,"\u5177\u4f53\u53ef\u53c2\u8003\u6587\u6863":113,"\u5177\u4f53\u5b9e\u73b0\u65b9\u5f0f\u6bd4\u5982":[41,42],"\u5177\u4f53\u60c5\u51b5\u56e0\u4eba\u800c\u5f02":108,"\u5177\u4f53\u64cd\u4f5c\u5982\u4e0b":78,"\u5177\u4f53\u6b65\u9aa4\u5982\u4e0b":78,"\u5177\u4f53\u7684\u5b8c\u6210\u72b6\u6001\u53ef\u4ee5\u53c2\u89c1":42,"\u5177\u4f53\u7684\u89e3\u51b3\u65b9\u6cd5\u662f":78,"\u5177\u4f53\u8bf7\u53c2\u8003":[46,72],"\u5177\u4f53\u8bf7\u89c1":72,"\u5177\u6709\u76f8\u540c\u7684\u7ed3\u679c\u4e86":111,"\u5185":114,"\u5185\u5b58":108,"\u5185\u5b58\u4e0d\u8db3":79,"\u5185\u5b58\u5bb9\u9650\u9608\u503c":103,"\u5185\u5bb9":75,"\u5185\u5bb9\u5982\u4e0b":96,"\u5185\u5c42\u5e8f\u5217\u5728":89,"\u5185\u5c42inner_step\u7684recurrent_group\u548c\u5355\u5c42\u5e8f\u5217\u7684\u51e0\u4e4e\u4e00\u6837":111,"\u5185\u5df2\u7ecf\u5305\u542bpaddlepaddle\u7684\u6267\u884c\u7a0b\u5e8f\u4f46\u662f\u8fd8\u6ca1\u4e0a\u8ff0\u529f\u80fd":97,"\u5185\u7f6e\u7684":90,"\u5185\u90e8":[90,97],"\u5185\u90e8\u5b58\u50a8":42,"\u5185\u90e8\u7531":[89,90],"\u5185\u90e8\u9a71\u52a8python\u89e3\u91ca\u5668\u8fdb\u884c\u6a21\u578b\u914d\u7f6e\u89e3\u6790\u548c\u6570\u636e\u8bfb\u53d6":45,"\u518d\u4ee5":75,"\u518d\u505a\u4e00\u5b9a\u7684reshap":82,"\u518d\u5199\u5165\u7f51\u7edc\u53c2\u6570":83,"\u518d\u5728\u6bcf\u4e00\u4e2aapi\u4e2d\u81ea\u5df1\u68c0\u67e5\u7c7b\u578b":45,"\u518d\u57fa\u4e8e":63,"\u518d\u5b89\u88c5":[3,78],"\u518d\u5bf9\u6bcf\u4e00\u4e2a\u5355\u5c42\u
65f6\u95f4\u5e8f\u5217\u8fdb\u884c\u5904\u7406":111,"\u518d\u5bf9\u6bcf\u4e00\u53e5\u8bdd\u7684\u7f16\u7801\u5411\u91cf\u7528lstm\u7f16\u7801\u6210\u4e00\u4e2a\u6bb5\u843d\u7684\u5411\u91cf":111,"\u518d\u5bf9\u8fd9\u4e2a\u6bb5\u843d\u5411\u91cf\u8fdb\u884c\u5206\u7c7b":111,"\u518d\u5c06\u66f4\u65b0\u540e\u7684\u53c2\u6570\u4e0b\u53d1\u5230\u6bcf\u4e2a\u8ba1\u7b97\u8282\u70b9":92,"\u518d\u5f00\u542f\u591a\u7ebf\u7a0b":107,"\u518d\u628a\u5df2\u8f6c\u6362\u4e3apacked\u683c\u5f0f\u7684\u6570\u636e\u4f20\u9012\u7ed9\u90a3\u4e9b\u590d\u7528\u540c\u4e00\u6570\u636e\u7684gemm":41,"\u518d\u6307\u5b9a":0,"\u518d\u68c0\u67e5\u5176\u4ed6\u90e8\u5206\u7684\u6027\u80fd\u95ee\u9898":107,"\u518d\u6b21\u5bf9\u4ee3\u7801\u8fdb\u884c\u6027\u80fd\u5206\u6790":108,"\u518d\u6b21\u8fdb\u884c\u6027\u80fd\u5206\u6790":107,"\u518d\u7528\u8fd9\u4e2a\u68af\u5ea6\u53bb\u548c":74,"\u518d\u901a\u8fc7\u51fd\u6570":97,"\u518d\u91cd\u65b0\u5b89\u88c5":0,"\u5199\u4ee3\u7801":45,"\u5199\u5165\u5feb\u7167\u6570\u636e":10,"\u5199\u5165\u6587\u4ef6\u4e2d":90,"\u5199\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5\u662f\u4e00\u4e2a\u9a8c\u8bc1\u65b0\u5b9e\u73b0\u7684\u5c42\u662f\u5426\u6b63\u786e\u7684\u76f8\u5bf9\u7b80\u5355\u7684\u529e\u6cd5":74,"\u5199\u7684":107,"\u51c6\u5907\u60a8\u7684\u8ba1\u7b97\u96c6\u7fa4":101,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e":98,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e\u548c\u9a8c\u8bc1\u6570\u636e\u96c6":91,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b\u548c":90,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b\u90e8\u5206":90,"\u51cf\u5c0f\u5e8f\u5217\u7684\u957f\u5ea6":81,"\u51cf\u5c0f\u8fd9\u4e2a\u5185\u5b58\u6c60\u5373\u53ef\u51cf\u5c0f\u5185\u5b58\u5360\u7528":81,"\u51cf\u5c0fbatch":81,"\u51e0\u53f0\u5230\u51e0\u5343\u53f0\u89c4\u6a21":101,"\u51fa\u73b0":78,"\u51fa\u73b0\u4ee5\u4e0b\u9519\u8bef":83,"\u51fa\u73b0\u8be5\u9519\u8bef\u7684\u539f\u56e0\u4e00\u822c\u662f\u7528\u6237\u5bf9\u4e0d\u540clayer\u7684\u53c2\u6570":82,"\u51fa\u73b0\u8fd9\u4e2a\u95ee\u9898\u7684\u4e3b\u8981\u539f\u56e0\u
662f":[3,78],"\u51fd\u6570":[41,42,74,89,107,108,114],"\u51fd\u6570\u4e2d\u64cd\u4f5c\u7684\u91cd\u8981\u53d8\u91cf\u7684\u8be6\u7ec6\u89e3\u91ca":75,"\u51fd\u6570\u5047\u8bbe":114,"\u51fd\u6570\u52a0\u5230\u4ee3\u7801\u4e2d":108,"\u51fd\u6570\u5373\u53ef\u5b8c\u6210\u8f6c\u6362":11,"\u51fd\u6570\u53ea\u5173\u6ce8\u4e8ernn\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u7684\u8ba1\u7b97":113,"\u51fd\u6570\u540d":107,"\u51fd\u6570\u540d\u4e3a":46,"\u51fd\u6570\u547d\u540d":45,"\u51fd\u6570\u5b9a\u4e49\u8f93\u5165":75,"\u51fd\u6570\u5b9e\u9645\u4f7f\u7528\u7684\u603b\u65f6\u95f4":107,"\u51fd\u6570\u5f97\u5230\u7684\u68af\u5ea6\u53bb\u5bf9\u6bd4":74,"\u51fd\u6570\u5fc5\u987b\u5148\u8c03\u7528\u57fa\u7c7b\u4e2d\u7684\u51fd\u6570":74,"\u51fd\u6570\u5fc5\u987b\u8fd4\u56de\u4e00\u4e2a\u6216\u591a\u4e2alayer\u7684\u8f93\u51fa":113,"\u51fd\u6570\u603b\u65f6\u95f4":107,"\u51fd\u6570\u6307\u51fa\u4e86\u5728\u8bad\u7ec3\u65f6\u9700\u8981\u4ece\u53c2\u6570\u670d\u52a1\u5668\u53d6\u51fa\u7684\u884c":74,"\u51fd\u6570\u6765\u5c06\u4fe1\u606f\u8f93\u51fa\u5230\u754c\u9762\u4e2d":108,"\u51fd\u6570\u7684\u5b9e\u73b0\u662f\u6b63\u786e\u7684":74,"\u51fd\u6570\u7684\u5f00\u5934\u5fc5\u987b\u8c03\u7528":74,"\u51fd\u6570\u7684\u603b\u5171\u8017\u65f6\u5f88\u957f":107,"\u51fd\u6570\u7684\u8c03\u7528\u6b21\u6570":107,"\u51fd\u6570\u80fd\u591f\u5c06\u4f7f\u7528":90,"\u51fd\u6570\u91cc\u5b9e\u73b0":75,"\u5206\u4e3a":90,"\u5206\u522b\u4e3a\u6570\u636e\u8f93\u5165\u6dfb\u52a0\u5916\u5c42\u5e8f\u5217\u548c\u5185\u5c42\u5e8f\u5217\u7684\u5e8f\u5217\u4fe1\u606f":89,"\u5206\u522b\u4ece\u8bcd\u8bed\u548c\u53e5\u5b50\u7ea7\u522b\u7f16\u7801\u8f93\u5165\u6570\u636e":113,"\u5206\u522b\u4ee3\u8868\u8f93\u5165\u6570\u636e":42,"\u5206\u522b\u4f7f\u7528\u5355\u53cc\u5c42rnn\u4f5c\u4e3a\u7f51\u7edc\u914d\u7f6e\u7684\u6a21\u578b":111,"\u5206\u522b\u5b9a\u4e49\u5b50\u53e5\u7ea7\u522b\u548c\u8bcd\u8bed\u7ea7\u522b\u4e0a\u9700\u8981\u5b8c\u6210\u7684\u8fd0\u7b97":113,"\u5206\u522b\u5bf9\u5e94capi":63,"\u5206\u522b\u6
62f":110,"\u5206\u522b\u662frnn\u72b6\u6001\u548c\u8f93\u5165\u7684\u53d8\u6362\u77e9\u9635":114,"\u5206\u522b\u662fsentences\u548clabel":111,"\u5206\u522b\u662fwords\u548clabel":111,"\u5206\u522b\u8ba1\u7b97\u6bcf\u4e2a\u53c2\u6570\u7684\u68af\u5ea6":74,"\u5206\u522b\u8fdb\u884c\u5e8f\u5217\u64cd\u4f5c":111,"\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1":10,"\u5206\u5e03\u5f0f\u8bad\u7ec3":106,"\u5206\u5e03\u5f0f\u8bad\u7ec3\u67b6\u6784\u5982\u4e0b\u56fe\u6240\u793a":92,"\u5206\u652f":[63,72],"\u5206\u652f\u4e00\u65e6\u5efa\u7acb":63,"\u5206\u652f\u4e0a":72,"\u5206\u652f\u4e0a\u521b\u5efa\u65b0\u5206\u652f":72,"\u5206\u652f\u4e2d":63,"\u5206\u652f\u4e3a\u5f00\u53d1":63,"\u5206\u652f\u4e3a\u6bcf\u4e00\u6b21release\u65f6\u5efa\u7acb\u7684\u4e34\u65f6\u5206\u652f":63,"\u5206\u652f\u4e3a\u7a33\u5b9a":63,"\u5206\u652f\u529f\u80fd\u7684\u5c01\u95ed":63,"\u5206\u652f\u5408\u5165":63,"\u5206\u652f\u5408\u5165master\u5206\u652f":63,"\u5206\u652f\u540c\u6b65\u4e3b\u7248\u672c\u5e93\u7684":63,"\u5206\u652f\u540d":72,"\u5206\u652f\u540d\u4e3a":63,"\u5206\u652f\u5b58\u5728\u7684\u65f6\u5019":63,"\u5206\u652f\u6d3e\u751f\u51fa\u65b0\u7684\u5206\u652f":63,"\u5206\u652f\u7528\u6765\u6d4b\u8bd5\u53ea\u9700\u8981\u8ba1\u7b97\u4e00\u4e2a\u8f93\u5165\u68af\u5ea6\u7684\u60c5\u51b5":75,"\u5206\u652f\u7684\u7248\u672c\u90fd\u662f\u7ecf\u8fc7\u5355\u5143\u6d4b\u8bd5\u548c\u56de\u5f52\u6d4b\u8bd5\u7684\u7248\u672c":63,"\u5206\u652f\u7684\u7248\u672c\u90fd\u7ecf\u8fc7\u5355\u5143\u6d4b\u8bd5":63,"\u5206\u652f\u89c4\u8303":72,"\u5206\u6790\u5f97\u5230\u7684\u4fe1\u606f\u7528\u4e8e\u534f\u52a9\u8fdb\u884c\u7a0b\u5e8f\u7684\u4f18\u5316":108,"\u5206\u7247":10,"\u5206\u7c7b\u4efb\u52a1\u4e2d\u7c7b\u522b\u6807\u7b7e":89,"\u5206\u914d\u5230\u5f53\u524d\u6570\u636e\u5757\u6837\u672c\u6570\u7684\u56db\u5206\u4e4b\u4e00":103,"\u5207\u6362\u5230":72,"\u5207\u6362\u5230\u6240\u5efa\u5206\u652f":72,"\u5217\u5143\u7d20\u6392\u5217\u6210\u7684\u77e9\u5f62\u9635\u5217":89,"\u5217\u540d":107,"\u5217\u8868\u598
2\u4e0b":84,"\u5219\u4e0d\u9700\u8981\u91cd\u5199\u8be5\u51fd\u6570":74,"\u5219\u4f1a\u4f7f\u7528openblas\u4f5c\u4e3ablas\u5e93":0,"\u5219\u4f7f\u7528":117,"\u5219\u4f7f\u7528\u540c\u6b65\u8bad\u7ec3":103,"\u5219\u4f7f\u7528\u542f\u52a8\u53c2\u6570\u5b9a\u4e49\u7684\u521d\u59cb\u5316\u65b9\u6cd5\u521d\u59cb\u5316\u53c2\u6570":10,"\u5219\u4f7f\u7528\u8be5\u53c2\u6570\u4f5c\u4e3a\u9ed8\u8ba4\u503c":103,"\u5219\u53ef\u8bbe\u7f6e":[117,118],"\u5219\u5e76\u4e0d\u4f1a\u7b49\u5f85\u6240\u6709trainer\u63d0\u4ea4\u68af\u5ea6\u624d\u66f4\u65b0\u53c2\u6570":92,"\u5219\u5ffd\u7565":10,"\u5219\u603b\u4f1a\u663e\u793a\u963b\u9694\u6458\u8981\u4fe1\u606f":103,"\u5219\u628a\u53e6\u4e00\u4e2a\u6162\u901f\u7684kill\u6389":10,"\u5219\u662f\u5e26gui\u7684nvidia\u53ef\u89c6\u5316\u6027\u80fd\u5206\u6790\u5de5\u5177":108,"\u5219\u663e\u793a\u963b\u9694\u6027\u80fd\u7684\u6458\u8981\u4fe1\u606f":103,"\u5219\u76f4\u63a5\u5f15\u5165\u53e6\u4e00\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5219\u8bbe\u7f6e\u6210":[116,118],"\u5219\u9700\u8981\u4f7f\u7528\u7b49\u4e8e\u6743\u91cd\u53c2\u6570\u89c4\u6a21\u5927\u7ea65\u500d\u7684\u5185\u5b58":81,"\u5219\u9700\u8981\u5206\u522b\u7f16\u8bd1\u771f\u673a\u548c\u6a21\u62df\u5668\u7248\u672c":117,"\u5219\u9700\u8981\u56de\u6eda\u5230\u4e0a\u4e00\u4e2a\u68c0\u67e5\u70b9":10,"\u5219\u9700\u8981\u5728\u672c\u673a\u5b89\u88c5\u4e0b\u9762\u7ae0\u8282\u5217\u51fa\u7684":0,"\u5219\u9700\u8981\u624b\u52a8\u62f7\u8d1d\u5c5e\u4e8e\u6bcf\u4e2atrainer\u8282\u70b9\u7684\u8bad\u7ec3\u6570\u636e\u5230\u5bf9\u5e94\u7684\u8282\u70b9\u4e0a":91,"\u521b\u5efa":[42,89,90],"\u521b\u5efa\u4e00\u4e2a":86,"\u521b\u5efa\u4e00\u4e2akubernet":97,"\u521b\u5efa\u5e76\u5207\u6362\u5230\u65b0\u5206\u652f":72,"\u521b\u5efa\u6210\u529f\u540e":97,"\u521b\u5efa\u65e5\u5fd7\u76ee\u5f55":98,"\u521b\u5efa\u7a00\u758f\u77e9\u9635\u65f6\u9700\u8981\u663e\u793a\u5730\u6307\u5b9a\u77e9\u9635\u7684":89,"\u521b\u5efaissu":2,"\u521d\u59cb\u5316\u504f\u7f6e\u5411\u91cf":74,"\u521d\u59cb\u5316\
u6743\u91cd\u8868":74,"\u521d\u59cb\u5316\u6a21\u578b\u7684\u8def\u5f84":103,"\u521d\u59cb\u5316\u7236\u7c7b":74,"\u521d\u59cb\u5316biases_":74,"\u521d\u59cb\u72b6\u6001":113,"\u5220\u9664":72,"\u5220\u9664\u78c1\u76d8\u76ee\u5f55\u4e2d\u4e0d\u662f\u5f53\u524duuid\u7684\u5feb\u7167\u6587\u4ef6":10,"\u5224\u65ad\u662f\u5426\u5b89\u88c5\u6210\u529f":117,"\u5229\u7528\u5206\u5e03\u5f0f\u8bad\u7ec3\u9a7e\u9a6d\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90":81,"\u5229\u7528\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90\u53ef\u4ee5\u5206\u4e3a\u4ee5\u4e0b\u51e0\u4e2a\u65b9\u5f0f\u6765\u8fdb\u884c":81,"\u5229\u7528\u8fd9\u79cd\u7279\u6027":113,"\u522b\u4eba\u5e2e\u4e86\u5fd9":72,"\u522b\u5fd8\u4e86":0,"\u5230":[10,78,114],"\u5230\u6307\u5b9a\u6587\u4ef6\u4e2d":90,"\u5230\u672c\u5730":72,"\u5230\u6b64":75,"\u5230\u7b2c\u4e8c\u6b65":63,"\u5236\u4f5c\u65b0\u955c\u50cf\u6765\u5b8c\u6210\u4ee5\u4e0a\u7684\u5de5\u4f5c":97,"\u5236\u4f5cpaddlepaddle\u955c\u50cf":97,"\u524d\u4e00\u7bc7\u6587\u7ae0\u4ecb\u7ecd\u4e86\u5982\u4f55\u5728kubernetes\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2a\u5355\u673apaddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":97,"\u524d\u540e\u7684\u7f51\u7edc\u6027\u80fd":41,"\u524d\u5411\u4f20\u64ad":74,"\u524d\u5411\u4f20\u64ad\u7ed9\u5b9a\u8f93\u5165":74,"\u524d\u5411\u548c\u540e\u5411":74,"\u524d\u5411\u8ba1\u7b97\u4e4b\u540epaddlepaddle\u5185\u90e8\u5df2\u7ecf\u5206\u914d":90,"\u524d\u5411op\u5b9e\u73b0\u5b8c\u6210":75,"\u524d\u8005\u5728":81,"\u524d\u8005\u5b58\u50a8op\u7684\u8f93\u5165\u8f93\u51fa\u548c\u53c2\u6570\u5c5e\u6027":75,"\u524d\u8005\u622a\u65ad\u53ef\u5b66\u4e60\u53c2\u6570\u7684\u68af\u5ea6":81,"\u524d\u8005op\u7684\u5b9a\u4e49\u7ee7\u627f\u81ea":75,"\u524d\u81ea\u52a8\u68c0\u67e5\u4e00\u4e9b\u57fa\u672c\u4e8b\u5b9c":72,"\u524d\u9700\u8981\u5b89\u88c5":107,"\u524d\u9988":92,"\u529f\u80fd":27,"\u529f\u80fd\u7684\u6b63\u786e\u6027\u5305\u62ec\u9a8c\u8bc1paddlepaddle\u76ee\u524d\u7684":63,"\u52a0\u4e0a\u504f\u7f6e\u5411\u91cf":74,"\u52a0\u5165":108,"\u52a0\u6743\u54
8c\u7528\u6765\u751f\u6210":114,"\u52a0\u6743\u7f16\u7801\u5411\u91cf":114,"\u52a0\u8f7d\u5177\u4f53\u7f51\u7edc\u53c2\u6570":83,"\u52a0\u8f7d\u6a21\u578b\u53ef\u5176\u5b83\u591a\u79cd\u65b9\u5f0f":90,"\u52a0\u8f7d\u6a21\u578b\u9700\u540c\u65f6\u6307\u5b9a":90,"\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":90,"\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570":83,"\u52a0\u8f7dtest":103,"\u52a0\u901f\u7f16\u8bd1":0,"\u52a0\u901fpaddlepaddle\u8bad\u7ec3\u53ef\u4ee5\u8003\u8651\u4ece\u4ee5\u4e0b\u51e0\u4e2a\u65b9\u9762":81,"\u52a8\u6001\u5e93":[45,87],"\u52a9\u624b":74,"\u5305":107,"\u5305\u542b\u4e86\u67d0\u79cd\u7c7b\u578b\u7684\u7c7b\u578b\u5b9a\u4e49\u548c\u66b4\u9732\u7684\u5168\u90e8\u51fd\u6570":46,"\u5305\u542b\u4f46\u4e0d\u9650\u4e8e":0,"\u5305\u542b\u6d4b\u8bd5\u6570\u636e\u96c6\u7684\u76ee\u5f55":91,"\u5305\u542b\u8bad\u7ec3\u6570\u636e\u7684\u76ee\u5f55":91,"\u5305\u542b\u8fd9\u4e2a\u51fd\u6570\u8c03\u7528\u5176\u4ed6\u51fd\u6570\u7684\u65f6\u95f4":107,"\u5305\u542bkernel\u7684op\u548c\u4e0d\u5305\u542bkernel\u7684op":75,"\u5305\u62ec":[11,41,42,87,103],"\u5305\u62ec\u4e86\u7f16\u8bd1\u51fa\u7684paddlepaddle\u5934\u6587\u4ef6\u548c\u94fe\u63a5\u5e93":87,"\u5305\u62ec\u5b57\u7b26\u4e32\u5206\u914d":81,"\u5305\u62ec\u6743\u91cdw\u548c\u504f\u7f6eb":10,"\u5305\u62ec\u751f\u6210cpu":0,"\u5305\u62ec\u795e\u7ecf\u7f51\u7edc\u62d3\u6251\u7ed3\u6784":84,"\u5305\u62ecbool":105,"\u5305\u62eclinux":116,"\u5305\u62ecmkl":42,"\u5305\u7684\u65b9\u6cd5\u662f":78,"\u533a\u522b\u662f\u540c\u65f6\u5904\u7406\u4e86\u4e24\u4e2a\u8f93\u5165":111,"\u533a\u522b\u662frnn\u4f7f\u7528\u4e24\u5c42\u5e8f\u5217\u6a21\u578b":111,"\u534f\u540c\u5b8c\u6210releas":63,"\u5355\u4e2a\u503c":11,"\u5355\u4f4d\u662fmb":103,"\u5355\u5143\u6d4b\u8bd5":76,"\u5355\u5143\u6d4b\u8bd5\u4f1a\u5f15\u7528site":78,"\u5355\u5143\u6d4b\u8bd5\u4f1a\u88ab\u81ea\u52a8\u52a0\u5165\u5de5\u7a0b\u8fdb\u884c\u7f16\u8bd1":75,"\u5355\u5143\u6d4b\u8bd5checkgrad_ep":102,"\u5355\u53cc\u5c42\u5e8f\u5217\u7684\u53e5\u5b50\u662f\u4e00\u68
37\u7684":111,"\u5355\u53cc\u5c42rnn":112,"\u5355\u5c42":113,"\u5355\u5c42\u4e0d\u7b49\u957frnn":111,"\u5355\u5c42\u548c\u53cc\u5c42\u5e8f\u5217\u7684\u4f7f\u7528\u548c\u793a\u4f8b2\u4e2d\u7684\u793a\u4f8b\u7c7b\u4f3c":111,"\u5355\u5c42\u5e8f\u5217":[89,110],"\u5355\u5c42\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20":110,"\u5355\u5c42\u5e8f\u5217\u7b2ci\u4e2a\u5143\u7d20":110,"\u5355\u5c42\u6216\u53cc\u5c42":110,"\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":111,"\u5355\u5c42rnn":[111,113],"\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u7684\u7f51\u7edc\u914d\u7f6e":111,"\u5355\u673acpu\u8bad\u7ec3":81,"\u5355\u673agpu\u8bad\u7ec3":81,"\u5355\u6b65\u51fd\u6570":114,"\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570\u5728":114,"\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570\u90fd\u975e\u5e38\u7b80\u5355":114,"\u5355\u6b65\u51fd\u6570\u7684\u5b9e\u73b0\u5982\u4e0b\u6240\u793a":114,"\u5355\u6d4b\u5305\u62ec\u5bf9\u6bd4\u524d\u5411op\u4e0d\u540c\u8bbe\u5907":75,"\u5355\u70b9\u6545\u969c":10,"\u5355\u7eaf\u7684":107,"\u5355\u8fdb\u5355\u51fa":113,"\u5360\u7528\u4e8617":107,"\u5373":[46,75,77,81,82,97],"\u5373\u4e0a\u8ff0\u4ee3\u7801\u4e2d\u7684\u7b2c19\u884c":111,"\u5373\u4e0b\u8f7d\u5931\u8d25":78,"\u5373\u4e0d\u5141\u8bb8\u5728":75,"\u5373\u4e0d\u9700\u8981\u4f7f\u7528memori":111,"\u5373\u4e3a\u4e00\u4e2a\u65f6\u95f4\u6b65":111,"\u5373\u4e3a\u5355\u5c42rnn\u5e8f\u5217\u7684\u4f7f\u7528\u4ee3\u7801":111,"\u5373\u4e3a\u65f6\u95f4\u5e8f\u5217\u7684\u8f93\u5165":111,"\u5373\u4e3a\u8fd9\u4e2a\u53cc\u5c42rnn\u7684\u7f51\u7edc\u7ed3\u6784":111,"\u5373\u4e8c\u7ef4\u6570\u7ec4":111,"\u5373\u4f7f\u7528":[46,82],"\u5373\u4f7f\u7528\u6237\u76f4\u63a5\u5f15\u7528\u67d0\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5373\u4f7f\u95f4\u9694\u5f88\u5c0f":103,"\u5373\u4f7fc":46,"\u5373\u4f8b\u5982":46,"\u5373\u4fbf\u662f":0,"\u5373\u4fbf\u8bbe\u7f6e":78,"\u5373\u4fbfpaddl":46,"\u5373\u521d\u59cb\u72b6\u6001\u4e3a0":113,"\u5373\u5355\u65f6\u95f4\u6b65\u6267\u884c\u7684\u51fd\u6570":114,"\u537
3\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":111,"\u5373\u53cc\u5c42rnn\u7684\u6bcf\u4e2a\u72b6\u6001":113,"\u5373\u53ef":72,"\u5373\u53ef\u4ee5\u6781\u5927\u7684\u52a0\u901f\u6570\u636e\u8f7d\u5165\u6d41\u7a0b":81,"\u5373\u53ef\u4f7f\u7528\u5f00\u53d1\u955c\u50cf\u6765\u7f16\u8bd1android\u7248paddlepaddl":116,"\u5373\u53ef\u5728":118,"\u5373\u53ef\u5f00\u59cb\u4e0b\u8f7d":3,"\u5373\u53ef\u5f00\u59cb\u4e0b\u9762\u7684\u6b65\u9aa4":1,"\u5373\u53ef\u663e\u793a\u6027\u80fd\u5206\u6790\u7684\u7ed3\u679c":107,"\u5373\u53ef\u68c0\u67e5\u6211\u4eec\u8c03\u4f18\u540e\u7684\u4fee\u6b63\u662f\u5426\u80fd\u591f\u6539\u5584\u7a0b\u5e8f\u7684\u6027\u80fd":107,"\u5373\u5728\u53cc\u5c42\u5e8f\u5217\u7684\u539f\u59cb\u6570\u636e\u4e2d":111,"\u5373\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d":81,"\u5373\u5927\u90e8\u5206\u503c\u4e3a0":84,"\u5373\u5b8c\u6210\u67d0\u4e00\u4e2a\u4efb\u52a1\u7684\u6700\u5c11\u51fd\u6570":46,"\u5373\u5c06\u4e00\u6bb5\u8bdd\u8fdb\u884c\u5206\u7c7b":111,"\u5373\u5c06nchw\u8f6c\u6362\u6210nhwc":82,"\u5373\u5f53\u524d\u65f6\u95f4\u6b65\u4e0b\u7684\u795e\u7ecf\u7f51\u7edc\u4f9d\u8d56\u524d\u4e00\u4e2a\u65f6\u95f4\u6b65\u795e\u7ecf\u7f51\u7edc\u4e2d\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u8f93\u51fa":111,"\u5373\u6211\u4eec\u53ef\u4ee5\u5148\u5b9a\u4e49\u4e00\u4e2atensor":76,"\u5373\u628a\u5355\u5c42rnn\u751f\u6210\u540e\u7684subseq\u7ed9\u62fc\u63a5\u6210\u4e00\u4e2a\u65b0\u7684\u53cc\u5c42seq":113,"\u5373\u6574\u4e2a\u53cc\u5c42group\u662f\u5c06\u524d\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":111,"\u5373\u6574\u4e2a\u8f93\u5165\u5e8f\u5217":110,"\u5373\u6574\u6570\u6570\u7ec4":111,"\u5373\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":111,"\u5373\u662f\u8de8\u8d8a\u65f6\u95f4\u6b65\u7684\u7f51\u7edc\u8fde\u63a5":111,"\u5373\u66b4\u9732":46,"\u5373\u7279\u5f81\u7684\u6570\u7ec4":111,"\u5373\u7f51\u5361\u540d":97,"\u5373\u8868\u793a\u4e0d\u9700\u8981\u8f6c\u6362":42,"\u5373\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u51fa\u73b0nan\u6216\u8005inf":81,"\u5373\u8bbe
\u7f6e":81,"\u5373\u8fd0\u884c\u8bad\u7ec3\u7a0b\u5e8f":1,"\u5373\u8fd9\u4e2a\u52a8\u6001\u5e93\u662f\u4e0d\u4f9d\u8d56\u4e8e\u5176\u4ed6\u4efb\u4f55\u6587\u4ef6\u7684":45,"\u5373define_py_data_sources2\u5e94\u6539\u4e3a":83,"\u5373input":113,"\u5373rnn\u4e4b\u95f4\u6709\u4e00\u6b21\u5d4c\u5957\u5173\u7cfb":111,"\u5378\u8f7dpaddlepaddle\u5305":78,"\u538b\u6241\u6210\u4e3aeigen\u7684\u4e00\u7ef4tensor":76,"\u538b\u7f29\u6210\u4e00\u4e2a\u5411\u91cf":111,"\u539f\u56e0":[72,78],"\u539f\u56e0\u5728\u4e8e\u6ca1\u6709\u628a\u673a\u5668\u4e0acuda\u76f8\u5173\u7684\u9a71\u52a8\u548c\u5e93\u6620\u5c04\u5230\u5bb9\u5668\u5185\u90e8":78,"\u539f\u56e0\u662f\u6bcf\u4e2a\u56de\u590d\u90fd\u4f1a\u53d1\u9001\u4e00\u5c01\u90ae\u4ef6":72,"\u539f\u6765\u7684\u65b9\u6848":42,"\u53c2\u6570":[0,45,74,81,90,91,97,102],"\u53c2\u6570\u4e3a":75,"\u53c2\u6570\u5171\u4eab\u7684\u914d\u7f6e\u793a\u4f8b\u4e3a":83,"\u53c2\u6570\u548c\u73af\u5883\u53d8\u91cf":91,"\u53c2\u6570\u670d\u52a1\u5668":[92,102],"\u53c2\u6570\u670d\u52a1\u5668\u4e4b\u95f4\u4e0d\u76f8\u4e92\u4f9d\u8d56":92,"\u53c2\u6570\u670d\u52a1\u5668\u4e5f\u4e0d\u4f1a\u7b49\u5f85\u8ba1\u7b97\u8282\u70b9\u5168\u90e8\u90fd\u63d0\u4ea4\u68af\u5ea6\u4e4b\u540e\u624d\u5f00\u59cb\u4e0b\u4e00\u6b65":92,"\u53c2\u6570\u670d\u52a1\u5668\u63a5\u6536\u4ece\u8ba1\u7b97\u8282\u70b9\u4e0a\u4f20\u7684\u68af\u5ea6":92,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u53c2\u6570\u5206\u5757\u5927\u5c0f":103,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u76d1\u542c\u7aef\u53e3":103,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u7f51\u7edc\u8bbe\u5907\u540d\u79f0":103,"\u53c2\u6570\u670d\u52a1\u5668\u7684ip\u5730\u5740":103,"\u53c2\u6570\u670d\u52a1\u5668\u7a00\u758f\u66f4\u65b0\u7684\u53c2\u6570\u5206\u5757\u5927\u5c0f":103,"\u53c2\u6570\u6765\u63a7\u5236\u7f13\u5b58\u65b9\u6cd5":81,"\u53c2\u6570\u6982\u8ff0":104,"\u53c2\u6570\u7684\u4e2a\u6570\u548c\u53c2\u6570\u5217\u8868":90,"\u53c2\u6570\u7684\u89e3\u6790":97,"\u53c2\u6570\u8bbe\u7f6e":80,"\u53c2\u6570\u8bbe\u7f6e\u4e86\u5916\u
5c42":111,"\u53c2\u6570\u8bf4\u660e":91,"\u53c2\u6570\u8bf4\u660e\u5bb9\u5668\u5df2\u4ea4\u4e92\u5f0f\u8fd0\u884c":1,"\u53c2\u6570\u8f93\u5165":81,"\u53c2\u6570\u9700\u8981\u5b9e\u73b0":114,"\u53c2\u7167\u4e0a\u8ff0\u6b65\u9aa4\u66f4\u65b0":72,"\u53c2\u8003":[1,27,45],"\u53c2\u8003\u4e0b\u56fe":63,"\u53c2\u8003\u4e0b\u8ff0\u53ef\u9009\u6b65\u9aa4":0,"\u53c2\u8003\u5f3a\u8c03\u90e8\u5206":108,"\u53c2\u8003\u65f6\u95f4\u5e8f\u5217":111,"\u53c2\u8003\u6837\u4f8b\u6570\u636e\u51c6\u5907\u811a\u672c":91,"\u53c2\u8003\u955c\u50cf\u7684":97,"\u53c8\u53ef\u4ee5\u907f\u514d\u4e0d\u5fc5\u8981\u7684\u8f6c\u6362":42,"\u53c8\u662f\u4e00\u4e2a\u5355\u5c42\u7684\u5e8f\u5217":110,"\u53c8\u8981\u4fdd\u8bc1\u6570\u636e\u662f\u968f\u673a\u7684":81,"\u53ca":74,"\u53cc\u5411\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u9690\u85cf\u72b6\u6001":114,"\u53cc\u5411\u9a8c\u8bc1":27,"\u53cc\u5c42":113,"\u53cc\u5c42\u4e0d\u7b49\u957frnn":111,"\u53cc\u5c42\u5e8f\u5217":[89,110],"\u53cc\u5c42\u5e8f\u5217\u5728\u5904\u7406\u957f\u5e8f\u5217\u7684\u4efb\u52a1\u6216\u662f\u6784\u5efa\u5c42\u7ea7\u6a21\u578b\u65f6\u4f1a\u53d1\u6325\u4f5c\u7528":89,"\u53cc\u5c42\u5e8f\u5217\u6216\u5355\u5c42\u5e8f\u5217":110,"\u53cc\u5c42\u5e8f\u5217\u6570\u636e\u4e00\u5171\u67094\u4e2a\u6837\u672c":111,"\u53cc\u5c42\u5e8f\u5217\u662f\u4e00\u4e2a\u5d4c\u5957\u7684\u5e8f\u5217":110,"\u53cc\u5c42\u5e8f\u5217\u662fpaddlepaddle\u652f\u6301\u7684\u4e00\u79cd\u975e\u5e38\u7075\u6d3b\u7684\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f":113,"\u53cc\u5c42\u5e8f\u5217\u6bcf\u4e2asubseq\u4e2d\u6bcf\u4e2a\u5143\u7d20":110,"\u53cc\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u53cc\u5c42\u5e8f\u5217\u9700\u8981\u8bbe\u7f6e\u5206\u522b\u4e3a\u5916\u5c42\u5e8f\u5217\u548c\u5185\u5c42\u5e8f\u5217\u5206\u522b\u8bbe\u7f6e":89,"\u53cc\u5c42\u6216\u8005\u5355\u5c42":110,"\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u7684dataprovider\u7684\u4ee3\u7801":111,"\u53cc\u5c42rnn":113,"\u53cc\u5c42rnn\u6570\
u636e\u968f\u610f\u52a0\u4e86\u4e00\u4e9b\u9694\u65ad":111,"\u53cc\u5c42rnn\u987e\u540d\u601d\u4e49":111,"\u53cc\u8fdb\u5355\u51fa":113,"\u53cc\u8fdb\u53cc\u51fa":113,"\u53cd\u5411\u4f20\u64ad":74,"\u53cd\u5411\u4f20\u64ad\u6839\u636e\u8f93\u51fa\u7684\u68af\u5ea6":74,"\u53cd\u5411\u8ba1\u7b97\u5df2\u7ecf\u81ea\u52a8\u96c6\u6210\u8fdb\u6d4b\u8bd5\u6846\u67b6":75,"\u53cd\u5411op\u7684\u68af\u5ea6\u6d4b\u8bd5":75,"\u53cd\u5411op\u7c7b":75,"\u53cd\u5411op\u7c7b\u7684\u5b9a\u4e49":75,"\u53cd\u5411opkernel\u7684\u5b9a\u4e49\u4e0e\u524d\u5411op\u7c7b\u4f3c":75,"\u53d1\u578b\u7248":63,"\u53d1\u5e03\u5230dockerhub":63,"\u53d1\u5e03docker\u955c\u50cf\u53ea\u9700\u8981\u5bf9\u81ea\u52a8push\u7684\u955c\u50cf\u6253\u4e0a":63,"\u53d1\u6563\u5230\u4e86\u4e00\u4e2a\u6570\u503c\u7279\u522b\u5927\u7684\u5730\u65b9":81,"\u53d1\u884c\u548c\u7ef4\u62a4":72,"\u53d1\u9001\u53c2\u6570\u7684\u7aef\u53e3\u53f7":103,"\u53d6\u503c\u76f8\u540c\u7684layer":82,"\u53d6\u5176\u4e2d\u4e00\u4e2a\u6a21\u578bparams_pass_90":84,"\u53d6\u51b3\u4e8e":75,"\u53d8\u6362\u77e9\u9635":74,"\u53d8\u91cf\u6765\u533a\u5206layer\u7684\u5c5e\u6027":42,"\u53e5\u5b50\u662f\u7531\u8bcd\u8bed\u6784\u6210\u7684\u5e8f\u5217":89,"\u53e6\u4e00\u4e2a\u65b9\u6cd5\u662f\u4ea4\u53c9\u7f16\u8bd1":118,"\u53e6\u4e00\u4e2a\u662f\u5185\u5b58\u64cd\u4f5c\u91cf":108,"\u53e6\u4e00\u4e2a\u662f\u6bcf\u6761\u5e8f\u5217":81,"\u53e6\u4e00\u79cd\u65b9\u5f0f\u662f\u5c06\u7f51\u7edc\u5c42\u5212\u5206\u5230\u4e0d\u540c\u7684gpu\u4e0a\u53bb\u8ba1\u7b97":105,"\u53e6\u5916":[0,111],"\u53e6\u5916\u6700\u65b0\u7684pip\u5b98\u65b9\u6e90\u4e2d\u7684\u5b89\u88c5\u5305\u9ed8\u8ba4\u662fmanylinux1\u6807\u51c6":3,"\u53ea\u4f5c\u4e3aread":113,"\u53ea\u4fdd\u5b58\u6700\u540e\u4e00\u8f6e\u7684\u53c2\u6570":103,"\u53ea\u5728\u7b2c\u4e00\u6b21cmake\u7684\u65f6\u5019\u6709\u6548":0,"\u53ea\u5bf9\u7279\u6b8a\u5728\u7ebf\u7cfb\u7edf\u8003\u8651\u4e24\u53f0\u4ee5\u4e0a\u540c\u65f6\u6545\u969c\u7684\u5bb9\u707e":10,"\u53ea\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u678
4\u8fdb\u884c\u5e8f\u5217\u5316":90,"\u53ea\u5c06\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u8fdb\u884c\u5e8f\u5217\u5316":90,"\u53ea\u662f\u53cc\u5c42\u5e8f\u5217\u5c06\u5176\u53c8\u505a\u4e86\u5b50\u5e8f\u5217\u5212\u5206":111,"\u53ea\u66b4\u9732\u6982\u5ff5\u7684\u63a5\u53e3":46,"\u53ea\u6709\u5f53\u8bbe\u7f6e\u4e86spars":103,"\u53ea\u7528\u4e8e\u5728\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u4e2d\u6307\u5b9a\u8f93\u5165\u6570\u636e":113,"\u53ea\u7559\u4e0b\u6838\u5fc3\u8ba1\u7b97\u5c42":90,"\u53ea\u80fd\u5728recurrent_group\u4e2d\u4f5c\u4e3astep":82,"\u53ea\u80fd\u6309\u884c\u8ba1\u7b97":82,"\u53ea\u80fd\u6d4b\u8bd5\u5355\u4e2a\u6a21\u578b":105,"\u53ea\u80fd\u8bbf\u95ee\u5b83\u4eec\u7684\u8f93\u51fa\u503c":82,"\u53ea\u80fd\u8c03\u7528paddle\u7684\u52a8\u6001\u5e93":45,"\u53ea\u8981\u4e00\u7cfb\u5217\u7279\u5f81\u6570\u636e\u4e2d\u7684":111,"\u53ea\u8981\u51fa\u73b0\u6d6e\u70b9\u6570\u5f02\u5e38":81,"\u53ea\u8bfbmemory\u8f93\u5165":113,"\u53ea\u9488\u5bf9\u5185\u5b58":81,"\u53ea\u9700\u4e2d\u65ad":93,"\u53ea\u9700\u5728\u7f16\u8bd1\u65f6\u9700\u914d\u5236\u4e0b\u9762\u8fd9\u4e9b\u7f16\u8bd1\u9009\u9879":87,"\u53ea\u9700\u7528\u60a8\u5b9a\u4e49\u7684\u76ee\u5f55\u4fee\u6539":93,"\u53ea\u9700\u8981":114,"\u53ea\u9700\u8981\u6062\u590d\u8fd9\u53f0\u8282\u70b9":10,"\u53ea\u9700\u8981\u8bbe\u7f6e\u884c\u504f\u79fb":89,"\u53ea\u9700\u8981\u94fe\u63a5":87,"\u53ea\u9700\u8fdb\u884c\u524d\u5411\u8ba1\u7b97\u800c\u65e0\u9700\u8c03\u7528\u53cd\u5411\u8ba1\u7b97":90,"\u53ef\u4ee5":[1,63,72,77],"\u53ef\u4ee5\u4ece":1,"\u53ef\u4ee5\u4ece\u6211\u4eec\u7684ci\u7cfb\u7edf\u4e2d\u4e0b\u8f7d\u6700\u65b0\u7684whl\u5b89\u88c5\u5305\u548cc":3,"\u53ef\u4ee5\u4f30\u8ba1\u51fa\u5982\u679c\u6a21\u578b\u91c7\u7528\u4e0d\u53d8\u7684\u8f93\u51fa\u6700\u5c0f\u7684cost0\u662f\u591a\u5c11":83,"\u53ef\u4ee5\u4f7f\u7528":[83,90],"\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u66f4\u65b0\u60a8\u7684pip":3,"\u53ef\u4ee5\u4f7f\u7528\u5982\u4e0b\u4ee3\u7801":83,"\u53ef\u4ee5\u4f7f\u7528\u76f8\u5e94\u65
70\u636e\u7c7b\u578b\u7684":83,"\u53ef\u4ee5\u4f7f\u7528\u8be5\u53c2\u6570":103,"\u53ef\u4ee5\u4f7f\u7528kubernetes\u7684\u547d\u4ee4\u884c\u5de5\u5177\u521b\u5efajob":97,"\u53ef\u4ee5\u5148\u4f7f\u7528":82,"\u53ef\u4ee5\u51cf\u5c0f\u7cfb\u7edf\u590d\u6742\u6027":10,"\u53ef\u4ee5\u51cf\u5c11\u7f13\u5b58\u6c60\u7684\u5927\u5c0f":81,"\u53ef\u4ee5\u521b\u5efa\u4e00\u4e2a":96,"\u53ef\u4ee5\u521b\u5efa\u975e":75,"\u53ef\u4ee5\u52a0\u901fpaddlepaddle\u7684\u8ba1\u7b97":1,"\u53ef\u4ee5\u53c2\u8003":[0,1,72,111,114],"\u53ef\u4ee5\u53c2\u8003\u4e0b\u9762\u7684\u6b65\u9aa4\u6392\u67e5":79,"\u53ef\u4ee5\u53c2\u8003paddlepaddl":84,"\u53ef\u4ee5\u540c\u65f6\u5728cpu":76,"\u53ef\u4ee5\u542b\u6709\u4e00\u6761\u6216\u591a\u6761\u6837\u672c":89,"\u53ef\u4ee5\u544a\u8bc9\u60a8\u67d0\u4e2a\u64cd\u4f5c\u5230\u5e95\u82b1\u4e86\u591a\u957f\u65f6\u95f4":108,"\u53ef\u4ee5\u5728":[0,93],"\u53ef\u4ee5\u5728\u4efb\u4f55\u673a\u5668\u4e0a\u6267\u884c\u7684":45,"\u53ef\u4ee5\u5728\u5171\u4eab\u5b58\u50a8\u4e0a\u67e5\u770b\u8f93\u51fa\u7684\u65e5\u5fd7\u548c\u6a21\u578b":97,"\u53ef\u4ee5\u5728\u6b64\u9875\u9762\u7684":63,"\u53ef\u4ee5\u5728\u8fd9\u4e2a":72,"\u53ef\u4ee5\u5728event_handler\u4e2d":81,"\u53ef\u4ee5\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684sgd\u65b9\u6cd5\u7684\u8bad\u7ec3":92,"\u53ef\u4ee5\u5b9e\u73b0\u4ecepaddl":76,"\u53ef\u4ee5\u5c06cmake":107,"\u53ef\u4ee5\u5c06memory\u7406\u89e3\u4e3a\u4e00\u4e2a\u65f6\u5ef6\u64cd\u4f5c":113,"\u53ef\u4ee5\u5c06op\u5206\u4e3a\u4e24\u79cd":75,"\u53ef\u4ee5\u5c1d\u8bd5\u4ee5\u4e0b\u7684\u65b9\u6cd5":1,"\u53ef\u4ee5\u5e2e\u60a8\u63d0\u4f9b\u4e00\u4e9b\u5b9a\u4f4d\u6027\u80fd\u74f6\u9888\u7684\u5efa\u8bae":108,"\u53ef\u4ee5\u5e76\u884c\u7f16\u8bd1\u5417":0,"\u53ef\u4ee5\u5feb\u901f\u5728\u672c\u5730\u542f\u52a8\u4e00\u4e2a\u5305\u542b\u4e86paddlepaddle\u5b98\u65b9book\u6559\u7a0b\u7684jupyt":1,"\u53ef\u4ee5\u6267\u884c":[3,78],"\u53ef\u4ee5\u6267\u884c\u4ee5\u4e0b\u547d\u4ee4\u7f16\u8bd1\u751f\u6210\u6587\u6863":77,"\u53ef\u4ee5\u628a\u5b83\u60f3\u8
c61\u4e3a\u4e00\u4e2a\u7c7b\u4f3c":0,"\u53ef\u4ee5\u628a\u672c\u5730\u7684\u6570\u636e\u4e0a\u4f20\u5230\u5b58\u50a8\u96c6\u7fa4\u4e2d":11,"\u53ef\u4ee5\u6307\u5b9a\u540c\u65f6\u6267\u884cgpu\u4e0a\u7684\u5355\u5143\u6d4b\u8bd5":0,"\u53ef\u4ee5\u6307\u5b9a\u54ea\u4e00\u4e2a\u8f93\u5165\u548c\u8f93\u51fa\u5e8f\u5217\u4fe1\u606f\u4e00\u81f4":111,"\u53ef\u4ee5\u6307\u5b9a\u5f00\u542f\u81ea\u52a8\u68c0\u6d4bsm\u67b6\u6784":0,"\u53ef\u4ee5\u6309\u7167\u4e0b\u9762\u7684\u65b9\u6cd5":0,"\u53ef\u4ee5\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":[110,113],"\u53ef\u4ee5\u662f\u4e00\u4e2a\u975e\u5e8f\u5217":113,"\u53ef\u4ee5\u662f\u4ece\u5206\u5e03\u5f0f\u5b58\u50a8\u6302\u8f7d\u8fc7\u6765\u7684":91,"\u53ef\u4ee5\u662f\u4ee5\u4e0b\u51e0\u79cd":74,"\u53ef\u4ee5\u662f\u6574\u578b":89,"\u53ef\u4ee5\u663e\u793a\u5730\u6307\u5b9a\u4e00\u4e2alayer\u7684\u8f93\u51fa\u7528\u4e8e\u521d\u59cb\u5316memori":113,"\u53ef\u4ee5\u66f4\u6709\u6b21\u5e8f\u7684\u5b8c\u6210\u6027\u80fd\u7684\u4f18\u5316":107,"\u53ef\u4ee5\u6709\u4ee5\u4e0b\u4e24\u79cd":113,"\u53ef\u4ee5\u6709\u6548\u51cf\u5c0f\u7f51\u7edc\u7684\u963b\u585e":103,"\u53ef\u4ee5\u6709\u6548\u7684\u907f\u514dparamet":10,"\u53ef\u4ee5\u67e5\u770b":97,"\u53ef\u4ee5\u67e5\u770b\u6b64pod\u8fd0\u884c\u7684\u5bbf\u4e3b\u673a":96,"\u53ef\u4ee5\u6d4b\u8bd5\u591a\u4e2a\u6a21\u578b":105,"\u53ef\u4ee5\u7528":[0,27],"\u53ef\u4ee5\u7528\u4ee5\u4e0b\u6307\u4ee4":11,"\u53ef\u4ee5\u7528\u5982\u4e0b\u547d\u4ee4":72,"\u53ef\u4ee5\u7528\u6765\u8ba1\u7b97cpu\u51fd\u6570\u6216cuda\u5185\u6838\u7684\u65f6\u95f4\u6d88\u8017":108,"\u53ef\u4ee5\u76f4\u63a5\u8fd0\u884c":90,"\u53ef\u4ee5\u7701\u7565\u6b65\u9aa43\u4e2d":0,"\u53ef\u4ee5\u770b\u4f5c\u662f\u4e00\u4e2a\u975e\u5e8f\u5217\u8f93\u5165":110,"\u53ef\u4ee5\u770b\u51fa":92,"\u53ef\u4ee5\u770b\u5230\u6700\u8017\u65f6\u7684\u51fd\u6570\u662fc":107,"\u53ef\u4ee5\u7cbe\u786e\u8bf4\u660e\u4e00\u4e2a\u957f\u8017\u65f6\u64cd\u4f5c\u7684\u5177\u4f53\u539f\u56e0":108,"\u53ef\u4ee5\u7ee7\u7eed\u5728\u81ea\u5df1\u7684\
u529f\u80fd\u5206\u652f\u63d0\u4ea4\u4ee3\u7801":63,"\u53ef\u4ee5\u8003\u8651\u4f7f\u7528\u4e00\u4e9b\u4f18\u5316\u7b97\u6cd5":81,"\u53ef\u4ee5\u8054\u7cfbop":79,"\u53ef\u4ee5\u8054\u7cfbop\u662f\u5426\u53ef\u4ee5\u66f4\u6362\u96c6\u7fa4\u6216\u5347\u7ea7\u5f53\u524d\u96c6\u7fa4":79,"\u53ef\u4ee5\u83b7\u53d6\u7f51\u7edc\u4e2d\u5b9a\u4e49\u7684\u4efb\u610f\u591a\u4e2a":90,"\u53ef\u4ee5\u88c5\u7684\u662f":0,"\u53ef\u4ee5\u8bbe\u7f6e":[107,117,118],"\u53ef\u4ee5\u8bbf\u95ee\u7531recurr":82,"\u53ef\u4ee5\u8c03\u7528resize\u63a5\u53e3\u8fdb\u884c\u6539\u53d8":76,"\u53ef\u4ee5\u8f7b\u677e\u5730\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u914d\u7f6e":84,"\u53ef\u4ee5\u8fd0\u884c":91,"\u53ef\u4ee5\u9009\u5728\u5728\u5f53\u524d\u673a\u5668\u5b89\u88c5\u4e5f\u53ef\u4ee5\u62f7\u8d1d\u5230\u76ee\u6807\u673a\u5668\u5b89\u88c5":0,"\u53ef\u4ee5\u9009\u62e9\u662f\u5426\u4f7f\u7528\u53c2\u6570":105,"\u53ef\u4ee5\u901a\u8fc7":72,"\u53ef\u4ee5\u901a\u8fc7\u4fee\u6539\u8fd9\u4e24\u4e2a\u51fd\u6570\u6765\u5b9e\u73b0\u590d\u6742\u7684\u7f51\u7edc\u914d\u7f6e":114,"\u53ef\u4ee5\u901a\u8fc7\u5728":81,"\u53ef\u4ee5\u901a\u8fc7\u7f51\u9875\u6d4f\u89c8":1,"\u53ef\u4ee5\u901a\u8fc7\u8fd9\u4e2a\u8f93\u51fa\u6765\u5b8c\u6210\u81ea\u5b9a\u4e49\u7684\u8bc4\u4f30\u6307\u6807\u8ba1\u7b97\u7b49\u529f\u80fd":81,"\u53ef\u4ee5\u901a\u8fc7\u9636\u6bb5\u6027\u7684\u4fdd\u5b58\u6bcf\u4e2aparamet":10,"\u53ef\u4ee5\u91c7\u53d6\u4e0b\u9762\u51e0\u70b9\u63aa\u65bd":107,"\u53ef\u4ee5\u91cd\u547d\u540d\u8fd9\u4e2awhl\u5305\u4e3a":[3,78],"\u53ef\u53c2\u8003":90,"\u53ef\u5728":87,"\u53ef\u5728\u547d\u4ee4\u884c\u6267\u884c":117,"\u53ef\u663e\u5f0f\u6307\u5b9a\u4e3a":117,"\u53ef\u7528\u4e8e\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u89e3\u6790\u8fd9\u4e9b\u53c2\u6570":105,"\u53ef\u76f4\u63a5\u8fd0\u884c":90,"\u53ef\u80fd\u4f1a\u5bfc\u81f4\u51fa\u9519":97,"\u53ef\u80fd\u4f1a\u9020\u6210\u7f51\u7edc\u62e5\u585e":10,"\u53ef\u80fd\u7684\u4ee3\u7801\u4e3a":81,"\u53ef\u80fd\u7684\u539f\u56e0\u662f":83,"\u53ef\u80fd\u7684\u60c5\u51b5\
u4e0b":108,"\u53ef\u80fd\u9700\u8981\u6ce8\u610f\u7ed9\u8fd9\u4e2a\u865a\u62df\u673a\u591a\u5206\u914d\u4e00\u4e9b":0,"\u53ef\u89c1\u8be5\u8ba1\u7b97\u7531\u4e24\u4e2a\u8f93\u5165":75,"\u53ef\u8bbe\u7f6e":[116,117],"\u53ef\u8bbe\u7f6e\u4e3a":117,"\u53ef\u8bbe\u7f6e\u7684\u76ee\u6807\u67b6\u6784\u5982\u4e0b\u8868\u6240\u793a":117,"\u53ef\u9009":[0,74,90,91],"\u53ef\u9009\u6b65\u9aa4":0,"\u53ef\u9009\u7684\u4e0d\u540c\u7f16\u8bd1\u73af\u5883docker\u955c\u50cf":0,"\u53ef\u9009\u914d\u7f6e\u9009\u9879":87,"\u53ef\u901a\u8fc7pip\u4e00\u952e\u5b89\u88c5":2,"\u53ef\u914d\u7f6e\u4e3a":87,"\u53ef\u91c7\u7528\u7b2c\u4e8c\u79cd\u65b9\u5f0f":82,"\u53f3\u4fa7\u7684":63,"\u5404\u6b21\u524d\u5411\u4e4b\u95f4\u4e5f\u90fd\u4f7f\u7528\u4e86\u76f8\u540c\u7684\u6743\u91cd":41,"\u5404\u9879\u66f4\u52a0\u5177\u4f53\u7684\u5355\u5143\u6d4b\u8bd5\u5728":75,"\u5408\u5e76\u5165\u4e00\u4e2a\u6587\u4ef6":90,"\u5408\u5e76\u6a21\u578b\u6587\u4ef6":90,"\u540c\u4e00\u6b21\u524d\u5411":41,"\u540c\u65f6":[41,42,78,81,108],"\u540c\u65f6\u4e5f\u4f1a\u8bfb\u53d6\u76f8\u5173\u8def\u5f84\u53d8\u91cf\u6765\u8fdb\u884c\u641c\u7d22":0,"\u540c\u65f6\u4e5f\u53ef\u4ee5\u52a0\u901f\u5f00\u59cb\u8bad\u7ec3\u524d\u6570\u636e\u8f7d\u5165\u7684\u8fc7\u7a0b":81,"\u540c\u65f6\u4e5f\u53ef\u4ee5\u901a\u8fc7":72,"\u540c\u65f6\u4e5f\u80fd\u591f\u5f15\u5165\u66f4\u52a0\u590d\u6742\u7684\u8bb0\u5fc6\u673a\u5236":113,"\u540c\u65f6\u4f1a\u5f00\u542fintel":42,"\u540c\u65f6\u5176\u5185\u90e8\u5b9e\u73b0\u53ef\u4ee5\u907f\u514d\u7eafcpu\u7248\u672cpaddlepaddle\u5728\u6267\u884c\u672c\u8bed\u53e5\u65f6\u53d1\u751f\u5d29\u6e83":108,"\u540c\u65f6\u518d\u5c06":63,"\u540c\u65f6\u53c8\u5c3d\u53ef\u80fd\u5c11\u7684\u727a\u7272mkl":42,"\u540c\u65f6\u5728\u5185\u5b58\u91cc\u76f4\u63a5\u968f\u5373\u9009\u53d6\u6570\u636e\u6765\u505ashuffl":81,"\u540c\u65f6\u5c06\u53c2\u6570\u521d\u59cb\u5316\u4e3a":83,"\u540c\u65f6\u63d0\u8d77":63,"\u540c\u65f6\u6570\u636e\u683c\u5f0f\u5c31\u662f":42,"\u540c\u65f6\u7528\u6237\u9700\u8981\u5728\u7f51\u7ed
c\u914d\u7f6e\u4e2d\u6307\u5b9a":105,"\u540c\u65f6\u8bbe\u7f6e\u5185\u5b58\u7f13\u5b58\u529f\u80fd":81,"\u540c\u65f6\u8f93\u51fa\u5e8f\u5217\u5c42\u548c\u975e\u5e8f\u5217\u5c42":81,"\u540c\u6837":84,"\u540c\u6837\u4e5f\u53ef\u4ee5\u5728\u6d4b\u8bd5\u6a21\u5f0f\u4e2d\u6307\u5b9a\u6a21\u578b\u8def\u5f84":103,"\u540c\u6837\u53ef\u4ee5\u6269\u5c55\u5230\u53cc\u5c42\u5e8f\u5217\u7684\u5904\u7406\u4e0a":113,"\u540c\u6837\u53ef\u83b7\u53d6\u5230\u8f93\u5165\u8f93\u51fa\u548c\u5c5e\u6027\u53c2\u6570":75,"\u540c\u6b65\u6267\u884c\u64cd\u4f5c\u7684\u7ebf\u7a0b\u6570":103,"\u540c\u7406":75,"\u540d\u5b57\u4fee\u9970":45,"\u540e":[0,72,83,97,116,117,118],"\u540e\u5411":41,"\u540e\u5411\u4f20\u64ad":74,"\u540e\u5411\u4f20\u64ad\u7ed9\u5b9a\u8f93\u51fa\u7684\u68af\u5ea6":74,"\u540e\u5411\u65f6\u590d\u7528\u5df2\u7ecf\u8f6c\u6362\u8fc7\u7684\u6743\u91cd":41,"\u540e\u7f00\u4e3a":91,"\u540e\u8005\u5728\u6fc0\u6d3b\u51fd\u6570\u53cd\u5411\u8ba1\u7b97\u65f6\u88ab\u8c03\u7528":81,"\u540e\u8005\u622a\u65ad\u56de\u4f20\u7ed9\u524d\u5c42\u7684\u68af\u5ea6":81,"\u540e\u8005\u7528\u4e8e\u68c0\u67e5\u53c2\u6570\u5c5e\u6027\u7684\u5408\u6cd5\u6027":75,"\u540e\u8005\u7ee7\u627f\u81ea":75,"\u540e\u9762\u7684gradient":91,"\u540e\u9988":92,"\u5411\u6307\u5b9a\u7684\u76ee\u5f55\u4e2d\u4e00\u4e2a\u65b0\u7684\u6587\u4ef6":10,"\u5411\u91cf":89,"\u5411\u91cfenable_parallel_vector":102,"\u5411paddlepaddle\u7684\u4e3b\u7248\u672c\u5e93\u63d0\u4ea4":63,"\u5417":0,"\u5426\u5219":[75,116,118],"\u5426\u5219\u4f1a\u628a":72,"\u5426\u5219\u4f7f\u7528\u591a\u673a\u8bad\u7ec3":103,"\u5426\u5219\u4f7f\u7528cpu\u6a21\u5f0f":103,"\u5426\u5219\u4f7f\u7528gpu":105,"\u5426\u5219\u5b83\u4ee5\u4e00\u4e2a\u5e8f\u5217\u8f93\u5165":114,"\u5426\u5219\u5f97\u628apaddle\u9759\u6001\u5e93\u94fe\u63a5\u5230\u89e3\u91ca\u5668\u91cc":45,"\u5426\u5219\u9891\u7e41\u7684\u591a\u8282\u70b9\u5de5\u4f5c\u7a7a\u95f4\u90e8\u7f72\u53ef\u80fd\u4f1a\u5f88\u9ebb\u70e6":93,"\u542b\u4e49":107,"\u542b\u6709\u5e8f\u5217\u4fe1\u606f\u548c\u5b50\u
5e8f\u5217\u4fe1\u606f\u7684\u7a20\u5bc6\u5411\u91cf":74,"\u542b\u6709\u5e8f\u5217\u4fe1\u606f\u7684\u6574\u6570":74,"\u542b\u6709\u5e8f\u5217\u4fe1\u606f\u7684\u7a20\u5bc6\u5411\u91cf":74,"\u542f\u52a8\u4e00\u4e2a\u65b0\u7684\u7ebf\u7a0b\u5f00\u59cb\u4fdd\u5b58\u68c0\u67e5\u70b9":10,"\u542f\u52a8\u4e00\u4e2a\u6d4b\u8bd5\u96c6\u7fa4":93,"\u542f\u52a8\u53c2\u6570\u8bf4\u660e":92,"\u542f\u52a8\u5bb9\u5668\u5f00\u59cb\u8bad\u7ec3":97,"\u542f\u52a8\u5e76\u884c\u5411\u91cf\u7684\u9608\u503c":103,"\u542f\u52a8\u5feb\u901f\u5e94\u7b54":103,"\u542f\u52a8\u8bad\u7ec3\u4efb\u52a1":98,"\u542f\u7528\u68af\u5ea6\u53c2\u6570\u7684\u9608\u503c":103,"\u547d\u4ee4\u4e3a":[78,96],"\u547d\u4ee4\u521b\u5efa\u65b0\u955c\u50cf":96,"\u547d\u4ee4\u5220\u9664":[116,117,118],"\u547d\u4ee4\u53ef\u4ee5\u8bbe\u7f6e":0,"\u547d\u4ee4\u65f6":116,"\u547d\u4ee4\u6709\u65f6\u5019\u4f1a\u4ea7\u751f\u4e00\u4e9b\u4e2d\u95f4\u7ed3\u679c":0,"\u547d\u4ee4\u770b\u5230\u505c\u6b62\u540e\u4f46\u662f\u6ca1\u6709\u5220\u9664\u7684":0,"\u547d\u4ee4\u7f16\u8bd1\u6e90\u7801\u5373\u53ef":0,"\u547d\u4ee4\u884c\u4e2d\u7684":107,"\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e":106,"\u547d\u4ee4\u8bbe\u7f6e\u8be5\u7c7b\u7f16\u8bd1\u9009\u9879":0,"\u547d\u4ee4\u9009\u9879\u5e76\u4e14":93,"\u547d\u4ee4\u91cc\u90fd\u7528\u4e86":0,"\u547d\u540d\u4e3a":72,"\u547d\u540d\u89c4\u8303":75,"\u547d\u540d\u8bf7\u9075\u5b88":75,"\u548c":[0,11,41,42,45,46,63,72,74,75,76,77,81,82,83,87,89,91,92,93,105,107,108,111,114,116,118],"\u548c\u4e00\u4e2a\u5df2\u7ecf\u5206\u8bcd\u540e\u7684\u53e5\u5b50":111,"\u548c\u4e09\u79cd\u5e8f\u5217\u6a21\u5f0f":84,"\u548c\u4e0b\u9762\u5c06\u8981\u4ecb\u7ecd\u7684\u6ce8\u518c\u51fd\u6570\u4e00\u8d77\u653e\u5728":75,"\u548c\u4e2d\u6587\u6587\u6863":77,"\u548c\u4e4b\u524d\u51cf\u5c0f\u901a\u8fc7\u51cf\u5c0f\u7f13\u5b58\u6c60\u6765\u51cf\u5c0f\u5185\u5b58\u5360\u7528\u7684\u539f\u7406\u4e00\u81f4":81,"\u548c\u504f\u7f6e\u5411\u91cf":74,"\u548c\u5185\u5b58":0,"\u548c\u5217\u53f7":89,"\u548c\u53cc\u5c42\u5e8f\u52
17\u542b\u6709subseq":110,"\u548c\u5bf9\u5e94\u884c\u7684\u4ee3\u7801":107,"\u548c\u5e8f\u5217\u4e2d\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee\u540c":110,"\u548c\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":92,"\u548c\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165":114,"\u548c\u64cd\u4f5c\u7cfb\u7edf\u4e0a\u76f4\u63a5\u8fd0\u884c\u7684":0,"\u548c\u672a\u6765\u53ef\u80fd\u8fd8\u4f1a\u7528\u5230":42,"\u548c\u793a\u4f8b2\u4e2d\u7684\u914d\u7f6e\u7c7b\u4f3c":111,"\u548c\u79bb\u7ebf\u6570\u636e\u7684\u65b9\u5f0f":11,"\u548c\u90e8\u5206layer":113,"\u548cpool":110,"\u548cpserver\u4e4b\u95f4\u7528\u4e8e\u7a00\u758f\u7c7b\u578b\u53c2\u6570\u901a\u4fe1\u7684\u7aef\u53e3\u4e2a\u6570":91,"\u54ea\u4e2atrainer\u5148\u5b8c\u6210block\u7684\u8bad\u7ec3":10,"\u54ea\u4e9b\u4e0d\u662f":111,"\u56db\u79cd\u6570\u636e\u7c7b\u578b":84,"\u56e0\u4e3a\u4e0a\u8ff0\u7279\u70b9":88,"\u56e0\u4e3a\u5168\u8fde\u63a5\u5c42\u7684\u6fc0\u6d3b\u53ef\u4ee5\u662fsoftmax":74,"\u56e0\u4e3a\u53c2\u6570":105,"\u56e0\u4e3a\u5b83\u4eec\u7684\u8ba1\u7b97\u6548\u7387\u6bd4":114,"\u56e0\u4e3a\u5b83\u6bd4":114,"\u56e0\u4e3a\u5b98\u65b9\u955c\u50cf":97,"\u56e0\u4e3a\u6211\u4eec\u4f1a\u628a\u6240\u6709\u7f16\u8bd1\u5de5\u5177\u90fd\u5b89\u88c5\u8fdb\u4e00\u4e2a":0,"\u56e0\u4e3a\u6e90\u7801\u5c31\u5728\u672c\u673a\u4e0a":0,"\u56e0\u4e3a\u8f93\u5165\u6570\u636e\u53ef\u80fd\u6709\u591a\u79cd\u7ed3\u6784":88,"\u56e0\u4e3a\u8fd9\u4e2a\u5de5\u5177\u5177\u6709web\u670d\u52a1\u754c\u9762":107,"\u56e0\u4e3a\u8fd9\u6837\u505a\u4e5f\u6ca1\u6cd5\u4fdd\u8bc1\u6d88\u9664\u968f\u673a\u6027":10,"\u56e0\u4e3ac":107,"\u56e0\u4e3apython\u7684\u641c\u7d22\u8def\u5f84\u662f\u4f18\u5148\u5df2\u7ecf\u5b89\u88c5\u7684python\u5305":78,"\u56e0\u4e3aswig\u5728\u7b2c\u4e09\u65b9\u8bed\u8a00\u4e2d\u66b4\u9732\u7684\u51fd\u6570\u540d":45,"\u56e0\u6b64":[41,74,111,113,116],"\u56e0\u6b64\u53cc\u5c42\u5e8f\u5217\u7684\u914d\u7f6e\u4e2d":111,"\u56e0\u6b64\u53ef\u80fd\u6d4b\u8bd5\u4e0d\u591f\u5b8c\u5907":76,"\u56e0\u6b64\u5728\u8f6c\u6362
\u65f6\u9700\u8981\u663e\u793a\u7684\u6307\u5b9a":76,"\u56e0\u6b64\u5b83\u662finteger_value_sub_sequ":111,"\u56e0\u6b64\u5f53":116,"\u56e0\u6b64\u6211\u4eec\u91c7\u7528\u8f93\u51fa\u7684\u52a0\u6743\u548c":74,"\u56e0\u6b64\u7528\u6237\u5e76\u4e0d\u9700\u8981\u5173\u5fc3\u5b83\u4eec":102,"\u56e0\u6b64\u9519\u8bef\u7684\u4f7f\u7528\u4e8c\u8fdb\u5236\u53d1\u884c\u7248\u53ef\u80fd\u4f1a\u5bfc\u81f4\u8fd9\u79cd\u9519\u8bef":78,"\u56fd\u5185\u7528\u6237\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u7684\u955c\u50cf\u6e90\u6765\u52a0\u901f\u8bbf\u95ee":1,"\u56fe1":[89,90],"\u56fe2":89,"\u56fe\u50cf\u5206\u7c7b":63,"\u56fe\u8868":1,"\u5728":[0,41,42,46,63,72,75,89,90,91,107,110,114,117],"\u5728\u4e00\u4e2a\u4e0d\u53ef\u4e2d\u65ad\u5e76\u7f3a\u5c11\u5907\u4efd\u7684\u8bad\u7ec3\u4efb\u52a1\u4e2d":10,"\u5728\u4e00\u4e2a\u529f\u80fd\u9f50\u5168\u7684kubernetes\u673a\u7fa4\u91cc":96,"\u5728\u4e00\u4e2a\u53c2\u6570\u7684\u68af\u5ea6\u88ab\u66f4\u65b0\u540e":74,"\u5728\u4e00\u4e9b\u5206\u5e03\u5f0f\u7cfb\u7edf\u4e2d":91,"\u5728\u4e00\u6b21\u8bad\u7ec3\u4e2d":107,"\u5728\u4e00\u8f6e\u4e2d\u6bcfsave":103,"\u5728\u4e0a\u56fe\u4e2d\u663e\u793a\u4e86\u5728\u4e00\u4e2a\u5b9e\u9645\u751f\u4ea7\u73af\u5883\u4e2d\u7684\u5e94\u7528":11,"\u5728\u4e0a\u9762\u4ee3\u7801\u4e2d":111,"\u5728\u4e0a\u9762\u7684\u4ee3\u7801\u4e2d":75,"\u5728\u4e0b\u4e00\u7bc7\u4e2d":96,"\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c":92,"\u5728\u4e4b\u540e\u7684":81,"\u5728\u4e86\u89e3docker\u7684\u57fa\u672c\u4f7f\u7528\u65b9\u6cd5\u4e4b\u540e":1,"\u5728\u4efb\u610f\u65f6\u95f4\u67d0\u4e00\u53f0\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u4fdd\u5b58\u7684\u53c2\u6570\u53ef\u80fd\u6bd4\u53e6\u4e00\u53f0\u8981\u66f4\u65b0":92,"\u5728\u4f7f\u7528\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u8ba1\u7b97\u5e73\u53f0\u65f6":91,"\u5728\u4f7f\u7528\u540c\u6b65sgd\u8bad\u7ec3\u795e\u7ecf\u7f51\u7edc\u65f6":92,"\u5728\u4f7f\u7528\u65f6":82,"\u5728\u4f7f\u7528\u8be5\u6587\u6863\u4e4b\u524d":84,"\u5728\u4f7f\u7528c":87,"\u5728\u4f7f\u7528paddlepadd
l":87,"\u5728\u4f7f\u7528twine\u4e0a\u4f20\u4e4b\u524d":63,"\u5728\u5168\u8fde\u63a5\u5c42\u4e2d":74,"\u5728\u5177\u4f53\u7684\u8ba1\u7b97\u4e2d":76,"\u5728\u51c6\u5907\u53d1\u8d77":72,"\u5728\u51fa\u73b0\u5355\u70b9\u6545\u969c\u65f6":10,"\u5728\u51fd\u6570":97,"\u5728\u5206\u5e03\u5f0f\u73af\u5883\u4e2d\u6d4b\u8bd5":103,"\u5728\u5206\u5e03\u5f0f\u8bad\u7ec3\u4e2d":103,"\u5728\u521b\u5efaparameters\u540e":83,"\u5728\u5355\u5c42\u6570\u636e\u7684\u57fa\u7840\u4e0a":111,"\u5728\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u52a0\u8f7d\u548c\u4fdd\u5b58\u53c2\u6570":103,"\u5728\u53c2\u6570\u670d\u52a1\u5668\u7ec8\u7aef\u6bcflog":103,"\u5728\u53cc\u5c42rnn\u4e2d\u7684\u7ecf\u5178\u60c5\u51b5\u662f\u5c06\u5185\u5c42\u7684\u6bcf\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217\u6570\u636e":111,"\u5728\u53cd\u5411\u4f20\u9012\u7684\u65f6\u5019":81,"\u5728\u53d8\u6362\u65f6\u9700\u8981\u5c06\u8f93\u5165\u5e8f\u5217\u4f20\u5165":111,"\u5728\u542f\u52a8job\u4e4b\u524d":97,"\u5728\u547d\u4ee4\u884c\u663e\u793a\u5206\u6790\u7ed3\u679c":107,"\u5728\u56de\u590d\u8bc4\u5ba1\u4eba\u610f\u89c1\u65f6":72,"\u5728\u56fe\u50cf\u4efb\u52a1\u4e2d":82,"\u5728\u591acpu\u8bad\u7ec3\u65f6\u5171\u4eab\u8be5\u53c2\u6570":103,"\u5728\u5b8c\u6210\u4e00\u5b9a\u91cf\u6570\u636e\u7684\u8bad\u7ec3\u540e":92,"\u5728\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u642d\u5efa\u4e4b\u540e":84,"\u5728\u5b9a\u4e49\u8f93\u5165layer\u4e4b\u540e":84,"\u5728\u5b9e\u73b0\u6bcf\u4e2a\u5b50\u7c7b\u7684\u65f6\u5019\u5c31\u4e0d\u9700\u8981\u5173\u5fc3\u5206\u652f\u7684\u4e8b\u60c5\u4e86":42,"\u5728\u5b9e\u73b0\u8fc7\u7a0b\u4e2d":46,"\u5728\u5b9e\u9645\u5e94\u7528\u4e2d":82,"\u5728\u5bb9\u5668\u4e2d\u7f16\u8f91\u4ee3\u7801":1,"\u5728\u5bb9\u5668\u521b\u5efa\u540e":97,"\u5728\u5bf9\u5bb9\u5668\u7684\u63cf\u8ff0":97,"\u5728\u5bf9\u5e94\u7684":41,"\u5728\u5c42\u4e2d\u6307\u5b9a":105,"\u5728\u5c42\u521d\u59cb\u5316\u7684\u65f6\u5019":41,"\u5728\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u4e2d":113,"\u5728\u5f00\u59cb\u8bad\u7ec3\u4e4b\u524d":11,"\u5728\
u5f02\u6784\u96c6\u7fa4\u4e2d":10,"\u5728\u5f02\u6b65sgd\u4e2d":92,"\u5728\u5f15\u5165\u5176\u4ed6\u7c7b\u578b\u7684\u5934\u6587\u4ef6\u65f6":46,"\u5728\u5f53\u524d":81,"\u5728\u5f53\u524d\u7684\u5b9e\u73b0\u65b9\u5f0f\u4e0b":74,"\u5728\u5f97\u5230":97,"\u5728\u5feb\u7167\u5199\u5165\u5b8c\u6210\u540e":10,"\u5728\u60a8\u7684\u5b9e\u9645\u73af\u5883\u4e2d":10,"\u5728\u6211\u4eec\u7684\u4f8b\u5b50\u4e2d":114,"\u5728\u6267\u884c\u65f6":76,"\u5728\u63d0\u4ea4":72,"\u5728\u642d\u5efa\u795e\u7ecf\u7f51\u7edc\u7684\u8fc7\u7a0b\u4e2d":84,"\u5728\u65e0\u7279\u6b8a\u9700\u6c42\u60c5\u51b5\u4e0b":87,"\u5728\u6709\u666e\u901a\u7684cpu":42,"\u5728\u672c\u4f8b\u4e2d":[72,105,111],"\u5728\u672c\u6559\u7a0b\u4e2d":114,"\u5728\u672c\u6587\u6863\u4e2d":27,"\u5728\u672c\u793a\u4f8b\u4e2d":111,"\u5728\u672c\u8282\u4e2d":114,"\u5728\u673a\u7fa4\u4e0a\u8fd0\u884c\u8f6c\u6362\u7a0b\u5e8f":11,"\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b":81,"\u5728\u6811\u7684\u6bcf\u4e00\u5c42\u4e0a":103,"\u5728\u6837\u4f8b\u4e2d":46,"\u5728\u6b64":[102,105],"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e2d":114,"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u5b50\u5e8f\u5217\u957f\u5ea6\u53ef\u4ee5\u4e0d\u76f8\u7b49":111,"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u957f":114,"\u5728\u6bcf\u4e2apod\u4e0a\u90fd\u901a\u8fc7volume\u65b9\u5f0f\u6302\u8f7d\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u7684\u4e00\u4e2a\u76ee\u5f55\u7528\u4e8e\u4fdd\u5b58\u8bad\u7ec3\u6570\u636e\u548c\u8f93\u51fa\u7ed3\u679c":97,"\u5728\u6d4b\u8bd5\u9636\u6bb5":103,"\u5728\u6e90\u7801\u76ee\u5f55\u6811\u7684\u6839\u76ee\u5f55\u4e2d\u8fd0\u884c":72,"\u5728\u751f\u6210\u65f6":114,"\u5728\u7528\u6237\u4f7f\u7528c":46,"\u5728\u76f8\u5e94\u7684\u4f18\u5316\u7b97\u6cd5\u91cc\u8bbe\u7f6elearning_rate_schedule\u53ca\u76f8\u5173\u53c2\u6570":83,"\u5728\u76f8\u5e94layer\u7684":82,"\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u7b49\u4e8e\u4e00\u6b21\u9884\u6d4b\u5904\u7406\u7684\u6837\u672c\u6570":89,"\u5728\u7a0b\u5e8f\u5b9e\u73b0\u4e2d\u90fd\u4f1a\u8f6c\u5316\u4e3a\u4e8
c\u7ef4\u77e9\u9635":89,"\u5728\u7b2c\u4e8c\u4e2atab":63,"\u5728\u7ebf\u4e0a\u7cfb\u7edf\u4e2d":91,"\u5728\u7ebf\u6a21\u578b\u9884\u6d4b\u670d\u52a1":11,"\u5728\u7ec4\u5408\u65f6":84,"\u5728\u7ec4\u7ec7\u795e\u7ecf\u7f51\u7edc\u8f93\u5165":90,"\u5728\u7ec4\u7ec7\u795e\u7ecf\u7f51\u7edc\u8f93\u5165\u65f6":89,"\u5728\u7ec8\u7aef\u6267\u884c":90,"\u5728\u7f16\u8bd1\u5bbf\u4e3b\u673a\u7248protoc\u53ef\u6267\u884c\u6587\u4ef6\u548c\u76ee\u6807\u673a\u7248openblas\u5e93\u65f6\u9700\u8981\u7528\u5230":[116,118],"\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d":74,"\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d":110,"\u5728\u8bad\u7ec3\u4e2d":114,"\u5728\u8bad\u7ec3\u4e4b\u524d":97,"\u5728\u8bad\u7ec3\u65f6":96,"\u5728\u8bad\u7ec3\u7ed3\u675f\u7684\u65f6\u5019\u518d\u4fdd\u5b58\u4e3apaddlepaddle\u7684\u683c\u5f0f":42,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d":97,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6bcfshow":103,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u53c2\u6570\u7684\u6743\u91cd\u548c\u68af\u5ea6":81,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u67d0\u4e00\u4e2alayer\u7684output":81,"\u5728\u8bbe\u7f6e":[116,117,118],"\u5728\u8bc4\u5ba1\u8fc7\u7a0b\u4e2d":63,"\u5728\u8be5\u793a\u4f8b\u4e2d":83,"\u5728\u8be5\u914d\u7f6e\u76847":111,"\u5728\u8c03\u7528":90,"\u5728\u8c03\u7528c":90,"\u5728\u8f6f\u4ef6\u5de5\u7a0b\u7684\u8303\u7574\u91cc":108,"\u5728\u8f93\u51fa\u7684\u8fc7\u7a0b\u4e2d":113,"\u5728\u8fd0\u884c\u5b8c\u6027\u80fd\u5206\u6790\u540e":107,"\u5728\u8fd0\u884c\u65f6\u5c06\u795e\u7ecf\u7f51\u7edc\u7684\u591a\u4e2a\u53ef\u5b66\u4e60\u53c2\u6570\u653e\u5728\u540c\u4e00\u4e2a\u76ee\u5f55\u4e2d":90,"\u5728\u8fd0\u884c\u795e\u7ecf\u7f51\u7edc\u8ba1\u7b97\u56fe\u65f6":76,"\u5728\u8fd9\u4e00\u90e8\u5206":104,"\u5728\u8fd9\u4e2a":63,"\u5728\u8fd9\u4e2a\u4f8b\u5b50\u91cc":[74,96],"\u5728\u8fd9\u4e2a\u51fd\u6570\u4e2d":111,"\u5728\u8fd9\u4e2a\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165\u4efb\u4f55\u5176\u4ed6\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u5728\u8fd9\
u4e2a\u6559\u7a0b\u4e2d":108,"\u5728\u8fd9\u4e2a\u6a21\u578b\u4e2d":114,"\u5728\u8fd9\u4e2a\u9636\u6bb5\u7684\u4ee3\u7801\u6b63\u5728\u7ecf\u5386\u56de\u5f52\u6d4b\u8bd5":63,"\u5728\u8fd9\u4e9b\u5934\u6587\u4ef6\u4e2d":46,"\u5728\u8fd9\u4e9b\u6587\u4ef6\u4e2d":46,"\u5728\u8fd9\u4e9blayer\u4e2d":111,"\u5728\u8fd9\u65f6\u771f\u6b63\u7684\u5206\u914d\u5185\u5b58":76,"\u5728\u8fd9\u6bb5\u4ee3\u7801\u4e2d":76,"\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b":[74,114],"\u5728\u8fd9\u79cd\u7ed3\u6784\u4e2d":113,"\u5728\u8fd9\u7bc7\u6587\u6863\u91cc":96,"\u5728\u8fd9\u7bc7\u6587\u7ae0\u91cc":97,"\u5728\u8fd9\u91cc":113,"\u5728\u8fd9\u91cc\u6211\u4eec\u4f7f\u7528\u5168\u8fde\u63a5\u5c42\u4f5c\u4e3a\u4f8b\u5b50\u6765\u5c55\u793a\u5b9e\u73b0\u65b0\u7f51\u7edc\u5c42\u6240\u9700\u8981\u7684\u56db\u4e2a\u6b65\u9aa4":74,"\u5728\u8fd9\u91cc\u7528eigenvector\u6765\u8868\u793a":76,"\u5728\u8fd9\u91cc\u9700\u8981\u6ce8\u610f\u7684\u662f":76,"\u5728\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u65f6":91,"\u5728\u8fdb\u884c\u7f51\u7edc\u914d\u7f6e\u4e4b\u524d":84,"\u5728\u91c7\u7528sgd":83,"\u5728\u91cd\u6784\u524d\u7684paddlepaddle\u4e2d":42,"\u5728\u95ee\u9898\u672c\u8eab\u7684\u8ba1\u7b97\u91cf\u6bd4\u8f83\u5c0f\u7684\u65f6\u5019":41,"\u5728\u96c6\u7fa4\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u7a00\u758f\u6a21\u578b\u9700\u8981\u52a0\u4e0a\u4e0b\u9762\u7684\u53c2\u6570":105,"\u5728\u975e\u5e8f\u5217\u8f93\u5165\u65f6":81,"\u5728android\u5e73\u53f0\u4e0a\u4e0d\u652f\u6301\u901a\u8fc7swig\u8c03\u7528\u6765\u8bad\u7ec3\u6216\u8005\u9884\u6d4b":116,"\u5728android\u5e73\u53f0\u4e0a\u53ea\u652f\u6301\u4f7f\u7528c":116,"\u5728batch":41,"\u5728build\u76ee\u5f55\u4e0b\u6267\u884c":78,"\u5728c":[45,89],"\u5728c\u7684\u5934\u6587\u4ef6":45,"\u5728cmake\u53c2\u6570\u914d\u7f6e\u4e0a":[116,117],"\u5728cmake\u7684\u547d\u4ee4\u884c\u4e2d":0,"\u5728eigen\u4e2d":76,"\u5728hpc\u9886\u57df\u4f7f\u7528\u975e\u5e38\u7684\u5e7f\u6cdb":94,"\u5728ios\u5e73\u53f0\u4e0a\u4e0d\u652f\u6301\u901a\u8fc7swig\u8c03\u7528\u6765\u8bad\u7ec
3\u6216\u8005\u9884\u6d4b":117,"\u5728ios\u5e73\u53f0\u4e0a\u53ea\u652f\u6301\u4f7f\u7528c":117,"\u5728main":107,"\u5728openmpi\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3":94,"\u5728packing\u4e0a\u7684\u8017\u65f6":41,"\u5728paddl":97,"\u5728paddle\u4e2d":105,"\u5728paddle\u4e4b\u4e0a\u8fd0\u884c\u7684\u6df1\u5ea6\u5b66\u4e60\u8bad\u7ec3\u8f93\u51fa\u7684\u6a21\u578b\u4f1a\u63d0\u4f9b\u7ed9\u5728\u7ebf\u4eba\u8138\u8bc6\u522b\u7684\u5e94\u7528\u4f7f\u7528":11,"\u5728paddlepaddl":89,"\u5728paddlepaddle\u4e2d":[84,113],"\u5728paddlepaddle\u4e2d\u4f7f\u7528dropout\u6709\u4e24\u79cd\u65b9\u5f0f":82,"\u5728paddlepaddle\u4e2d\u5305\u542b\u4ee5\u4e0b":82,"\u5728paddlepaddle\u5185\u90e8":[89,90],"\u5728paddlepaddle\u7684\u6587\u6863\u4e2d":111,"\u5728paramet":10,"\u5728python\u811a\u672c\u4e2d\u5b9e\u73b0\u4e0e\u524d\u5411operator\u76f8\u540c\u7684\u8ba1\u7b97\u903b\u8f91":75,"\u5728rnn\u7684\u60c5\u51b5\u4e0b":41,"\u5728step\u51fd\u6570\u4e2d\u5b9a\u4e49":113,"\u5728step\u51fd\u6570\u4e2d\u5b9a\u4e49memori":113,"\u5728trainer":105,"\u5728trainer\u4e2d\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u53d6\u6a21\u7684\u65b9\u6cd5\u4e3a\u6bcf\u4e2atrainer\u5206\u914d\u8bad\u7ec3\u6570\u636e\u6587\u4ef6":91,"\u5747\u4f1a\u5b58\u653e\u4e8e":87,"\u5747\u4f1a\u88ab\u5b89\u88c5\u5230includ":46,"\u5747\u5300\u5206\u5e03":83,"\u5747\u5300\u5206\u5e03\u7684\u8303\u56f4\u662f":103,"\u5747\u662f\u5728":46,"\u5747\u6709\u4e09\u4e2a\u5b50\u5e8f\u5217":111,"\u5747\u6709\u4e24\u7ec4\u7279\u5f81":111,"\u57fa\u4e8e\u53cc\u5c42\u5e8f\u5217\u8f93\u5165":113,"\u57fa\u4e8e\u7c98\u6027\u4f1a\u8bdd\u7684\u8d1f\u8f7d\u5747\u8861\u529f\u80fd":27,"\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":[85,90],"\u57fa\u7840\u4e0a":89,"\u57fa\u7c7b":75,"\u586b\u5199":72,"\u589e\u52a0":75,"\u589e\u52a0\u4e86\u4e00\u6761cd\u547d\u4ee4":96,"\u589e\u52a0\u4e86\u8bbe\u5907\u7c7b\u578b":75,"\u589e\u52a0\u5982\u4e0b\u53c2\u6570":105,"\u589e\u52a0\u68af\u5ea6\u68c0\u6d4b\u7684\u5355\u5143\u6d4b\u8bd5":74,"\u5904\u7406\u5668\u6709\u4e24\u4e
2a\u5173\u952e\u6027\u80fd\u9650\u5236":108,"\u5904\u7406\u7684\u8f93\u5165\u5e8f\u5217\u4e3b\u8981\u5206\u4e3a\u4ee5\u4e0b\u4e09\u79cd\u7c7b\u578b":113,"\u5907\u6ce8":108,"\u590d\u6742\u5ea6\u6216\u65f6\u95f4\u590d\u6742\u5ea6":108,"\u5916\u5c42\u5e8f\u5217\u5728":89,"\u5916\u5c42memory\u662f\u4e00\u4e2a\u5143\u7d20":111,"\u5916\u5c42outer_step\u4e2d":111,"\u5916\u90e8\u5b58\u50a8":42,"\u591a\u4e2a":90,"\u591a\u4e2a\u503c":11,"\u591a\u4e2a\u5c42\u7684\u8f93\u51fa\u77e9\u9635\u7684\u9ad8\u5ea6\u4e0d\u4e00\u81f4\u5bfc\u81f4\u62fc\u63a5\u5931\u8d25":81,"\u591a\u4e2a\u6392\u6210\u4e00\u5217\u7684\u5143\u7d20":89,"\u591a\u4e2a\u8f93\u51fa\u5c42\u5904\u7406\u591a\u4e2a\u4e0d\u540c\u957f\u5ea6\u7684\u5e8f\u5217":81,"\u591a\u4e2aip\u4f7f\u7528":91,"\u591a\u4e2aparamet":10,"\u591a\u53e5\u8bdd\u8fdb\u4e00\u6b65\u6784\u6210\u4e86\u6bb5\u843d":113,"\u591a\u673a\u8bad\u7ec3":81,"\u591a\u6b21\u8c03\u7528":41,"\u591a\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1":90,"\u591a\u8f6e\u5bf9\u8bdd\u7b49\u66f4\u4e3a\u590d\u6742\u7684\u8bed\u8a00\u6570\u636e":113,"\u5927\u4e8e\u7b49\u4e8e\u4e00\u4e2a":90,"\u5927\u591a\u6570\u5c42\u4e0d\u9700\u8981\u8fdc\u7a0b\u7a00\u758f\u8bad\u7ec3\u51fd\u6570":74,"\u5927\u591a\u6570\u5c42\u9700\u8981\u8bbe\u7f6e\u4e3a":74,"\u5927\u591a\u6570\u7f51\u7edc\u5c42\u4e0d\u9700\u8981\u652f\u6301\u8fdc\u7a0b\u7a00\u758f\u66f4\u65b0":74,"\u5927\u591a\u6570\u8bed\u8a00\u90fd\u652f\u6301\u4f7f\u7528c\u8bed\u8a00api":45,"\u5927\u5bb6\u53ef\u4ee5\u7528\u628a\u5f00\u53d1\u5de5\u5177\u5b89\u88c5\u8fdb\u5165":0,"\u5927\u5bb6\u53ef\u4ee5\u901a\u8fc7\u5b83\u9605\u8bfb\u6559\u7a0b":1,"\u5927\u5c0f\u4e0d\u4e00\u6837\u65f6":81,"\u5927\u6982\u82b1\u5341\u5206\u949f\u770b\u4e00\u4e0b":0,"\u5927\u90e8\u5206":107,"\u5934\u4fe1\u606f\u4e2d":83,"\u5934\u6587\u4ef6\u4e2d\u628a\u53c2\u6570\u5b9a\u4e49\u4e3a\u7c7b\u7684\u6210\u5458\u53d8\u91cf":74,"\u5934\u6587\u4ef6\u5982\u4e0b":74,"\u5982":[72,75,105,114],"\u5982\u4e0a\u4e00\u5c0f\u8282\u6240\u793a":76,"\u5982\u4e0b
":91,"\u5982\u4e0b\u56fe\u6240\u793a":[108,111],"\u5982\u4e0b\u6240\u793a":105,"\u5982\u4f55\u8d21\u732e\u4ee3\u7801":73,"\u5982\u4f55\u8d21\u732e\u6587\u6863":73,"\u5982\u56fe\u4e2dtrainer":10,"\u5982\u6709":75,"\u5982\u679c\u4e00\u4e2a\u7f51\u7edc\u5c42\u9700\u8981\u914d\u7f6e\u7684\u8bdd":74,"\u5982\u679c\u4e0a\u9762\u4e24\u6b65\u51fa\u73b0\u9519\u8bef":10,"\u5982\u679c\u4e0d\u4e3a0":103,"\u5982\u679c\u4e0d\u4f7f\u7528\u5206\u5e03\u5f0f\u5b58\u50a8":91,"\u5982\u679c\u4e0d\u60f3\u4f7f\u7528":77,"\u5982\u679c\u4e0d\u6307\u5b9a\u8fd9\u4e2a\u6587\u4ef6":107,"\u5982\u679c\u4e0d\u6536\u655b":83,"\u5982\u679c\u4e0d\u9700\u8981\u5916\u90e8\u5b58\u50a8\u7528\u4e8e\u8f6c\u6362":42,"\u5982\u679c\u4e3a0":103,"\u5982\u679c\u4e3a\u5426\u5219\u662f\u7528openbla":0,"\u5982\u679c\u4e3afals":103,"\u5982\u679c\u4e3atrue":103,"\u5982\u679c\u4e4b\u540e\u60f3\u8981\u91cd\u65b0\u8bbe\u7f6e":0,"\u5982\u679c\u4ec5\u4ec5\u4fee\u6539\u4e00\u4e2a\u6587\u4ef6\u4f46\u63d0\u4ea4\u4e86\u5341\u51e0\u4e2acommit":72,"\u5982\u679c\u4ecd\u7136\u5b58\u5728\u95ee\u9898":3,"\u5982\u679c\u4ed4\u7ec6\u8bbe\u7f6e\u7684\u8bdd":103,"\u5982\u679c\u4f60\u53ea\u9700\u8981\u4f7f\u7528\u7b80\u5355\u7684rnn":114,"\u5982\u679c\u4f60\u60f3\u4f7f\u7528\u8fd9\u4e9b\u7279\u6027":105,"\u5982\u679c\u4f60\u60f3\u8981\u4fdd\u5b58\u67d0\u4e9b\u5c42\u7684\u7279\u5f81\u56fe":103,"\u5982\u679c\u4f60\u66fe\u5728\u6e90\u7801\u76ee\u5f55\u4e0b\u7f16\u8bd1\u8fc7\u5176\u4ed6\u5e73\u53f0\u7684paddlepaddle\u5e93":117,"\u5982\u679c\u4f60\u66fe\u7ecf\u5728\u6e90\u7801\u76ee\u5f55\u4e0b\u7f16\u8bd1\u8fc7\u5176\u4ed6\u5e73\u53f0\u7684paddlepaddle\u5e93":[116,118],"\u5982\u679c\u4f60\u6b63\u5728\u5904\u7406\u5e8f\u5217\u6807\u8bb0\u4efb\u52a1":114,"\u5982\u679c\u4f60\u8981\u4e3a\u4e86\u6d4b\u8bd5\u800c\u589e\u52a0\u65b0\u7684\u6587\u4ef6":74,"\u5982\u679c\u4f7f\u7528":90,"\u5982\u679c\u4f7f\u7528\u81ea\u884c":0,"\u5982\u679c\u4f7f\u7528mkl\u5e76\u4e14\u673a\u5668\u542b\u6709avx2\u6307\u4ee4\u96c6":0,"\u5982\u679c\u4f7f\u7528swig\u6211\u4
eec\u9700\u8981\u5c06\u5728interface\u6587\u4ef6\u91cc":45,"\u5982\u679c\u5173\u95edmkl":0,"\u5982\u679c\u51fa\u73b0\u4ee5\u4e0bpython\u76f8\u5173\u7684\u5355\u5143\u6d4b\u8bd5\u90fd\u8fc7\u4e0d\u4e86\u7684\u60c5\u51b5":78,"\u5982\u679c\u53c2\u6570\u4fdd\u5b58\u4e0b\u6765\u7684\u6a21\u578b\u76ee\u5f55":81,"\u5982\u679c\u53d1\u73b0\u6700\u65e9\u7684\u62a5\u9519\u5c31\u662f\u7f51\u7edc\u901a\u4fe1\u7684\u95ee\u9898":79,"\u5982\u679c\u540c\u65f6\u4f7f\u7528":91,"\u5982\u679c\u5728\u4f7f\u7528mkl":42,"\u5982\u679c\u5728\u5b89\u88c5\u8fc7\u7a0b\u4e2d\u9047\u5230\u4e86\u95ee\u9898":2,"\u5982\u679c\u5728\u70b9\u51fb\u4e0b\u9762\u94fe\u63a5\u65f6\u51fa\u73b0\u5982\u4e0b\u767b\u9646\u754c\u9762":3,"\u5982\u679c\u5728\u7f16\u8bd1":87,"\u5982\u679c\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u672a\u8bbe\u7f6easync":103,"\u5982\u679c\u5728\u8bad\u7ec3\u671f\u95f4\u540c\u65f6\u53d1\u8d77\u53e6\u5916\u4e00\u4e2a\u8fdb\u7a0b\u8fdb\u884c\u6d4b\u8bd5":103,"\u5982\u679c\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u8bbe\u7f6ebatch":103,"\u5982\u679c\u5728\u8bad\u7ec3nlp\u76f8\u5173\u6a21\u578b\u65f6":83,"\u5982\u679c\u591a\u4e2aop\u4f9d\u8d56\u4e00\u4e9b\u5171\u7528\u7684\u51fd\u6570":75,"\u5982\u679c\u5931\u8d25":63,"\u5982\u679c\u5b58\u5728\u6570\u636e\u6392\u5217\u683c\u5f0f\u4e0d\u4e00\u6837\u7684\u60c5\u51b5\u65f6":42,"\u5982\u679c\u5b58\u5728\u67d0\u4e9btrainer\u6267\u884c\u901f\u5ea6\u8fc7\u6162\u4f1a\u5f71\u54cd\u6574\u4f53\u96c6\u7fa4\u7684\u901f\u5ea6":10,"\u5982\u679c\u5c06\u8fd9\u4e2a\u5185\u5b58\u6c60\u51cf\u5c0f":81,"\u5982\u679c\u5c0f\u4e8e75m":78,"\u5982\u679c\u5df2\u7ecf\u6709pod\u8fd0\u884c":97,"\u5982\u679c\u5df2\u7ecf\u6b63\u5728\u6267\u884c\u4fdd\u5b58\u68c0\u67e5\u70b9\u7684\u7ebf\u7a0b":10,"\u5982\u679c\u5e0c\u671b\u53ef\u4ee5\u5728\u540e\u53f0\u8fd0\u884cpserver\u7a0b\u5e8f":91,"\u5982\u679c\u5f53\u524dmpi\u96c6\u7fa4\u5e76\u4e0d\u652f\u6301\u4efb\u52a1\u72ec\u5360\u6a21\u5f0f":79,"\u5982\u679c\u60a8\u5728\u4f7f\u7528window":1,"\u5982\u679c\u60a8\u60f3\u8981\u66f4\u6df1\u5165\
u4e86\u89e3deep":1,"\u5982\u679c\u60a8\u671f\u671b\u5728\u7f16\u8bd1\u5b8c\u6210\u540e\u7acb\u5373\u6267\u884c\u6240\u6709\u7684\u5355\u5143\u6d4b\u8bd5":0,"\u5982\u679c\u60a8\u6ca1\u6709\u542c\u8bf4":0,"\u5982\u679c\u60a8\u7684\u7535\u8111\u4e0d\u652f\u6301avx":1,"\u5982\u679c\u60a8\u7684\u95ee\u9898\u672a\u5728\u6b64\u5904":80,"\u5982\u679c\u60a8\u7684gpu\u7406\u8bba\u53ef\u4ee5\u8fbe\u52306":108,"\u5982\u679c\u60a8\u9009\u62e9\u4e0d\u4f7f\u7528docker\u955c\u50cf":0,"\u5982\u679c\u60f3\u4f7f\u7528\u53ef\u89c6\u5316\u7684\u5206\u6790\u5668":108,"\u5982\u679c\u60f3\u5f88\u597d\u7684\u7406\u89e3\u7a0b\u5e8f\u7684\u884c\u4e3a":108,"\u5982\u679c\u60f3\u6539\u53d8\u539f\u6709tensor\u7684shape\u4fe1\u606f":76,"\u5982\u679c\u60f3\u8981\u4e86\u89e3\u53cc\u5c42rnn\u5728\u5177\u4f53\u95ee\u9898\u4e2d\u7684\u4f7f\u7528":111,"\u5982\u679c\u60f3\u8981\u542f\u7528paddlepaddle\u7684\u5185\u7f6e\u5b9a\u65f6\u5668":108,"\u5982\u679c\u60f3\u8be6\u7ec6\u4e86\u89e3":94,"\u5982\u679c\u6211\u77e5\u9053\u5185\u6838\u82b1\u4e8610ms\u6765\u79fb\u52a81gb\u6570\u636e":108,"\u5982\u679c\u6307\u5b9a\u4e862\u4e2alayer\u4f5c\u4e3a\u8f93\u51fa\u5c42":81,"\u5982\u679c\u63d0\u793a\u6b63\u786e":77,"\u5982\u679c\u652f\u6301\u589e\u52a0\u6b64\u53c2\u6570\u63d0\u4ea4":79,"\u5982\u679c\u662f\u4e00\u4e2a\u5e8f\u5217\u8f93\u5165":89,"\u5982\u679c\u662f\u5176\u5b83\u7c7b\u578b":11,"\u5982\u679c\u662f\u7528\u7f16\u8bd1\u65f6\u6307\u5b9acpu\u7248\u672c":87,"\u5982\u679c\u6709\u591a\u4e2a\u8f93\u5165":113,"\u5982\u679c\u6709\u591a\u4e2a\u8f93\u5165\u5e8f\u5217":113,"\u5982\u679c\u6709\u9700\u8981\u4fee\u6539\u7684\u5730\u65b9":72,"\u5982\u679c\u6709bugfix\u7684\u884c\u4e3a":63,"\u5982\u679c\u671f\u671b\u6267\u884c\u5176\u4e2d\u4e00\u4e2a\u5355\u5143\u6d4b\u8bd5":0,"\u5982\u679c\u672a\u8bbe\u7f6e":103,"\u5982\u679c\u672a\u8bbe\u7f6egpu":105,"\u5982\u679c\u673a\u5668\u4e2d\u5df2\u7ecf\u5b89\u88c5\u8fc7paddlepaddl":0,"\u5982\u679c\u67d0\u4e00\u4e2a\u7c7b\u578b\u9700\u8981\u5f15\u7528\u53e6\u4e00\u4e2a\u7c7b\u578
b":46,"\u5982\u679c\u67d0\u4e00\u4e2apaddl":46,"\u5982\u679c\u67d0\u4e00\u4e2apaddle\u6982\u5ff5\u5fc5\u987b\u8981\u66b4\u9732":46,"\u5982\u679c\u67d0\u4e00\u5757\u6839\u672c\u5c31\u4e0d\u600e\u4e48\u8017\u65f6":108,"\u5982\u679c\u68c0\u67e5\u5230\u5206\u914d\u5728\u4e0d\u540c\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u53c2\u6570\u7684\u5206\u5e03\u4e0d\u5747\u5300\u6b21\u6570\u5927\u4e8echeck":103,"\u5982\u679c\u6ca1\u6709\u5b89\u88c5nvidia":1,"\u5982\u679c\u6ca1\u6709\u5b9a\u4e49memori":113,"\u5982\u679c\u6ca1\u8fc7":72,"\u5982\u679c\u6d88\u606f\u6570\u636e\u592a\u5c0f":103,"\u5982\u679c\u6ee1\u8db3\u6761\u4ef6":10,"\u5982\u679c\u7528\u516c\u7528\u7684\u7535\u8111\u5f00\u53d1":0,"\u5982\u679c\u7528\u6237\u4e0d\u9700\u8981\u8bbf\u95eelstm\u7684\u4e2d\u95f4\u53d8\u91cf":82,"\u5982\u679c\u7528\u6237\u60f3\u8981\u81ea\u5b9a\u4e49\u521d\u59cb\u5316\u65b9\u5f0f":83,"\u5982\u679c\u7528\u6237\u8981\u628apaddle\u7684\u9759\u6001\u5e93":45,"\u5982\u679c\u7528\u81ea\u5df1\u7684\u7535\u8111\u5f00\u53d1":0,"\u5982\u679c\u771f\u60f3\u6316\u6398\u5185\u6838\u6df1\u5904\u7684\u67d0\u4e2a\u79d8\u5bc6":108,"\u5982\u679c\u795e\u7ecf\u7f51\u7edc\u6709\u591a\u4e2a\u8f93\u5165\u6216\u8005\u591a\u4e2a\u8f93\u51fa":[89,90],"\u5982\u679c\u7a0b\u5e8f\u5d29\u6e83\u4f60\u4e5f\u53ef\u4ee5\u624b\u52a8\u7ec8\u6b62":93,"\u5982\u679c\u7cfb\u7edf\u5b89\u88c5\u4e86\u591a\u4e2apython\u7248\u672c":78,"\u5982\u679c\u7cfb\u7edf\u652f\u6301":[3,78],"\u5982\u679c\u7cfb\u7edf\u652f\u6301\u7684\u662f":[3,78],"\u5982\u679c\u7f16\u8bd1\u65f6\u6307\u5b9a\u7f16\u8bd1cpu\u7248\u672c":87,"\u5982\u679c\u7f16\u8bd1\u65f6\u6307\u5b9a\u7f16\u8bd1gpu\u7248\u672c":87,"\u5982\u679c\u7f51\u7edc\u5c42\u4e0d\u9700\u8981\u8fdc\u7a0b\u7a00\u758f\u66f4\u65b0":74,"\u5982\u679c\u7f51\u7edc\u67b6\u6784\u7b80\u5355":114,"\u5982\u679c\u8981\u4e0a\u4f20gpu\u7248\u672c\u7684\u5305":63,"\u5982\u679c\u8981\u542f\u7528gpu":101,"\u5982\u679c\u8981\u8fd0\u884c\u6240\u6709\u7684\u5355\u5143\u6d4b\u8bd5":72,"\u5982\u679c\u89e3\u51b3\u4e86
\u67d0\u4e2aissue\u7684\u95ee\u9898":72,"\u5982\u679c\u8bad\u7ec3\u4e00\u4e2apass":83,"\u5982\u679c\u8bad\u7ec3\u8fc7\u7a0b\u7684\u7684cost\u660e\u663e\u9ad8\u4e8e\u8fd9\u4e2a\u5e38\u6570\u8f93\u51fa\u7684cost":83,"\u5982\u679c\u8bbe\u7f6e\u8be5\u53c2\u6570":103,"\u5982\u679c\u8bc4\u5ba1\u610f\u89c1\u6bd4\u8f83\u591a":72,"\u5982\u679c\u8c03\u7528\u9759\u6001\u5e93\u53ea\u80fd\u5c06\u9759\u6001\u5e93\u4e0e\u89e3\u91ca\u5668\u94fe\u63a5":45,"\u5982\u679c\u8f93\u5165\u662f\u5e8f\u5217\u6570\u636e":89,"\u5982\u679c\u8f93\u51fa\u662f\u4e00\u4e2a\u5e8f\u5217":89,"\u5982\u679c\u8f93\u51fa\u662fno":1,"\u5982\u679c\u8fd0\u884c":78,"\u5982\u679c\u8fd8\u4e0d\u884c":78,"\u5982\u679c\u95ee\u9898\u6ca1\u6709\u5f97\u5230\u89e3\u51b3":2,"\u5982\u679c\u9700\u8981":87,"\u5982\u679c\u9700\u8981\u5728c\u7ef4\u5ea6\u8ba1\u7b97softmax":82,"\u5982\u679c\u9700\u8981\u5b89\u88c5\u652f\u6301gpu\u7684\u7248\u672c":[3,86],"\u5982\u679c\u9700\u8981\u624b\u52a8\u7f16\u8bd1":63,"\u5982\u679c\u9700\u8981\u6269\u5927\u77e9\u9635":74,"\u5982\u679c\u9700\u8981\u7f29\u51cf\u77e9\u9635":74,"\u5982\u679c\u9700\u8981\u83b7\u53d6":3,"\u5982\u679ccuda":75,"\u5982\u679clearning_rate\u592a\u5927":83,"\u5982\u679clearning_rate\u592a\u5c0f":83,"\u5982\u679cmkl":42,"\u5982\u679cop\u6ca1\u6709\u5b9e\u73b0cuda":75,"\u5982\u679cop\u7684\u67d0\u4e2a\u8f93\u5165\u4e0d\u53c2\u4e0e\u53cd\u5411\u68af\u5ea6\u7684\u8ba1\u7b97":75,"\u5982\u679cpaddlepaddle\u5305\u5df2\u7ecf\u5728python\u7684sit":78,"\u5982\u679cpaddlepaddle\u5e93\u9700\u8981\u540c\u65f6\u652f\u6301\u771f\u673a\u548c\u6a21\u62df\u5668":117,"\u5982\u679cparamet":10,"\u5982\u6bcf\u4e2a\u6587\u4ef6\u53ea\u6709\u4e00\u4e2a":72,"\u5982\u795e\u7ecf\u5143\u6fc0\u6d3b\u503c\u7b49":81,"\u5982\u8981build\u8fd9\u4e2a\u5f00\u53d1\u955c\u50cf":72,"\u5982\u9ad8\u4eae\u90e8\u5206":108,"\u5982train":91,"\u5b50\u53e5":113,"\u5b50\u53e5\u7684\u5355\u8bcd\u6570\u548c\u6307\u5b9a\u7684\u4e00\u4e2a\u8f93\u5165\u5e8f\u5217\u4e00\u81f4":113,"\u5b50\u7c7b\u53ea\u9700\u8981\u4f7f\
u7528\u5b9a\u4e49\u597d\u7684\u63a5\u53e3":42,"\u5b57\u6bb5\u4e2d":97,"\u5b57\u6bb5\u4e3a\u4f8b":81,"\u5b57\u6bb5\u7684\u53d6\u503c":89,"\u5b57\u6bb5\u8868\u793a\u5bb9\u5668\u7684\u73af\u5883\u53d8\u91cf":97,"\u5b57\u6bb5\u8868\u793a\u8fd9\u4e2ajob\u4f1a\u540c\u65f6\u5f00\u542f3\u4e2apaddlepaddle\u8282\u70b9":97,"\u5b57\u6bb5\u8bbe\u4e3a":63,"\u5b57\u7b26\u4e32":11,"\u5b58\u50a8":[11,89,90],"\u5b58\u50a8\u6d6e\u70b9\u7c7b\u578b\u8f93\u5165":90,"\u5b58\u50a8\u7684\u538b\u7f29\u6587\u4ef6":90,"\u5b58\u6570\u6570\u636e":89,"\u5b66\u4e60":0,"\u5b66\u4e60\u6210\u672c\u9ad8":45,"\u5b66\u4e60\u7387\u4e3a":83,"\u5b83\u4eec\u4e3b\u8981\u662f\u7528\u4e8e":42,"\u5b83\u4eec\u7684\u6587\u4ef6\u540d\u662f":11,"\u5b83\u5305\u542b\u4ee5\u4e0b\u51e0\u6b65":74,"\u5b83\u5305\u542b\u4ee5\u4e0b\u53c2\u6570":74,"\u5b83\u53ea\u4f1a\u5305\u62ec\u751f\u6210\u597d\u7684\u52a8\u6001\u5e93\u548c\u5934\u6587\u4ef6":42,"\u5b83\u53eb\u505a":114,"\u5b83\u53ef\u4ee5\u5e2e\u52a9\u51cf\u5c11\u5206\u53d1\u5ef6\u8fdf":93,"\u5b83\u53ef\u4ee5\u5e2e\u52a9\u6211\u4eec\u683c\u5f0f\u5316\u6e90\u4ee3\u7801":72,"\u5b83\u53ef\u4ee5\u6307\u6d4b\u91cf\u4e00\u4e2a\u7a0b\u5e8f\u7684\u7a7a\u95f4":108,"\u5b83\u53ef\u80fd\u6709\u4e0d\u6b62\u4e00\u4e2a\u6743\u91cd":74,"\u5b83\u5b9a\u4e49\u4e86":114,"\u5b83\u5b9a\u4e49\u89e3\u7801\u7f51\u7edc\u7684":114,"\u5b83\u5c06\u88ab\u5206\u53d1\u5230":93,"\u5b83\u5e76\u4e0d\u662f\u4e00\u4e2a\u5b8c\u6574\u7684recurr":82,"\u5b83\u5e94\u8be5\u6253\u5370\u51fa\u9884\u6d4b\u4f4f\u623f\u6570\u636e\u7684\u6e05\u5355":86,"\u5b83\u652f\u6301\u591a\u7ebf\u7a0b\u66f4\u65b0":74,"\u5b83\u662finteger_value\u7c7b\u578b\u7684":111,"\u5b83\u662finteger_value_sequence\u7c7b\u578b\u7684":111,"\u5b83\u6709\u52a9\u4e8e\u5e2e\u52a9\u9891\u7e41\u4fee\u6539\u548c\u8bbf\u95ee\u5de5\u4f5c\u533a\u6587\u4ef6\u7684\u7528\u6237\u51cf\u5c11\u8d1f\u62c5":93,"\u5b83\u7684":114,"\u5b83\u7684\u529f\u80fd\u662f":75,"\u5b83\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20":110,"\u5b83\u7684\u8f93\u5165\u4e0e\u7ecf\u8fc7\u5b66\u4
e60\u7684\u53c2\u6570\u505a\u5185\u79ef\u5e76\u52a0\u4e0a\u504f\u7f6e":74,"\u5b83\u8d1f\u8d23\u51b3\u5b9a\u7f16\u8bd1\u65f6\u662f\u5426\u4f7f\u7528mklml\u548cmkl":42,"\u5b83\u9996\u5148\u8c03\u7528\u57fa\u6784\u9020\u51fd\u6570":74,"\u5b89\u88c5":107,"\u5b89\u88c5\u4e0e\u7f16\u8bd1":115,"\u5b89\u88c5\u4e0e\u7f16\u8bd1c":88,"\u5b89\u88c5\u540e":1,"\u5b89\u88c5\u540e\u7684\u76ee\u5f55\u7ed3\u6784\u4e3a":46,"\u5b89\u88c5\u597ddocker\u4e4b\u540e\u53ca\u53ef\u7528\u4ee5\u4e0b\u547d\u4ee4\u542f\u52a8\u5de5\u5177":77,"\u5b89\u88c5\u597ddocker\u4e4b\u540e\u53ef\u4ee5\u4f7f\u7528\u6e90\u7801\u76ee\u5f55\u4e0b\u7684\u811a\u672c\u6784\u5efa\u6587\u6863":77,"\u5b89\u88c5\u5b8c\u6210\u4e4b\u540e":[101,117],"\u5b89\u88c5\u5b8c\u6bd5\u540e":107,"\u5b89\u88c5\u6587\u6863":84,"\u5b89\u88c5\u65b9\u5f0f\u6765\u5feb\u901f\u5b89\u88c5paddlepaddl":101,"\u5b8c\u6210":72,"\u5b8c\u6210\u4e00\u4e2a\u4f20\u8f93\u52a8\u4f5c\u5b8c\u6210\u7684\u65f6\u95f4\u4e5f\u6bd4\u8f83\u77ed":27,"\u5b8c\u6210\u4e0a\u8ff0\u51c6\u5907\u4e4b\u540e":90,"\u5b8c\u6210\u4efb\u610f\u7684\u8fd0\u7b97\u903b\u8f91":113,"\u5b8c\u6210\u540evolume\u4e2d\u7684\u6587\u4ef6\u5185\u5bb9\u5927\u81f4\u5982\u4e0b":97,"\u5b8c\u6210\u5728windows\u4e0a\u5b89\u88c5\u548c\u4f7f\u7528dock":1,"\u5b8c\u6210\u5b89\u88c5":3,"\u5b8c\u6210\u5e38\u7528layer\u7684mkl":42,"\u5b8c\u6210\u5e38\u89c1\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edcvgg":42,"\u5b8c\u6210\u6570\u636e\u7684\u9884\u5904\u7406":11,"\u5b8c\u6210\u76f8\u5e94\u7684\u8ba1\u7b97":110,"\u5b8c\u6210\u81ea\u52a8\u5316\u4e8c\u8fdb\u5236\u7f16\u8bd1":63,"\u5b8c\u6210paddlepaddle\u7684\u5b89\u88c5":84,"\u5b8c\u6574\u4ee3\u7801\u53ef\u4ee5\u53c2\u8003\u793a\u4f8b":81,"\u5b8c\u6574\u4ee3\u7801\u53ef\u4ee5\u67e5\u770b":90,"\u5b8c\u6574\u6e90\u7801\u53ef\u53c2\u8003":83,"\u5b8c\u6574\u7684\u53c2\u6570\u77e9\u9635\u88ab\u5206\u5e03\u5728\u4e0d\u540c\u7684\u53c2\u6570\u670d\u52a1\u5668\u4e0a":74,"\u5b8c\u6574\u7684\u914d\u7f6e\u6587\u4ef6\u5728":114,"\u5b98\u65b9\u6587\u6863":0,"\u5b9a\u4e49":42,"\
u5b9a\u4e49\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185rnn\u5355\u5143\u5b8c\u6210\u7684\u8ba1\u7b97":113,"\u5b9a\u4e49\u4e00\u4e9b\u9664\u4e86layer\u548cmemory\u76f8\u5173\u7684\u7c7b\u548c\u51fd\u6570":42,"\u5b9a\u4e49\u4e86\u4e00\u4e2a\u53ea\u8bfb\u7684memori":113,"\u5b9a\u4e49\u4e86lstm\u5355\u5143\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u7684\u8ba1\u7b97\u8fc7\u7a0b":82,"\u5b9a\u4e49\u4f4d\u7f6e":75,"\u5b9a\u4e49\u5728\u5916\u5c42":113,"\u5b9a\u4e49\u5f02\u6b65\u8bad\u7ec3\u7684\u957f\u5ea6":103,"\u5b9a\u4e49\u6e90\u8bed\u53e5\u7684\u6570\u636e\u5c42":114,"\u5b9a\u4e49\u7684\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784":90,"\u5b9a\u4e49\u7c7b\u578b":75,"\u5b9a\u4e49\u89e3\u7801\u5668\u7684memori":114,"\u5b9a\u4e49\u8f93\u5165":75,"\u5b9a\u4e49\u8f93\u51fa":75,"\u5b9a\u4e49\u8f93\u51fa\u51fd\u6570":114,"\u5b9a\u4e49\u95e8\u63a7\u5faa\u73af\u5355\u5143\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u5355\u6b65\u51fd\u6570":114,"\u5b9d\u5854\u7684\u5e95\u7aef\u9700\u8981\u575a\u5b9e\u7684\u57fa\u5ea7\u6765\u652f\u6491":84,"\u5b9e\u73b0\u4e24\u4e2a\u5b8c\u5168\u7b49\u4ef7\u7684\u5168\u8fde\u63a5rnn":111,"\u5b9e\u73b0\u5177\u4f53\u7684\u51fd\u6570\u529f\u80fd\u5373\u53ef":42,"\u5b9e\u73b0\u524d\u5411\u4f20\u64ad\u7684\u90e8\u5206\u6709\u4e0b\u9762\u51e0\u4e2a\u6b65\u9aa4":74,"\u5b9e\u73b0\u5355\u6b65\u51fd\u6570":114,"\u5b9e\u73b0\u540e\u5411\u4f20\u64ad\u7684\u90e8\u5206\u6709\u4e0b\u9762\u51e0\u4e2a\u6b65\u9aa4":74,"\u5b9e\u73b0\u5728":75,"\u5b9e\u73b0\u5bf9":76,"\u5b9e\u73b0\u65b0\u7684op\u90fd\u6dfb\u52a0\u81f3\u76ee\u5f55":75,"\u5b9e\u73b0\u6784\u9020\u51fd\u6570":74,"\u5b9e\u73b0\u7684":82,"\u5b9e\u73b0\u7b80\u5355":45,"\u5b9e\u73b0\u7ec6\u8282":74,"\u5b9e\u73b0\u7f51\u7edc\u5c42\u7684\u524d\u5411\u4f20\u64ad":74,"\u5b9e\u73b0\u7f51\u7edc\u5c42\u7684\u540e\u5411\u4f20\u64ad":74,"\u5b9e\u73b0\u8bcd\u8bed\u548c\u53e5\u5b50\u4e24\u4e2a\u7ea7\u522b\u7684\u53cc\u5c42rnn\u7ed3\u6784":113,"\u5b9e\u73b0\u8be5\u5c42\u7684c":74,"\u5b9e\u9645\u4e0a\u4f7f\u7528\u4e86":82,"\u5b9e\u9645\u4e0a\
u9700\u8981\u7684\u8f93\u51fa\u7ed3\u679c\u662f\u4e24\u4e2a\u77e9\u9635":81,"\u5bb9\u5668\u8fd0\u884c\u90fd\u8fd0\u884c":97,"\u5bb9\u5668\u9ed8\u8ba4\u6267\u884c":116,"\u5bbd\u5ea6":89,"\u5bbd\u5ea6\u4e3a":89,"\u5bbd\u5ea6\u7b49\u4e8e\u914d\u7f6e\u4e2dlayer\u7684s":81,"\u5bbf\u4e3b\u673a\u7684c":[116,117,118],"\u5bc4\u5b58\u5668\u4f7f\u7528\u60c5\u51b5\u548c\u5171\u4eab\u5185\u5b58\u4f7f\u7528\u60c5\u51b5\u80fd\u8ba9\u6211\u4eec\u5bf9gpu\u7684\u6574\u4f53\u4f7f\u7528\u6709\u66f4\u597d\u7684\u7406\u89e3":108,"\u5bf9":90,"\u5bf9\u4e00\u4e2a5\u7ef4\u975e\u5e8f\u5217\u7684\u7a00\u758f01\u5411\u91cf":84,"\u5bf9\u4e00\u4e2a5\u7ef4\u975e\u5e8f\u5217\u7684\u7a00\u758f\u6d6e\u70b9\u5411\u91cf":84,"\u5bf9\u4e8e":114,"\u5bf9\u4e8e\u4e0d\u540c\u7684\u8bad\u7ec3\u4efb\u52a1":91,"\u5bf9\u4e8e\u4e0d\u540c\u8bed\u8a00":45,"\u5bf9\u4e8e\u4e24\u79cd\u4e0d\u540c\u7684\u8f93\u5165\u6570\u636e\u7c7b\u578b":111,"\u5bf9\u4e8e\u4e60\u60ef\u4f7f\u7528windows\u548cmacos\u7684\u5f00\u53d1\u8005\u6765\u8bf4":0,"\u5bf9\u4e8e\u5355\u5c42rnn":111,"\u5bf9\u4e8e\u5355\u5c42rnn\u7684\u6570\u636e\u4e00\u5171\u6709\u4e24\u4e2a\u6837\u672c":111,"\u5bf9\u4e8e\u53cc\u5c42rnn":111,"\u5bf9\u4e8e\u540c\u4e00\u6bb5c":45,"\u5bf9\u4e8e\u540c\u6837\u7684\u6570\u636e":111,"\u5bf9\u4e8e\u540c\u6837\u8bbe\u7f6e\u7684\u7f51\u7edc\u6a21\u578b":41,"\u5bf9\u4e8e\u56fd\u5185\u7528\u6237":[1,116],"\u5bf9\u4e8e\u591a\u8bed\u8a00\u63a5\u53e3":45,"\u5bf9\u4e8e\u5927\u591a\u6570\u8bed\u8a00":45,"\u5bf9\u4e8e\u5e8f\u5217\u957f\u5ea6":41,"\u5bf9\u4e8e\u6027\u80fd\u7684\u5173\u952e\u8def\u5f84\u90fd\u505a\u51fa\u4e86\u7ea2\u8272\u6807\u8bb0":107,"\u5bf9\u4e8e\u6211\u4eec\u652f\u6301\u7684\u5168\u90e8\u77e9\u9635\u64cd\u4f5c":74,"\u5bf9\u4e8e\u6709\u53c2\u6570\u7684\u5c42":42,"\u5bf9\u4e8e\u6709\u5b9a\u5236\u5316\u4e8c\u8fdb\u5236\u6587\u4ef6\u9700\u6c42\u7684\u7528\u6237":2,"\u5bf9\u4e8e\u672c\u6837\u4f8b\u4ee3\u7801":91,"\u5bf9\u4e8e\u6bb5\u843d\u7684\u6587\u672c\u5206\u7c7b":111,"\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u5355\u5c42rnn
\u7684\u6570\u636e":111,"\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u65b0\u52a0\u7684rnn":41,"\u5bf9\u4e8e\u6bcf\u79cd\u7c7b\u578b":46,"\u5bf9\u4e8e\u6bcf\u79cdc":46,"\u5bf9\u4e8e\u8fd9\u6837\u7684\u9700\u6c42":90,"\u5bf9\u4e8e\u914d\u5907\u6709\u6ce8\u610f\u529b\u673a\u5236\u7684\u89e3\u7801\u5668":114,"\u5bf9\u4e8enchw":82,"\u5bf9\u4ee3\u7801\u8fdb\u884c\u6027\u80fd\u5206\u6790":108,"\u5bf9\u4f7f\u7528\u7684\u4e2d\u95f4\u53d8\u91cf\u548c\u8d44\u6e90\u8fdb\u884c\u6e05\u7406\u548c\u91ca\u653e":90,"\u5bf9\u5168\u8fde\u63a5\u5c42\u6765\u8bf4":74,"\u5bf9\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570\u7684\u5c42":83,"\u5bf9\u53cc\u5c42\u5e8f\u5217\u6765\u8bb2":89,"\u5bf9\u5df2\u7ecfpush\u5230\u8fdc\u7a0b\u4ed3\u5e93\u7684\u591a\u4e2acommit":72,"\u5bf9\u5e94":117,"\u5bf9\u5e94\u4e00\u4e2a\u5b50\u53e5":113,"\u5bf9\u5e94\u4e00\u4e2a\u8bcd":113,"\u5bf9\u5e94\u4e8e\u8c03\u7528c":89,"\u5bf9\u5e94\u7684\u68af\u5ea6op\u8ba1\u7b97\u4e4b\u4e2d":75,"\u5bf9\u5e94\u7740\u4e0a\u6587\u63d0\u5230\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":89,"\u5bf9\u5e94\u7740\u4e0a\u6587\u63d0\u5230\u7684\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":89,"\u5bf9\u63a8\u8350\u914d\u7f6e\u4e2d\u7684\u9009\u9879\u5efa\u8bae\u6309\u7167\u8bbe\u7f6e":87,"\u5bf9\u65b0\u7684\u6743\u91cd\u8fdb\u884c\u8f6c\u6362\u7528\u4e8e\u4e0b\u6b21\u8fed\u4ee3":41,"\u5bf9\u6bcf\u4e2a\u8f93\u5165":74,"\u5bf9\u6bcf\u4e2a\u8f93\u5165\u4e58\u4e0a\u53d8\u6362\u77e9\u9635":74,"\u5bf9\u6bd4":45,"\u5bf9\u6bd4\u4f18\u5316\u540elayer\u4e0e\u76f8\u5bf9\u5e94\u7684paddlepaddle\u539f\u6709lay":41,"\u5bf9\u6bd4\u4f18\u5316\u540elayer\u81ea\u8eab":41,"\u5bf9\u6bd4\u53cd\u5411op\u4e0d\u540c\u8bbe\u5907":75,"\u5bf9\u6fc0\u6d3b\u6c42\u5bfc":74,"\u5bf9\u795e\u7ecf\u7f51\u7edc\u6765\u8bf4":89,"\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u8fdb\u884c\u5e8f\u5217\u5316":90,"\u5bf9\u8bc4\u5ba1\u610f\u89c1\u4e0d\u540c\u610f\u7684":72,"\u5bf9\u8bc4\u5ba1\u610f\u89c1\u540c\u610f\u4e14\u6309\u5176\u4fee\u6539\u5b8c\u7684":72,"\u5bf9\u8c61":83,"\u5bf9\u8c61\u5206\u914d\u
7a7a\u95f4":90,"\u5bf9\u8f93\u5165\u53c2\u6570\u7684\u5b89\u5168\u6027\u8fdb\u884c\u4e86\u5fc5\u8981\u7684\u5224\u65ad":46,"\u5bf9\u8f93\u5165\u6570\u636e\u7684\u683c\u5f0f\u505a\u6e05\u6670\u7b80\u6d01\u7684\u5c01\u88c5":88,"\u5bf9\u8f93\u51fa\u7684\u5408\u5e76":113,"\u5bf9\u8fd9\u4e2a\u7248\u672c\u7684\u63d0\u4ea4":63,"\u5bf9sparse_binary_vector\u548csparse_float_vector":84,"\u5bfb\u627e\u6709\u6ca1\u6709\u5176\u4ed6\u53ef\u4ee5\u4f18\u5316\u7684\u53ef\u80fd":42,"\u5bfb\u627epython\u4e0ec":107,"\u5bfc\u51fa\u8fd9\u4e9b\u63a5\u53e3":46,"\u5bfc\u81f4\u4e86\u6d6e\u70b9\u6570\u6ea2\u51fa":81,"\u5bfc\u81f4\u53c2\u6570\u6536\u655b\u5230\u4e86\u4e00\u4e9b\u5947\u5f02\u7684\u60c5\u51b5":81,"\u5bfc\u81f4\u53c2\u6570\u7d2f\u52a0":81,"\u5bfc\u81f4\u7f16\u8bd1paddlepaddle\u5931\u8d25":78,"\u5bfc\u81f4\u8bad\u7ec3\u65f6\u95f4\u8fc7\u957f":83,"\u5bfc\u81f4mklml\u5e93\u4e0b\u8f7d\u4e0d\u6210\u529f":78,"\u5c01\u88c5\u4e86":108,"\u5c01\u88c5\u8be5\u5c42\u7684python\u63a5\u53e3":74,"\u5c06":[63,83,108],"\u5c06\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u53c2\u6570\u62c6\u5206\u6210\u591a\u4efd":10,"\u5c06\u4e0a\u4e00\u65f6\u95f4\u6b65\u6240\u751f\u6210\u7684\u8bcd\u7684\u5411\u91cf\u6765\u4f5c\u4e3a\u5f53\u524d\u65f6\u95f4\u6b65\u7684\u8f93\u5165":114,"\u5c06\u4f1a\u4f18\u5148\u4f7f\u7528":91,"\u5c06\u4f1a\u59cb\u7ec8\u4f7f\u7528":116,"\u5c06\u4f1a\u5c06\u7528\u6237\u4f20\u8fdb\u6765\u7684\u914d\u7f6e\u53c2\u6570\u4f20\u9012cmake\u7cfb\u7edf":116,"\u5c06\u4f1a\u81ea\u52a8\u8ba1\u7b97\u51fa\u4e00\u4e2a\u5408\u9002\u7684\u503c":103,"\u5c06\u4f1a\u88ab\u629b\u5f03":91,"\u5c06\u5176\u8bbe\u7f6e\u6210":81,"\u5c06\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u5148\u53d8\u6362\u6210\u5355\u5c42\u65f6\u95f4\u5e8f\u5217\u6570\u636e":111,"\u5c06\u542b\u6709\u5b50\u53e5":113,"\u5c06\u542b\u6709\u8bcd\u8bed\u7684\u53e5\u5b50\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":113,"\u5c06\u56fe\u7247\u5206\u7c7b\u5230":90,"\u5c06\u591a\u53e5\u8bdd\u770b\u6210\u4e00\u4e2a\u6574\u4f53\u540c\u65f6\
u4f7f\u7528encoder\u538b\u7f29":111,"\u5c06\u591a\u53f0\u673a\u5668\u7684\u6d4b\u8bd5\u7ed3\u679c\u5408\u5e76":103,"\u5c06\u5927\u91cf\u7684":45,"\u5c06\u5b57\u5178\u7684\u5730\u5740\u4f5c\u4e3aargs\u4f20\u7ed9dataprovid":83,"\u5c06\u5b83\u4eec\u653e\u5728\u540c\u4e00\u76ee\u5f55\u4e2d":90,"\u5c06\u5bf9\u5e94\u6570\u636e\u5c42\u7684\u7ef4\u6570\u8bbe\u7f6e\u6210\u4e00\u4e2a\u5927\u4e8e\u8f93\u5165\u6570\u636e\u7ef4\u6570\u7684\u503c\u7528\u4e8e\u5360\u4f4d\u5373\u53ef":82,"\u5c06\u5e8f\u5217\u5316\u7ed3\u679c\u5199\u5165\u4e00\u4e2a\u6587\u4ef6\u5185":90,"\u5c06\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u524d\u5411\u548c\u53cd\u5411\u90e8\u5206\u6df7\u5408\u5728\u4e00\u8d77":114,"\u5c06\u6027\u80fd\u5206\u6790\u7ed3\u679c\u4ee5\u7f51\u9875\u7684\u5f62\u5f0f\u5c55\u793a\u51fa\u6765":107,"\u5c06\u6027\u80fd\u5206\u6790\u7ed3\u679c\u6309\u7167tottime\u6392\u5e8f":107,"\u5c06\u6570\u636e\u5207\u5206\u6210\u591a\u4efd":91,"\u5c06\u65b0\u5206\u652f\u7684\u7248\u672c\u6253\u4e0atag":63,"\u5c06\u65b0\u5efa\u7684\u6743\u91cd\u52a0\u5165\u6743\u91cd\u8868":74,"\u5c06\u660e\u6587\u53c2\u6570\u8f6c\u5316\u4e3apaddlepaddle\u53ef\u52a0\u8f7d\u7684\u6a21\u578b\u53c2\u6570\u65f6":83,"\u5c06\u672c\u5730\u7684\u4fee\u6539\u63a8\u9001\u5230":72,"\u5c06\u6b64\u76ee\u5f55\u6302\u8f7d\u4e3a\u5bb9\u5668\u7684":97,"\u5c06\u73af\u5883\u53d8\u91cf\u8f6c\u6362\u6210paddle\u7684\u547d\u4ee4\u884c\u53c2\u6570":97,"\u5c06\u7528\u4e8epython":75,"\u5c06\u7ed3\u679c\u4fdd\u5b58\u5230\u6b64\u76ee\u5f55\u91cc":97,"\u5c06\u7f51\u7edc\u7ed3\u6784\u5b9a\u4e49\u548c\u8bad\u7ec3\u7ed3\u675f\u5b58\u50a8\u4e0b\u6765\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6":90,"\u5c06\u8bad\u7ec3\u6587\u4ef6\u4e0e\u5207\u5206\u597d\u7684\u6570\u636e\u4e0a\u4f20\u5230\u5171\u4eab\u5b58\u50a8":97,"\u5c06\u8df3\u8fc7\u5206\u53d1\u9636\u6bb5\u76f4\u63a5\u542f\u52a8\u6240\u6709\u8282\u70b9\u7684\u96c6\u7fa4\u4f5c\u4e1a":93,"\u5c06\u8fd9\u79cd\u8de8\u8d8a\u65f6\u95f4\u6b65\u7684\u8fde\u63a5\u7528\u4e00\u4e2a\u7279\u6b8a\u7684\u7
95e\u7ecf\u7f51\u7edc\u5355\u5143\u5b9e\u73b0":111,"\u5c06\u8fdc\u7a0b\u4ed3\u5e93":72,"\u5c06\u900f\u660e":93,"\u5c06\u9700\u8981\u8f93\u51fa\u7684\u5c42\u4f5c\u4e3a":81,"\u5c06cuda\u5e93\u548clinux\u8bbe\u5907\u6302\u8f7d\u5230docker\u5bb9\u5668\u5185":1,"\u5c06ip\u6392\u5e8f\u751f\u6210\u7684\u5e8f\u53f7\u4f5c\u4e3atrain":97,"\u5c06master\u5206\u652f\u7684\u5408\u5165commit\u6253\u4e0atag":63,"\u5c06node\u8282\u70b9\u7684ip\u5730\u5740\u4fdd\u5b58\u5230machines\u6587\u4ef6\u4e2d":98,"\u5c06paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u8fd8\u539f\u56de\u660e\u6587\u65f6":83,"\u5c06recurr":82,"\u5c0f\u4e8e\u67d0\u4e2a\u6bd4\u8f83\u5c0f\u7684\u9608\u503c\u8ba4\u4e3a\u901a\u8fc7":42,"\u5c31\u4f1a\u5728\u5b8c\u6210\u7f16\u8bd1\u4e4b\u540e":0,"\u5c31\u53ef\u4ee5\u4e86\u89e3\u5230\u95ee\u9898\u4ee3\u7801\u5728\u54ea\u91cc":107,"\u5c31\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u5f00\u59cb\u6267\u884c\u8bad\u7ec3":1,"\u5c31\u53ef\u4ee5\u6309":0,"\u5c31\u5c06\u8fd9\u4e9b\u5c42\u52a0\u5165\u4e00\u4e2apython":90,"\u5c31\u5f88\u5bb9\u6613\u5bfc\u81f4\u5185\u5b58\u8d85\u9650":81,"\u5c31\u662f\u7528\u4e8e\u5c55\u793a\u4e0a\u8ff0\u5206\u6790\u5de5\u5177\u7684\u7528\u6cd5":108,"\u5c31\u662fpaddlepaddle\u4e2d\u6240\u6307\u7684":89,"\u5c31\u8fd9\u4e48\u7b80\u5355":1,"\u5c31\u901a\u5e38\u7684gpu\u6027\u80fd\u5206\u6790\u6765\u8bf4":108,"\u5c31\u9700\u8981\u5bf9\u8fd9\u4e2a\u7b2c\u4e09\u65b9\u8bed\u8a00\u589e\u52a0\u4e00\u4e9b\u5b9a\u4e49":45,"\u5c31\u9700\u8981\u9009\u62e9\u4f7f\u7528no":1,"\u5c3d\u65e9\u62a5\u9519":75,"\u5c42\u524d\u5411\u8ba1\u7b97\u7684\u7ed3\u679c":90,"\u5c42\u548c\u8f93\u5165\u7684\u914d\u7f6e":74,"\u5c42\u6b21\u5316\u7684rnn":113,"\u5c42\u7684\u540d\u79f0\u4e0e":114,"\u5c42\u7684\u5927\u5c0f":74,"\u5c42\u7684\u7c7b\u578b":74,"\u5c42\u7684\u8f93\u51fa\u88ab\u7528\u4f5c\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684":114,"\u5c55\u793a\u4e86\u4e00\u4e2a\u542b\u67094\u4e2a\u5e8f\u5217\u7684":89,"\u5c55\u793a\u4e86\u90e8\u5206\u547d\u4e
e4\u884c\u53c2\u6570\u7684\u4f7f\u7528":104,"\u5c55\u793a\u7684\u8c03\u7528\u56fe\u4e5f\u53ef\u4ee5\u5e2e\u52a9\u6211\u4eec\u53d1\u73b0\u6027\u80fd\u4e2d\u7684\u95ee\u9898":107,"\u5c5e\u4e8e\u8fd9\u4e00\u7c7b\u7684\u5b9e\u73b0":82,"\u5c5e\u6027":75,"\u5de5\u4f5c\u6a21\u5f0f":103,"\u5de5\u4f5c\u7a7a\u95f4\u4e2d\u7684":93,"\u5de5\u4f5c\u7a7a\u95f4\u5e94\u5982\u4e0b\u6240\u793a":91,"\u5de5\u5177\u4e0a\u4f20\u5373\u53ef":63,"\u5de5\u5177\u5408\u5e76fat\u5e93":117,"\u5de5\u5177\u670d\u52a1\u5668\u5c06\u8bfb\u53d6\u73af\u5883\u53d8\u91cf":77,"\u5de5\u5177\u6765\u7ba1\u7406":72,"\u5de5\u5177\u6765\u7f16\u8bd1\u6587\u6863":77,"\u5de5\u5177\u94fe":116,"\u5de5\u5177\u94fe\u7684android":116,"\u5de6\u53f3\u7684\u8ba1\u7b97\u65f6\u95f4":107,"\u5df2\u6253\u5f00":72,"\u5df2\u7ecf\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u5b8c\u6210\u8bbe\u7f6e":103,"\u5e02\u9762\u4e0a\u5df2\u7ecf\u6709nvidia\u6216\u7b2c\u4e09\u65b9\u63d0\u4f9b\u7684\u4f17\u591a\u5de5\u5177":108,"\u5e0c\u671b\u80fd\u591f\u5c06\u5e8f\u5217\u5316\u540e\u7684\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u53c2\u6570\u6253\u5305\u8fdb\u4e00\u4e2a\u6587\u4ef6":90,"\u5e26\u6709\u4e0b\u9762\u4e24\u4e2a\u6a21\u677f\u53c2\u6570":75,"\u5e2e\u52a9\u6211\u4eec\u5b8c\u6210\u5bf9\u8f93\u5165\u5e8f\u5217\u7684\u62c6\u5206":113,"\u5e2e\u52a9\u6211\u4eec\u66f4\u597d\u5730\u63cf\u8ff0\u6bb5\u843d":113,"\u5e2e\u52a9\u6211\u4eec\u6784\u9020\u4e00\u4e9b\u590d\u6742\u7684\u8f93\u5165\u4fe1\u606f":110,"\u5e38\u5e38\u51fa\u73b0":78,"\u5e38\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1":89,"\u5e38\u7528\u7684cmake\u914d\u7f6e\u5982\u4e0b":[116,117],"\u5e38\u89c1\u7684\u5305\u62ec":107,"\u5e38\u89c1\u95ee\u9898\u89e3\u7b54":2,"\u5e73\u5747\u6545\u969c\u4fee\u590d\u65f6\u95f4":10,"\u5e73\u5747\u6545\u969c\u7387":10,"\u5e76\u4e0d\u4fdd\u8bc1":74,"\u5e76\u4e0d\u662f\u4f7f\u7528\u53cc\u5c42rnn\u89e3\u51b3\u5b9e\u9645\u7684\u95ee\u9898":111,"\u5e76\u4e0d\u662fkubernetes\u4e2d\u7684node\u6982\u
5ff5":97,"\u5e76\u4e0d\u771f\u6b63":[89,90],"\u5e76\u4e0d\u771f\u6b63\u7684\u548c":111,"\u5e76\u4e0d\u96be":0,"\u5e76\u4e14":114,"\u5e76\u4e14\u4e5f\u53ef\u4ee5\u5728windows\u7684docker\u4e2d\u8fd0\u884c":1,"\u5e76\u4e14\u4e66\u5199\u4e00\u4efd\u4ee3\u7801":76,"\u5e76\u4e14\u4f1a\u5199\u597d":42,"\u5e76\u4e14\u4f1a\u6839\u636e":116,"\u5e76\u4e14\u4f7f\u7528":46,"\u5e76\u4e14\u5185\u5c42\u7684\u5e8f\u5217\u64cd\u4f5c\u4e4b\u95f4\u72ec\u7acb\u65e0\u4f9d\u8d56":111,"\u5e76\u4e14\u52a0\u4e0a\u4e0b\u9762\u7684\u547d\u4ee4\u884c\u53c2\u6570":105,"\u5e76\u4e14\u5305\u62ecunit":72,"\u5e76\u4e14\u53ea\u9700\u8981\u5728\u5fc5\u8981\u7684\u65f6\u5019\u8f6c\u6362\u8fd9\u79cd\u683c\u5f0f":42,"\u5e76\u4e14\u53ef\u80fd\u4f1a\u52a0\u901f\u8bad\u7ec3\u8fc7\u7a0b":81,"\u5e76\u4e14\u542f\u52a8\u8bad\u7ec3":97,"\u5e76\u4e14\u5728\u5e38\u89c1\u7684\u5e73\u53f0\u4e0a":45,"\u5e76\u4e14\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528paddlepaddle\u6765\u89e3\u51b3\u4e00\u4e2a\u7ecf\u5178\u7684\u7ebf\u6027\u56de\u5f52\u95ee\u9898":84,"\u5e76\u4e14\u5f3a\u5236\u8bbe\u7f6e\u4e00\u4e9bpaddlepaddle\u53c2\u6570\u7684\u503c":117,"\u5e76\u4e14\u5f53\u7f16\u8bd1\u65f6":41,"\u5e76\u4e14\u628a\u7cfb\u7edf\u751f\u6210\u7684ca":27,"\u5e76\u4e14\u628a\u7ed3\u679c\u8fd4\u56depfsclient\u7aef":27,"\u5e76\u4e14\u67e5\u8be2paddlepaddle\u5355\u5143\u6d4b\u8bd5\u7684\u65e5\u5fd7":78,"\u5e76\u4e14\u7f16\u8bd1\u65f6\u9700\u8981\u6253\u5f00":75,"\u5e76\u4e14\u7f16\u8bd1\u80fd\u901a\u8fc7\u4ee3\u7801\u6837\u5f0f\u68c0\u67e5":72,"\u5e76\u4e14\u8ba9\u63a5\u53e3\u8131\u79bb\u5b9e\u73b0\u7ec6\u8282":45,"\u5e76\u4e14\u8bbe\u7f6e\u9ed8\u8ba4\u503c\u4e3a1":75,"\u5e76\u4e14\u8f93\u5165\u8f93\u51fa\u90fd\u662f\u5171\u7528\u4e00\u5757\u5185\u5b58":42,"\u5e76\u4e14\u8f93\u51fa\u4e00\u4e2a":72,"\u5e76\u4e14\u9700\u8981\u91cd\u5199\u57fa\u7c7b\u4e2d\u7684\u4ee5\u4e0b\u51e0\u4e2a\u865a\u51fd\u6570":74,"\u5e76\u4e14cpu":75,"\u5e76\u4e14softmax\u5c42\u7684\u4e24\u4e2a\u8f93\u5165\u4e5f\u4f7f\u7528\u4e86\u540c\u6837\u7684\u53c2\u6570":83
,"\u5e76\u4f7f\u7528":93,"\u5e76\u4fdd\u5b58\u8f93\u51fa\u5230\u4e00\u4e2a\u65e5\u5fd7\u6587\u4ef6":91,"\u5e76\u5177\u5907\u4ee5\u4e0b\u7279\u70b9":88,"\u5e76\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u6587\u4ef6":72,"\u5e76\u521b\u5efaoptim":84,"\u5e76\u521d\u59cb\u5316":75,"\u5e76\u5220\u9664":63,"\u5e76\u5220\u9664\u66f4\u65e9\u7684\u5feb\u7167":10,"\u5e76\u52a0\u8f7d\u5176\u4e2d\u7684\u53c2\u6570":10,"\u5e76\u53d1\u5e03\u5230pypi":63,"\u5e76\u53ef\u4ee5\u5728\u5927\u591a\u6570\u4e3b\u6d41\u7684linux\u64cd\u4f5c\u7cfb\u7edf\u4ee5\u53camacos\u4e0a\u6267\u884c":3,"\u5e76\u548c\u53c2\u6570\u670d\u52a1\u5668\u901a\u4fe1":92,"\u5e76\u5728\u4e58\u79ef\u7ed3\u679c\u4e0a\u518d\u52a0\u4e0a\u7ef4\u5ea6\u4e3a":74,"\u5e76\u5728\u6700\u5f00\u59cb\u521d\u59cb\u5316\u4e3a\u8d77\u59cb\u8bcd":114,"\u5e76\u5728\u6bcf\u6b21\u6743\u91cd\u66f4\u65b0\u540e":41,"\u5e76\u5728\u7c7b\u6784\u5efa\u51fd\u6570\u4e2d\u628a\u5b83\u653e\u5165\u4e00\u4e2a\u7c7b\u6210\u5458\u53d8\u91cf\u91cc":74,"\u5e76\u5728\u8be5layer\u91cc\u91c7\u7528\u7b2c\u4e00\u79cd\u65b9\u5f0f\u8bbe\u7f6e":82,"\u5e76\u5728\u96c6\u7fa4\u4e2d\u8fd0\u884c\u591a\u4e2a\u5206\u5e03\u5f0f\u6570\u636e\u5904\u7406\u4efb\u52a1":11,"\u5e76\u5728python\u811a\u672c\u4e2d\u5b8c\u6210\u4e0eoperator\u540c\u6837\u7684\u8ba1\u7b97\u903b\u8f91":75,"\u5e76\u5904\u7406\u4e0e\u4e4b\u76f8\u5173\u7684\u6240\u6709\u7ec6\u8282":90,"\u5e76\u5b89\u88c5\u4e86python":78,"\u5e76\u5b89\u88c5\u6700\u65b0":3,"\u5e76\u5b89\u88c5\u6709python2":86,"\u5e76\u5b8c\u6210\u53c2\u6570\u4f18\u5316\u66f4\u65b0":92,"\u5e76\u5bf9\u6bd4\u662f\u5426\u548c\u6b63\u5728\u5b89\u88c5\u7684\u540e\u7f00\u4e00\u81f4":78,"\u5e76\u5bf9\u76f8\u5e94\u7684\u53c2\u6570\u8c03\u7528":74,"\u5e76\u5c06":63,"\u5e76\u5c06\u5176\u6295\u5c04\u5230":114,"\u5e76\u5c06\u8be5layer\u4e0a\u4e00\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u4f5c\u4e3a\u81ea\u8eab\u5f53\u524d\u65f6\u95f4\u6b65\u7684\u8f93\u51fa":82,"\u5e76\u5c06c":46,"\u5e76\u624b\u52a8\u751f\u6210download\u6210\u529f\u6807\u7b7e":78,"\u5e76\u6267\
u884c\u4e0b\u9762\u7684\u547d\u4ee4":1,"\u5e76\u628a\u5feb\u7167\u4fdd\u5b58\u5230\u8fd9\u4e2a\u76ee\u5f55\u4e0b":10,"\u5e76\u628a\u7ed3\u679c\u653e\u5230\u5f53\u524d\u5c42\u7684":42,"\u5e76\u628a\u8fd9\u4e2a\u5305\u542b\u4e86\u8bad\u7ec3\u6570\u636e\u7684container\u4fdd\u5b58\u4e3a\u4e00\u4e2a\u65b0\u7684\u955c\u50cf":96,"\u5e76\u66f4\u6362job":79,"\u5e76\u6839\u636e\u5206\u5e03\u5f0f\u8bad\u7ec3\u5e76\u53d1\u6570":91,"\u5e76\u68c0\u67e5\u548c\u9700\u5b89\u88c5\u7684\u5305\u662f\u5426\u5339\u914d":3,"\u5e76\u6ca1\u6709paddle\u7279\u522b\u9700\u8981\u7684\u7279\u6027":45,"\u5e76\u6dfb\u52a0\u5934\u6587\u4ef6":41,"\u5e76\u6dfb\u52a0\u6ce8\u91ca":75,"\u5e76\u7279\u5316\u6a21\u677f\u53c2\u6570\u4e3a":75,"\u5e76\u7c98\u8d34\u6b64python\u4ee3\u7801":86,"\u5e76\u81ea\u52a8\u4e0b\u8f7d\u5b89\u88c5\u4f9d\u8d56\u8f6f\u4ef6":3,"\u5e76\u81ea\u52a8\u7f16\u8bd1\u5bbf\u4e3b\u673a\u7248protoc\u53ef\u6267\u884c\u6587\u4ef6":118,"\u5e76\u81ea\u52a8\u7f16\u8bd1paddlepaddle\u6240\u9700\u7684\u6240\u6709\u7b2c\u4e09\u65b9\u5e93":116,"\u5e76\u884c\u5730\u6267\u884c\u6a21\u578b\u7684\u8bad\u7ec3":92,"\u5e76\u884c\u5730\u63a5\u6536\u68af\u5ea6\u548c\u66f4\u65b0\u53c2\u6570":92,"\u5e76\u88ab\u5b58\u50a8\u5728\u8bf8\u5982hadoop":11,"\u5e76\u89c2\u5bdf\u7ed3\u679c":108,"\u5e76\u89e3\u91ca\u4e86\u5404\u81ea\u542b\u4e49":75,"\u5e76\u8bb0\u5f55\u5b83\u7684\u7f16\u53f7":72,"\u5e76\u8fdb\u884c\u521d\u59cb\u5316\u64cd\u4f5c":84,"\u5e76\u9002\u5e94github\u7684\u7279\u6027\u505a\u4e86\u4e00\u4e9b\u533a\u522b":63,"\u5e76\u91cd\u65b0\u6253\u5305wheel\u5305":63,"\u5e76\u94fe\u63a5\u5230\u751f\u6210\u7684lib\u5e93\u4e2d":75,"\u5e78\u800cpython\u7684\u4e00\u4e2a\u7b2c\u4e09\u65b9\u5e93":107,"\u5e8f\u5217\u4e2d\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee\u540c":110,"\u5e8f\u5217\u4e2d\u7684\u4e00\u4e2a\u5143\u7d20":89,"\u5e8f\u5217\u4e2d\u7684\u5143\u7d20\u662f\u8bcd\u8bed":89,"\u5e8f\u5217\u4e2d\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20\u53c8\u662f\u4e00\u4e2a\u5e8f\u5217":89,"\u5e8f\u5217\u4e2d\u7684\u6bcf\u4e0
0\u4e2a\u5143\u7d20\u662f\u975e\u5e8f\u5217":89,"\u5e8f\u5217\u4fe1\u606f":89,"\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u914d\u7f6e":90,"\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u65f6":90,"\u5e8f\u5217\u5316\u7ed3\u679c\u4f1a\u5199\u5165\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":90,"\u5e8f\u5217\u6570\u636e\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u9762\u5bf9\u7684\u4e00\u79cd\u4e3b\u8981\u8f93\u5165\u6570\u636e\u7c7b\u578b":113,"\u5e8f\u5217\u662f\u4e00\u79cd\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b":110,"\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u5927\u591a\u9075\u5faaencod":113,"\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u7684\u8f93\u5165":113,"\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20\u662f\u539f\u6765\u53cc\u5c42\u5e8f\u5217\u6bcf\u4e2asubseq\u5143\u7d20\u7684\u5e73\u5747\u503c":110,"\u5e8f\u5217\u8f93\u5165":89,"\u5e8f\u5217\u8f93\u5165\u65f6\u7b49\u4e8e":81,"\u5e8f\u5217\u8f93\u5165\u793a\u610f\u56fe":89,"\u5e93\u6709\u81ea\u5df1\u72ec\u7acb\u7684\u52a8\u6001\u5e93\u6587\u4ef6":87,"\u5e94\u7528\u524d\u5411\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"\u5e94\u7528\u53cd\u5411\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":114,"\u5e94\u80fd\u53cd\u6620\u5f53\u524dcommit\u7684\u5185\u5bb9":72,"\u5e94\u8be5\u4e0e\u5b83\u7684memory\u540d\u5b57\u76f8\u540c":114,"\u5e94\u8be5\u8bf4\u8c22\u8c22":72,"\u5e94\u8be5\u964d\u4f4e\u5b66\u4e60\u7387":83,"\u5e95\u5c42\u8fdb\u7a0b":93,"\u5efa\u7acb\u4e00\u4e2a":72,"\u5efa\u8bae":[63,72],"\u5efa\u8bae\u5c06\u8be5\u53c2\u6570\u8bbe\u4e3atrue":103,"\u5f00\u53d1\u4e86\u6a21\u578b\u9884\u6d4b\u7684\u6837\u4f8b\u4ee3\u7801":46,"\u5f00\u53d1\u4eba\u5458\u4f7f\u7528":72,"\u5f00\u53d1\u5206\u652f":3,"\u5f00\u53d1\u6807\u51c6":115,"\u5f00\u53d1\u8005\u4f7f\u7528":0,"\u5f00\u53d1\u8005\u4fee\u6539\u81ea\u5df1\u7684\u4ee3\u7801":63,"\u5f00\u53d1\u8005fork\u7684\u7248\u672c\u5e93\u4e2d":63,"\u5f00\u53d1\u8005fork\u7684\u7248\u672c\u5e93\u4f7f\u7528":63,"\u5f00\u53d1\u955c\u50cf":72,"\u5f00\u53d1\u9884\u6d4b\u5e8f"
:90,"\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u94fe\u63a5":87,"\u5f00\u542f":0,"\u5f00\u5934":[41,42],"\u5f00\u5934\u7684\u90e8\u5206":91,"\u5f00\u5934\u90e8\u5206\u6307\u5b9a":91,"\u5f00\u59cb\u63d0\u4f9b\u670d\u52a1":10,"\u5f00\u59cb\u6807\u8bb0":114,"\u5f00\u59cb\u795e\u7ecf\u7f51\u7edc\u7684":92,"\u5f00\u59cb\u9636\u6bb5":108,"\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":102,"\u5f02\u6b65sgd\u66f4\u65b0\u7684\u6b65\u957f\u63a7\u5236":91,"\u5f15\u5165\u4e86\u4ee5\u4e0b\u56db\u4e2aapi":41,"\u5f15\u5165\u4e86\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5f15\u5bfc\u5c42":114,"\u5f15\u7528memory\u5f97\u5230\u8fd9layer\u4e0a\u4e00\u65f6\u523b\u8f93\u51fa":113,"\u5f39\u51fa\u4e0b\u9762\u7684\u9009\u62e9\u6846":63,"\u5f52\u4e00\u5316\u6982\u7387\u5411\u91cf":114,"\u5f53":105,"\u5f53\u4e00\u4e2a":89,"\u5f53\u4e0a\u8ff0\u63a5\u53e3\u7b2c4\u4e2a\u53c2\u6570":89,"\u5f53\u4f60\u6267\u884c\u547d\u4ee4":74,"\u5f53\u4fdd\u5b58\u7684\u7f51\u7edc\u53c2\u6570\u4e3afloat\u7c7b\u578b\u65f6\u4e3a4":83,"\u5f53\u524d\u65f6\u95f4\u6b65\u5904\u7684memory\u7684\u8f93\u51fa\u4f5c\u4e3a\u4e0b\u4e00\u65f6\u95f4\u6b65memory\u7684\u8f93\u5165":114,"\u5f53\u524d\u7684\u5b66\u4e60\u7387\u4e3a\u6240\u8bbe\u7f6e":83,"\u5f53\u524d\u7684\u5b9e\u73b0\u65b9\u5f0f\u4e0b":74,"\u5f53\u524d\u7684\u8f93\u5165y\u548c\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51farnn_state\u505a\u4e86\u4e00\u4e2a\u5168\u94fe\u63a5":111,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1\u542f\u52a8\u7684pserver\u7684ip\u5217\u8868":91,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1pserver\u603b\u6570":91,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1trainer\u603b\u6570":91,"\u5f53\u524dtrainer\u7684\u7ebf\u7a0b\u6570\u76ee":91,"\u5f53\u529f\u80fd\u5206\u652f\u5f00\u53d1\u5b8c\u6bd5\u540e":63,"\u5f53\u53ea\u505a\u63a8\u65ad":41,"\u5f53\u5728\u7f51\u7edc\u5c42\u914d\u7f6e\u4e2d\u8bbe\u7f6e":103,"\u5f53\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u8bbe\u7f6e":103,"\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5927\u4e8e1000\u5c0f\u4e8e\u7b49\u4e8e2000\u65f6":83,"
\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5927\u4e8e2000\u65f6":83,"\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5c0f\u4e8e\u7b49\u4e8e1000\u65f6":83,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5927\u4e8e1\u5c0f\u4e8e\u7b49\u4e8e2\u65f6":83,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5927\u4e8e2\u65f6":83,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5c0f\u4e8e\u7b49\u4e8e1\u65f6":83,"\u5f53\u5f00\u542f":42,"\u5f53\u6211\u4eec\u505a\u51fa\u6027\u80fd\u4fee\u6b63\u540e":107,"\u5f53\u6211\u4eec\u8bad\u7ec3\u5b8c\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u4e4b\u540e":88,"\u5f53\u6240\u6709pod\u90fd\u5904\u4e8erunning\u72b6\u6001":97,"\u5f53\u6253\u5f00":42,"\u5f53\u6570\u636e\u683c\u5f0f\u4e0epaddlepaddle\u9ed8\u8ba4\u7684":42,"\u5f53\u6a21\u578b\u53c2\u6570\u4e0d\u5b58\u5728\u65f6":103,"\u5f53\u6a21\u5f0f\u4e3a":103,"\u5f53\u7136":[1,108],"\u5f53\u7136\u53ef\u4ee5":0,"\u5f53\u7136\u8fd9\u4e24\u8005\u4e5f\u53ef\u4ee5\u76f8\u7b49":42,"\u5f53\u7528\u6237\u4f7f\u7528\u5b8c\u8fd9\u4e2a\u53c2\u6570\u540e":46,"\u5f53\u7528\u6237\u6ca1\u6709\u663e\u5f0f\u8bbe\u5b9a\u65f6":82,"\u5f53\u7f51\u7edc\u51fa\u73b0\u5206\u652f\u4e14\u5728":42,"\u5f53\u7f51\u7edc\u5c42\u7528\u4e00\u4e2a\u6279\u6b21\u505a\u8bad\u7ec3\u65f6":74,"\u5f53\u89e3\u8bfb\u6bcf\u4e00\u4e2a":114,"\u5f53\u8d85\u8fc7\u8be5\u9608\u503c\u65f6":103,"\u5f53\u8f93\u5165\u662f\u7ef4\u5ea6\u5f88\u9ad8\u7684\u7a00\u758f\u6570\u636e\u65f6":105,"\u5f53\u9700\u8981\u5b8c\u6210\u8ba1\u7b97\u65f6":76,"\u5f53\u975e\u5e8f\u5217\u8f93\u5165\u65f6":89,"\u5f53destination\u6587\u4ef6\u4e0d\u5b58\u5728\u6216\u8005\u5927\u5c0f\u548csource\u6587\u4ef6\u4e0d\u4e00\u81f4\u65f6":27,"\u5f53n1":81,"\u5f62\u6210recurr":113,"\u5f62\u6210recurrent\u8fde\u63a5":113,"\u5f88\u591a":0,"\u5f88\u6709\u53ef\u80fd\u5b9e\u9645\u5e94\u7528\u5c31\u662f\u6ca1\u6709\u6309\u7167\u60a8\u7684\u9884\u671f\u60c5\u51b5\u8fd0\u884c":108,"\u5f88\u6709\u53ef\u80fd\u662f\u975e\u72ec\u5360\u65b9\u5f0f\u6267\u884c\u5bfc\u81f4\u7684\u7aef\u53e3\u51b2\u7a81":79,"\u5f88\u96be\u4fdd\u8bc1
\u591a\u8bed\u8a00\u4ee3\u7801\u98ce\u683c\u7684\u4e00\u81f4\u6027":45,"\u5f97\u4f7f\u7528":45,"\u5f97\u5230\u8f93\u51fa\u503c":75,"\u5f97\u5230\u9884\u6d4b\u7ed3\u679c\u7684\u8fc7\u7a0b":88,"\u5faa\u73af\u5c55\u5f00\u7684\u6bcf\u4e2a\u65f6\u95f4\u6b65\u603b\u662f\u80fd\u591f\u5f15\u7528\u6240\u6709\u8f93\u5165":113,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u4e2d":114,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u4f5c\u4e3a\u4f7f\u7528":114,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u548c":114,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u9aa4\u987a\u5e8f\u5730\u5904\u7406\u5e8f\u5217":114,"\u5faa\u73af\u7f51\u7edc\u4ece":114,"\u5fc5\u8981":46,"\u5fc5\u9009":91,"\u5fc5\u987b":74,"\u5fc5\u987b\u5148\u6267\u884c\u7b2c2\u6b65":0,"\u5fc5\u987b\u5206\u522b\u4e0e":42,"\u5fc5\u987b\u5c06\u524d\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5143\u7d20":111,"\u5fc5\u987b\u6307\u5411\u4e00\u4e2apaddlepaddle\u5b9a\u4e49\u7684lay":113,"\u5fc5\u987b\u6307\u5b9a\u4e3a":90,"\u5fc5\u987b\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":113,"\u5fc5\u987b\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":110,"\u5fc5\u987b\u7531\u53ea\u8bfbmemory\u7684":114,"\u5fc5\u987b\u8bbe\u7f6e\u4e3a":[116,117],"\u5fc5\u987b\u8bbe\u7f6e\u4e3aon":117,"\u5fc5\u987b\u914d\u7f6e\u4e3a":[87,118],"\u5fc5\u987b\u914d\u7f6e\u9009\u9879":87,"\u5feb\u901f\u5f00\u59cb":85,"\u6027\u80fd\u4f18\u5316\u7684\u8fc7\u7a0b\u901a\u5e38\u662f\u4e0d\u65ad\u91cd\u590d\u5730":107,"\u6027\u80fd\u5206\u6790":108,"\u6027\u80fd\u5206\u6790\u5de5\u5177\u662f\u7528\u4e8e\u7ed9\u5e94\u7528\u7a0b\u5e8f\u7684\u6027\u80fd\u505a\u5b9a\u91cf\u5206\u6790\u7684":108,"\u6027\u80fd\u5206\u6790\u662f\u6027\u80fd\u4f18\u5316\u7684\u5173\u952e\u4e00\u6b65":108,"\u6027\u80fd\u548c\u628a\u7f16\u8bd1\u5de5\u5177\u5b89\u88c5\u5728\u672c\u673a\u8fd0\u884c\u4e00\u6837":0,"\u6027\u80fd\u8c03\u4f18":102,"\u6027\u80fdtip":[116,117],"\u60a8\u4e5f\u53ef\u4ee5\u8fdb\u5165\u5230docker\u5bb9\u5668\u4e2d":1,"\u60a8\u4f1a\u5728\u63a5\u4e0b\u6765\
u7684\u90e8\u5206\u4e2d\u83b7\u5f97\u66f4\u591a\u7684\u7ec6\u8282\u4ecb\u7ecd":108,"\u60a8\u53ef\u4ee5\u4ece\u4e0b\u9762\u7684\u8868\u683c\u4e2d\u627e\u5230\u9700\u8981\u7684\u7248\u672c":3,"\u60a8\u53ef\u4ee5\u4efb\u610f\u4f7f\u7528\u4e00\u4e2a\u6216\u4e24\u4e2a\u6765\u5bf9\u611f\u5174\u8da3\u7684\u4ee3\u7801\u6bb5\u505a\u6027\u80fd\u5206\u6790":108,"\u60a8\u53ef\u4ee5\u5728":[1,94],"\u60a8\u53ef\u4ee5\u5728\u5bb9\u5668\u4e2d\u6267\u884c":1,"\u60a8\u53ef\u4ee5\u5bfc\u5165":108,"\u60a8\u53ef\u4ee5\u6309\u7167\u4e0b\u9762\u7684\u6b65\u9aa4\u5728openmpi\u96c6\u7fa4\u4e2d\u63d0\u4ea4paddle\u8bad\u7ec3\u4efb\u52a1":98,"\u60a8\u53ef\u4ee5\u91c7\u7528\u4e0b\u9762\u4e94\u4e2a\u6b65\u9aa4":108,"\u60a8\u53ef\u80fd\u9700\u8981\u4fee\u6539":91,"\u60a8\u5c06\u4e86\u89e3\u5982\u4f55":114,"\u60a8\u5c31\u80fd\u83b7\u5f97\u5982\u4e0b\u7684\u5206\u6790\u7ed3\u679c":108,"\u60a8\u6309\u5982\u4e0b\u6b65\u9aa4\u64cd\u4f5c\u5373\u53ef":108,"\u60a8\u6700\u597d\u5148\u786e\u8ba4":108,"\u60a8\u9996\u5148\u9700\u8981\u5728\u76f8\u5173\u4ee3\u7801\u6bb5\u4e2d\u52a0\u5165":108,"\u60c5\u611f\u5206\u6790":63,"\u60f3\u4e86\u89e3\u66f4\u591apaddlepaddl":77,"\u610f\u5473\u7740\u4e0d\u540c\u65f6\u95f4\u6b65\u7684\u8f93\u5165\u90fd\u662f\u76f8\u540c\u7684\u503c":114,"\u610f\u601d\u662f\u4e0d\u4f7f\u7528\u5e73\u5747\u53c2\u6570\u6267\u884c\u6d4b\u8bd5":103,"\u610f\u601d\u662f\u4e0d\u4fdd\u5b58\u7ed3\u679c":103,"\u610f\u601d\u662f\u4f7f\u7528\u7b2ctest":103,"\u610f\u601d\u662f\u5728gpu\u6a21\u5f0f\u4e0b\u4f7f\u75284\u4e2agpu":103,"\u6210\u529f\u7f16\u8bd1\u540e":87,"\u6210\u529f\u8bad\u7ec3\u4e14\u9000\u51fa\u7684pod\u6570\u76ee\u4e3a3\u65f6":97,"\u6210\u5458":75,"\u6211\u4eec\u4e0d\u80fd\u901a\u8fc7\u5e38\u89c4\u7684\u68af\u5ea6\u68c0\u67e5\u7684\u65b9\u5f0f\u6765\u8ba1\u7b97\u68af\u5ea6":74,"\u6211\u4eec\u4e3b\u8981\u4f1a\u4ecb\u7ecdnvprof\u548cnvvp":108,"\u6211\u4eec\u4e5f\u53ef\u4ee5\u786e\u5b9a\u6bcf\u4e00\u4e2a\u53c2\u6570\u7684\u7c7b\u578b":46,"\u6211\u4eec\u4e5f\u5c06mklml\u5373":42,"\u6211\u4e
ec\u4e5f\u652f\u6301\u5728aws\u4e0a\u90e8\u7f72paddlepaddl":94,"\u6211\u4eec\u4ec5\u4ec5\u5bf9\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165\u8fdb\u884c\u4e86\u63cf\u8ff0":84,"\u6211\u4eec\u4ec5\u6709\u4e00\u4e2a\u8f93\u5165":74,"\u6211\u4eec\u4ecb\u7ecd\u5982\u4f55\u5728":96,"\u6211\u4eec\u4ecb\u7ecd\u5982\u4f55\u5728kubernetes\u96c6\u7fa4\u4e0a\u8fdb\u884c\u5206\u5e03\u5f0fpaddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":97,"\u6211\u4eec\u4ee5\u624b\u5199\u6570\u5b57\u8bc6\u522b\u4efb\u52a1\u4e3a\u4f8b\u8fdb\u884c\u4ecb\u7ecd":90,"\u6211\u4eec\u4f1a\u4fdd\u8bc1":42,"\u6211\u4eec\u4f1a\u53ca\u65f6\u8fdb\u884c\u56de\u590d":80,"\u6211\u4eec\u4f1a\u5728\u7f51\u7edc\u8bad\u7ec3\u4e4b\u524d\u628a\u683c\u5f0f\u8f6c\u6362\u4e3amkl":42,"\u6211\u4eec\u4f1a\u5bf9\u6bcf\u4e2a\u8bad\u7ec3\u4efb\u52a1\u90fd\u4f1a\u5728\u6bcf\u4e2a\u8282\u70b9\u4e0a\u521b\u5efa\u4e00\u4e2a\u5de5\u4f5c\u7a7a\u95f4":91,"\u6211\u4eec\u4f1a\u5bf9\u6bd4\u5982\u4e0b2\u4e2a\u65b9\u9762":41,"\u6211\u4eec\u4f1a\u628amkl":42,"\u6211\u4eec\u4f1a\u6dfb\u52a0":[41,42],"\u6211\u4eec\u4f1a\u7ee7\u7eed\u4f7f\u7528\u73b0\u6709\u7684\u5185\u5b58\u5757":74,"\u6211\u4eec\u4f1a\u91cd\u65b0\u5206\u914d\u5185\u5b58":74,"\u6211\u4eec\u4f7f\u7528":74,"\u6211\u4eec\u4f7f\u7528\u4e0d\u540c\u7684layer\u8fdb\u884c\u7ec4\u5408":84,"\u6211\u4eec\u4f7f\u7528\u4e86":111,"\u6211\u4eec\u4f7f\u7528\u52a8\u6001\u5e93\u6765\u5206\u53d1paddl":45,"\u6211\u4eec\u4f7f\u7528paddl":91,"\u6211\u4eec\u5047\u8bbe\u4e00\u53f0\u673a\u5668\u4e0a\u67094\u4e2agpu":105,"\u6211\u4eec\u5148\u8c03\u7528\u6bcf\u4e2a":76,"\u6211\u4eec\u51b3\u5b9a\u4f7f\u7528\u5df2\u6709\u7684":42,"\u6211\u4eec\u5373\u53ef\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u642d\u5efa":84,"\u6211\u4eec\u53ea\u6f14\u793a\u4e00\u4e2a\u5355\u673a\u4f5c\u4e1a":96,"\u6211\u4eec\u53ea\u9700\u8981\u4f7f\u7528lstm":111,"\u6211\u4eec\u53ea\u9700\u8981\u8fd0\u884c\u4e0b\u9762\u547d\u4ee4\u628a\u7f16\u8bd1\u597d\u7684paddlepaddle\u6253\u5305\u6210\u4e00\u4e2a":72,"\u6211\u4eec\u53ea\u9700\u8981\u914d\u7f
6e":0,"\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528":107,"\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528\u5176\u4ed6layer\u8fdb\u884c\u7ec4\u5408":84,"\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528\u5b83\u6765\u751f\u6210\u5e8f\u5217":114,"\u6211\u4eec\u53ef\u4ee5\u5148\u5b8c\u6210\u5bf9\u539f\u6570\u636e\u7684packing\u64cd\u4f5c":41,"\u6211\u4eec\u53ef\u4ee5\u521b\u5efatrainer\u6765\u5bf9\u7f51\u7edc\u8fdb\u884c\u8bad\u7ec3":84,"\u6211\u4eec\u53ef\u4ee5\u53c2\u8003tensorflow\u7684":76,"\u6211\u4eec\u53ef\u4ee5\u5728":72,"\u6211\u4eec\u53ef\u4ee5\u5728\u547d\u4ee4\u884c\u4e2d\u7b80\u5355\u7684\u770b\u4e00\u4e0b\u751f\u6210\u6548\u679c":107,"\u6211\u4eec\u53ef\u4ee5\u5b9a\u4e49\u5982\u4e0b\u7684layer\u7ec4\u5408":84,"\u6211\u4eec\u53ef\u4ee5\u5b9a\u4e49\u5982\u4e0blayer\u6765\u63cf\u8ff0\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165":84,"\u6211\u4eec\u53ef\u4ee5\u6309\u7167\u5982\u4e0b\u5c42\u6b21\u5b9a\u4e49\u975e\u5e8f\u5217":110,"\u6211\u4eec\u53ef\u4ee5\u67e5\u770b\u6027\u80fd\u5206\u6790\u7684\u7ed3\u679c":107,"\u6211\u4eec\u53ef\u4ee5\u8bbe\u8ba1\u642d\u5efa\u4e00\u4e2a\u7075\u6d3b\u7684":113,"\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7":107,"\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u65e5\u5fd7\u67e5\u770b\u5bb9\u5668\u8bad\u7ec3\u7684\u60c5\u51b5":97,"\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u8bbe\u7f6e":91,"\u6211\u4eec\u540c\u6837\u63d0\u4f9b\u4e86\u4ece\u6e90\u7801\u7f16\u8bd1\u5b89\u88c5paddlepaddle\u7684\u65b9\u6cd5":2,"\u6211\u4eec\u5728":76,"\u6211\u4eec\u5728\u51fd\u6570\u7684\u7ed3\u5c3e\u8fd4\u56de":114,"\u6211\u4eec\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u4ee5\u4e0b\u66f4\u6539":114,"\u6211\u4eec\u5c06":97,"\u6211\u4eec\u5c06\u4e00\u6bb5\u8bdd\u770b\u6210\u53e5\u5b50\u7684\u6570\u7ec4":111,"\u6211\u4eec\u5c06\u4ecb\u7ecd\u5982\u4f55\u542f\u52a8\u5206\u5e03\u5f0f\u8bad\u7ec3\u4f5c\u4e1a":96,"\u6211\u4eec\u5c06\u4f7f\u7528":114,"\u6211\u4eec\u5c06\u4f7f\u7528\u7b80\u5355\u7684":114,"\u6211\u4eec\u5c06\u539f\u59cb\u6570\u636e\u7684\u6bcf\u4e00\u7ec4":111,"\u6211\u4eec\u5c06\u5b83\u4eec\u5212\u5206\u4e
3a\u4e0d\u540c\u7684\u7c7b\u522b":102,"\u6211\u4eec\u5c06\u795e\u7ecf\u7f51\u7edc\u4e00\u6b21\u8ba1\u7b97\u63a5\u53d7\u7684\u6240\u6709\u8f93\u5165\u6837\u672c\u79f0\u4e4b\u4e3a\u4e00\u4e2a":89,"\u6211\u4eec\u5c06\u8bad\u7ec3\u7ed3\u675f\u540e\u5b58\u50a8\u4e0b\u6765\u7684\u6a21\u578b\u8f6c\u6362\u6210\u9884\u6d4b\u6a21\u578b":90,"\u6211\u4eec\u5c31\u5b8c\u6210\u4e86\u4e00\u6b21\u4ee3\u7801\u8d21\u732e\u7684\u8fc7\u7a0b":72,"\u6211\u4eec\u5df2\u7ecf\u5b9e\u73b0\u4e86\u5927\u591a\u6570\u5e38\u7528\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":114,"\u6211\u4eec\u5efa\u8bae\u4f60\u4e3a\u4f60\u7684python\u5c01\u88c5\u5b9e\u73b0\u4e00\u4e2a":74,"\u6211\u4eec\u5efa\u8bae\u4f60\u5728\u5199\u65b0\u7f51\u7edc\u5c42\u65f6\u628a\u6d4b\u8bd5\u4ee3\u7801\u653e\u5165\u65b0\u7684\u6587\u4ef6\u4e2d":74,"\u6211\u4eec\u5efa\u8bae\u4f7f\u7528\u7b2c\u4e8c\u7c7b\u5b9e\u73b0":82,"\u6211\u4eec\u603b\u7ed3\u51fa\u4e00\u4e9b\u7279\u522b\u9700\u8981\u6ce8\u610f\u7684\u70b9":42,"\u6211\u4eec\u628apaddlepaddle\u7684\u4ea4\u53c9\u7f16\u8bd1\u73af\u5883\u6253\u5305\u6210\u4e00\u4e2a\u955c\u50cf":116,"\u6211\u4eec\u63a8\u8350\u4f7f\u7528":[1,101],"\u6211\u4eec\u63a8\u8350\u4f7f\u7528\u6700\u65b0\u7248\u672c\u7684cudnn":0,"\u6211\u4eec\u63a8\u8350\u5728docker\u4e2d\u8fd0\u884cpaddlepaddl":2,"\u6211\u4eec\u63d0\u4f9b\u4e24\u4e2a\u8f6c\u6362\u65b9\u5f0f":11,"\u6211\u4eec\u63d0\u4f9b\u4e86\u4f7f\u7528fabric":94,"\u6211\u4eec\u63d0\u4f9b\u4e86\u52a0\u901f\u8bbf\u95ee\u7684\u955c\u50cf\u6e90":[1,116],"\u6211\u4eec\u63d0\u4f9b\u4e86\u591a\u79cd\u7684\u96c6\u7fa4\u90e8\u7f72\u65b9\u5f0f":94,"\u6211\u4eec\u63d0\u4f9b\u4e86\u5982\u4e0b\u6307\u5357":88,"\u6211\u4eec\u63d0\u4f9b\u53ef\u4ee5\u76f4\u63a5\u8fd0\u884cpaddlepaddl":1,"\u6211\u4eec\u63d0\u51fa\u4e86chunk\u7684\u6982\u5ff5":27,"\u6211\u4eec\u662f\u5bf9\u6bcf\u4e00\u4e2a\u5b50\u5e8f\u5217\u53d6\u6700\u540e\u4e00\u4e2a\u5143\u7d20":111,"\u6211\u4eec\u6700\u7ec8\u7684\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165python\u6216\u8005\u5176\u4ed6\u4ef
b\u4f55\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u6211\u4eec\u6709\u4e00\u4e2a\u5e8f\u5217\u4f5c\u4e3a\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u72b6\u6001":114,"\u6211\u4eec\u7684":0,"\u6211\u4eec\u7684\u6807\u51c6\u5f00\u53d1\u6d41\u7a0b\u662f\u628a\u8fd9\u4e9b\u5de5\u5177\u90fd\u88c5\u8fdb\u4e00\u4e2adocker":72,"\u6211\u4eec\u770b\u4e00\u4e0b\u5355\u5c42rnn\u7684\u914d\u7f6e":111,"\u6211\u4eec\u770b\u4e00\u4e0b\u8bed\u4e49\u76f8\u540c\u7684\u53cc\u5c42rnn\u7684\u7f51\u7edc\u914d\u7f6e":111,"\u6211\u4eec\u771f\u8bda\u5730\u611f\u8c22\u60a8\u7684\u8d21\u732e":72,"\u6211\u4eec\u79f0\u4e4b\u4e3a\u4e00\u4e2a0\u5c42\u7684\u5e8f\u5217":110,"\u6211\u4eec\u8ba1\u5212\u5c06":41,"\u6211\u4eec\u8ba1\u5212\u5c06\u82f1\u7279\u5c14\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u6570\u5b66\u5e93":42,"\u6211\u4eec\u8bbe\u8ba1\u8bf4\u660e\u4e86\u540d\u4e3afilemanager\u7cfb\u7edf":27,"\u6211\u4eec\u8c03\u7528\u4e86eigenvector\u7684flatten\u63a5\u53e3":76,"\u6211\u4eec\u8fd8\u53ef\u4ee5\u767b\u5f55\u5230\u5bbf\u4e3b\u673a\u4e0a\u67e5\u770b\u8bad\u7ec3\u7ed3\u679c":96,"\u6211\u4eec\u8fd8\u5c06\u7f16\u7801\u5411\u91cf\u6295\u5c04\u5230":114,"\u6211\u4eec\u9009\u53d6\u5355\u53cc\u5c42\u5e8f\u5217\u914d\u7f6e\u4e2d\u7684\u4e0d\u540c\u90e8\u5206":111,"\u6211\u4eec\u9009\u62e9":11,"\u6211\u4eec\u901a\u5e38\u501f\u52a9":75,"\u6211\u4eec\u901a\u5e38\u5c06\u4e00\u53e5\u8bdd\u7406\u89e3\u6210\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":111,"\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u65b0\u5f15\u5165\u7684gemm":41,"\u6211\u4eec\u901a\u8fc7\u8bfb\u53d6":97,"\u6211\u4eec\u90fd\u63d0\u4f9bpython\u7684\u8f6c\u6362\u5e93":11,"\u6211\u4eec\u9700\u8981":0,"\u6211\u4eec\u9700\u8981\u5148\u628a\u8f93\u5165tensor\u548c\u8f93\u51fatensor\u8f6c\u6362\u4e3aeigen\u652f\u6301\u7684\u683c\u5f0f":76,"\u6211\u4eec\u9700\u8981\u5236\u4f5c\u4e00\u4e2a\u5305\u542b\u8bad\u7ec3\u6570\u636e\u7684paddlepaddle\u955c\u50cf":96,"\u6211\u4eec\u9700\u8981\u5728\u96c6\u7fa4\u7684\u6240\u6709\u8282\u70b9\u4e0a\u5b89\u88c5":101,"\u6211\u4eec\u97
00\u8981\u7b49\u5f0f\u5de6\u8fb9\u7684eigentensor\u8c03\u7528device\u63a5\u53e3":76,"\u6211\u4eec\u9700\u8981\u8ba1\u7b97":74,"\u6211\u4eec\u9996\u5148\u9700\u8981\u6839\u636e\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u6765\u521b\u5efa\u6240\u9700\u8981\u4f18\u5316\u7684paramet":84,"\u6211\u5220\u9664\u4e86":72,"\u6211\u53ef\u4ee5\u7528":0,"\u6211\u53ef\u4ee5\u9009\u62e9\u4e0d\u7528docker\u5417":0,"\u6216":[89,108,117],"\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u6216\u4e00\u4e2a\u5411\u91cf":113,"\u6216\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u6216\u662f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u8868\u793a\u8bcd\u8bed\u5728\u5b57\u5178\u4e2d\u7684\u5e8f\u53f7":89,"\u6216\u6700\u5927\u503c":110,"\u6216\u79f0\u4f5cweight":81,"\u6216\u7b2c\u4e00\u4e2a":110,"\u6216\u7b2c\u4e00\u4e2a\u5143\u7d20":110,"\u6216\u7f16\u5199\u7a0b\u5e8f\u65f6":91,"\u6216\u8005":[0,42,45,46,72,75,81,89,107,108,110,111],"\u6216\u8005\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":113,"\u6216\u8005\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":[110,113],"\u6216\u8005\u4e5f\u53ef\u4ee5\u4f7f\u7528\u4e3a\u4e0a\u8ff0\u53ef\u9009\u6b65\u9aa4\u6784\u5efa\u7684\u955c\u50cf":0,"\u6216\u8005\u4ece\u5de5\u5177\u7684\u754c\u9762\u91cc\u8fd0\u884c\u60a8\u7684\u5e94\u7528":108,"\u6216\u8005\u5236\u4f5c\u548c\u5206\u4eab\u5e26\u6709\u4ee3\u7801":1,"\u6216\u8005\u53cd\u5411\u5730\u4ece":114,"\u6216\u8005\u53ef\u88abdns\u89e3\u6790\u7684\u4e3b\u673a\u540d":101,"\u6216\u8005\u5728cpu\u6a21\u5f0f\u4e0b\u4f7f\u75284\u4e2a\u7ebf\u7a0b":103,"\u6216\u8005\u5c06\u8fd9\u53f0\u8282\u70b9\u8fc1\u79fb\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u5e76\u542f\u52a8\u5373\u53ef\u6062\u590d\u8bad\u7ec3\u4efb\u52a1":10,"\u6216\u8005\u5df2\u7ecf\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u81ea\u52a8\u8bbe\u7f6e":102,"\u6216\u8005\u5f15\u8d77\u884c\u65f6\u951
9\u8bef":89,"\u6216\u8005\u6570\u7ec4\u7684\u6570\u7ec4\u8fd9\u4e2a\u6982\u5ff5":111,"\u6216\u8005\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":110,"\u6216\u8005\u662f\u51fd\u6570\u8c03\u7528\u7684\u9891\u7387\u548c\u8017\u65f6\u7b49":108,"\u6216\u8005\u66f4\u65e9":83,"\u6216\u8005\u6bcf\u4e00\u4e2a\u7cfb\u5217\u91cc\u7684\u7279\u5f81\u6570\u636e":111,"\u6216\u8005\u7528tuple\u8868\u793a\u7684\u591a\u4e2a\u503c":11,"\u6216\u8005\u7531\u5b83\u4eec\u7ec4\u6210\u7684list":11,"\u6216\u8005\u76f4\u63a5\u6254\u6389\u975e\u5e38\u957f\u7684\u5e8f\u5217":81,"\u6216\u8005\u76f8\u5bf9\u4e8e\u6784\u5efa\u76ee\u5f55\u7684\u76f8\u5bf9\u8def\u5f84":[116,118],"\u6216\u8005\u8f93\u5165\u6570\u636e\u5c3a\u5ea6\u8fc7\u5927":81,"\u6216\u8005\u8fd0\u884c":78,"\u6216\u8005\u91c7\u7528\u5e76\u884c\u8ba1\u7b97\u6765\u52a0\u901f\u67d0\u4e9b\u5c42\u7684\u66f4\u65b0":105,"\u6216activ":42,"\u6216gpu":103,"\u622a\u65ad\u5bf9\u8c61\u4e0d\u540c":81,"\u6240\u4ee5":[1,42,63,81,107],"\u6240\u4ee5\u4e00\u4e2a\u7248\u672c\u53f7\u7684wheel\u5305\u53d1\u5e03\u4e4b\u540e":63,"\u6240\u4ee5\u4e0d\u5b58\u5728\u8fd9\u4e2a\u95ee\u9898":42,"\u6240\u4ee5\u4e0d\u80fd\u91c7\u7528\u7b2c\u4e00\u79cd\u65b9\u5f0f\u5728\u8fd9\u51e0\u4e2alayer\u91cc\u8bbe\u7f6e":82,"\u6240\u4ee5\u505a\u6cd5\u53ef\u4ee5\u6709\u4e24\u79cd":81,"\u6240\u4ee5\u53ef\u4ee5\u7b80\u5316\u5bf9\u73af\u5883\u7684\u8981\u6c42":96,"\u6240\u4ee5\u5728":42,"\u6240\u4ee5\u5728\u5199\u5165\u5feb\u7167\u7684\u8fc7\u7a0b\u4e2d":10,"\u6240\u4ee5\u5916\u5c42\u8f93\u51fa\u7684\u5e8f\u5217\u5f62\u72b6":111,"\u6240\u4ee5\u5bf9":111,"\u6240\u4ee5\u5f00\u53d1\u8005\u9700\u8981\u6839\u636e\u81ea\u5df1\u8bad\u7ec3\u4efb\u52a1\u7684\u5b9e\u9645\u573a\u666f\u5b8c\u6210\u8bad\u7ec3\u6570\u636e\u7684\u5206\u5272\u548c":91,"\u6240\u4ee5\u6027\u80fd\u4e5f\u5c31\u9010\u6b65\u53d8\u6210\u4e86\u6df1\u5ea6\u5b66\u4e60\u9886\u57df\u6700\u91cd\u8981\u7684\u6307\u6807":108,"\u6240\u4ee5\u6211\u4eec\u53ef\u4ee5\u5728\u8fd9\u4e2a\u57fa\u7840\u4e0a":97,"\u6240\u4ee5\u6211\u4eec\u5b
9a\u4e49\u4e86\u4e00\u4e2a":42,"\u6240\u4ee5\u6211\u4eec\u786e\u4fdd\u53d1\u5e03\u7684\u4e8c\u8fdb\u5236\u5305\u53ef\u4ee5\u652f\u6301\u4e3b\u6d41\u7684linux\u64cd\u4f5c\u7cfb\u7edf":3,"\u6240\u4ee5\u6211\u4eec\u9700\u8981\u5c06\u8f93\u5165\u6570\u636e\u6807\u8bb0\u6210":111,"\u6240\u4ee5\u6211\u4eec\u9ed8\u8ba4\u4f7f\u7528cento":3,"\u6240\u4ee5\u6574\u4f53\u4e0a":42,"\u6240\u4ee5\u6dfb\u52a0\u4e86\u5bf9\u5e94\u7684":42,"\u6240\u4ee5\u7528\u6237\u9700\u8981\u9996\u5148\u5728":27,"\u6240\u4ee5\u76f8\u6bd4\u4e8erecurr":82,"\u6240\u4ee5\u8fd9\u4e00\u6b65\u662f\u5fc5\u8981\u7684":74,"\u6240\u4ee5\u9700\u8981\u5f15\u5165\u4e00\u4e2a\u8f6c\u6362\u65b9\u6cd5":42,"\u6240\u4f7f\u7528":117,"\u6240\u4f9d\u8d56\u7684\u7b2c\u4e09\u65b9\u5e93\u540c\u65f6\u4e5f\u88ab\u5b89\u88c5\u5230":116,"\u6240\u5bf9\u5e94\u7684\u8bcd\u8868index\u6570\u7ec4":111,"\u6240\u6709\u4e0e\u7c7b\u578b\u76f8\u5173\u7684\u51fd\u6570":46,"\u6240\u6709\u4ee3\u7801\u5fc5\u987b\u5177\u6709\u5355\u5143\u6d4b\u8bd5":72,"\u6240\u6709\u53c2\u6570\u7f6e\u4e3a\u96f6":103,"\u6240\u6709\u547d\u4ee4\u884c\u9009\u9879\u53ef\u4ee5\u8bbe\u7f6e\u4e3a":93,"\u6240\u6709\u5916\u90e8\u7684\u8f6c\u6362\u5de5\u4f5c\u90fd\u4f1a\u5728reset\u7cfb\u5217\u51fd\u6570\u4e2d\u90fd\u51c6\u5907\u597d":42,"\u6240\u6709\u67b6\u6784":116,"\u6240\u6709\u7684":[41,72,74],"\u6240\u6709\u7684\u5355\u6d4b\u90fd\u4f1a\u88ab\u6267\u884c\u4e00\u6b21":74,"\u6240\u6709\u7684\u63a5\u53e3\u5747\u4e3ac\u63a5\u53e3":46,"\u6240\u6709\u7684\u64cd\u4f5c\u90fd\u662f\u9488\u5bf9\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u6765\u8fdb\u884c\u7684":111,"\u6240\u6709\u7684python\u5c01\u88c5\u90fd\u4f7f\u7528":74,"\u6240\u6709\u7684python\u5c01\u88c5\u90fd\u5728":74,"\u6240\u6709\u76f8\u5173\u7684":41,"\u6240\u6709\u7c7b\u578b\u540d\u4e3a":46,"\u6240\u6709\u7f51\u7edc\u5c42\u7684\u68af\u5ea6\u68c0\u67e5\u5355\u6d4b\u90fd\u4f4d\u4e8e":74,"\u6240\u6709\u8f93\u5165\u5e8f\u5217\u5e94\u8be5\u6709\u76f8\u540c\u7684\u957f\u5ea6":114,"\u6240\u6709mkl":42,"\u6240\u9700\u652f\u63
01\u7684\u6700\u4f4eandroid":116,"\u6240\u9700\u7684\u5f00\u53d1\u5de5\u5177\u548c\u7b2c\u4e09\u65b9\u5e93\u53ef\u4ee5\u53c2\u8003":118,"\u624b\u5199\u591a\u8bed\u8a00\u7ed1\u5b9a":45,"\u624b\u5199\u6570\u5b57\u8bc6\u522b":90,"\u624b\u5199\u6570\u5b57\u8bc6\u522b\u4efb\u52a1\u5b9a\u4e49\u4e86\u4e00\u4e2a\u542b\u6709":90,"\u624b\u52a8\u4e0b\u8f7d\u4e14\u89e3\u538b\u7f29":78,"\u624b\u52a8\u4e0b\u8f7d\u5e76\u5b89\u88c5":78,"\u624d\u53ef\u4ee5\u5b89\u88c5":3,"\u624d\u80fd\u4fdd\u8bc1\u548c\u5355\u5c42\u5e8f\u5217\u7684\u914d\u7f6e\u4e2d":111,"\u624d\u80fd\u53d1\u6325\u5176\u5168\u90e8\u80fd\u529b":108,"\u624d\u80fd\u66f4\u597d\u7684\u53d1\u6325mkl":42,"\u6253\u5f00":108,"\u6253\u5f00\u6d4f\u89c8\u5668\u8bbf\u95ee\u5bf9\u5e94\u76ee\u5f55\u4e0b\u7684index":77,"\u6253\u5f00\u8fd9\u4e2a\u7f16\u8bd1\u9009\u9879":46,"\u6267\u884c":[63,86,87,93],"\u6267\u884c\u4e0a\u8ff0":116,"\u6267\u884c\u4e0a\u8ff0\u4ee3\u7801\u751f\u6210makefile\u6587\u4ef6\u540e":87,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u4ee5\u542f\u52a83\u4e2a\u8282\u70b9\u7684openmpi\u96c6\u7fa4\u548c\u4e00\u4e2a":98,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u5373\u53ef\u5728\u5f53\u524d\u673a\u5668\u4e0a\u5b89\u88c5paddlepaddle\u7684\u8fd0\u884c\u65f6\u73af\u5883":3,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u53ef\u4ee5\u67e5\u770b\u5df2\u7ecf\u5b89\u88c5\u7684\u7248\u672c":101,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u5b8c\u6210\u5feb\u901f\u5b89\u88c5":86,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u6765\u8fd0\u884c\u5355\u5143\u6d4b\u8bd5":75,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u7f16\u8bd1cpu":0,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u83b7\u53d6\u6700\u65b0\u7684paddlepaddl":1,"\u6267\u884c\u4ee5\u4e0b\u547d\u4ee4":[116,117,118],"\u6267\u884c\u4ee5\u4e0b\u547d\u4ee4\u542f\u52a8\u4f7f\u7528python\u7f16\u5199\u7684trainer\u7a0b\u5e8f":91,"\u6267\u884c\u4ee5\u4e0b\u64cd\u4f5c":114,"\u6267\u884c\u4ee5\u4e0b\u7684\u547d\u4ee4\u542f\u52a8\u4e00\u4e2a\u53c2\u6570\u670d\u52a1\u5668\u5e76\u7b49\u5f85
\u548c\u8ba1\u7b97\u8282\u70b9\u7684\u6570\u636e\u4ea4\u4e92":91,"\u6267\u884c\u5b8c\u5b89\u88c5\u547d\u4ee4\u540e":[116,117,118],"\u6267\u884c\u60a8\u7684\u4ee3\u7801":108,"\u627e\u5230":[0,114],"\u627e\u5230\u4ee5\u4e0a\u76f8\u5173\u7684\u4f8b\u5b50":94,"\u627e\u5230\u6700\u65e9\u62a5\u9519\u7684\u5730\u65b9":79,"\u627e\u5230\u8fd0\u884c\u6162\u7684\u539f\u56e0":108,"\u627e\u5230\u8fd0\u884c\u6162\u7684\u90e8\u5206":108,"\u628a":[11,74],"\u628a\u4e4b\u524d\u793a\u4f8b\u4e2d\u8f6c\u6362\u5b8c\u6bd5\u7684random":11,"\u628a\u4efb\u610f\u7ef4\u5ea6\u7684tensor\u8f6c\u4e3a\u4e86\u4e00\u7ef4\u7684eigenvector":76,"\u628a\u5de5\u5177\u548c\u914d\u7f6e\u90fd\u5b89\u88c5\u5728\u4e00\u4e2a":0,"\u628a\u8bad\u7ec3\u6570\u636e\u76f4\u63a5\u653e\u5728":96,"\u628a\u8fd9\u4e9b\u5de5\u5177\u5b89\u88c5\u5230\u672c\u673a":0,"\u6295\u5c04\u53cd\u5411rnn\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b\u5230":114,"\u6295\u5c04\u7f16\u7801\u5411\u91cf\u5230":114,"\u62c6\u6210\u4ee5\u4e0a\u4e24\u4e2a\u9759\u6001\u94fe\u63a5\u5e93":87,"\u62c6\u89e3":113,"\u62c6\u89e3\u6210\u7684\u6bcf\u4e00\u53e5\u8bdd\u518d\u901a\u8fc7\u4e00\u4e2alstm\u7f51\u7edc":111,"\u62f7\u8d1d\u5230numpi":81,"\u62f7\u8d1d\u5fc5\u8981\u7684\u6587\u4ef6\u5230head\u8282\u70b9":98,"\u62f7\u8d1d\u8bad\u7ec3\u6570\u636e\u5230\u5404\u81ea\u7684\u8282\u70b9":98,"\u62f7\u8d1d\u8bad\u7ec3\u6587\u4ef6\u5230\u5bb9\u5668\u5185":97,"\u62f7\u8d1d\u8bad\u7ec3\u7a0b\u5e8f\u548c\u5b57\u5178\u6587\u4ef6\u5230\u6bcf\u53f0mpi\u8282\u70b9":98,"\u62fc\u63a5":81,"\u6302\u8f7d\u5230\u5bb9\u5668\u5185\u90e8\u7684":1,"\u6302\u8f7d\u6216\u4e0b\u8f7d\u7684\u8bad\u7ec3\u6570\u636e\u5206\u7247":91,"\u6307\u53d1\u73b0\u6027\u80fd\u74f6\u9888":107,"\u6307\u5411\u4e00\u4e2alayer":113,"\u6307\u5b9a":[81,82,113,114],"\u6307\u5b9a\u4e00\u53f0\u673a\u5668\u4e0a\u4f7f\u7528\u7684\u7ebf\u7a0b\u6570":103,"\u6307\u5b9a\u4e3a":89,"\u6307\u5b9a\u4f7f\u75282":81,"\u6307\u5b9a\u524d\u5411\u7f51\u7edc\u6700\u7ec8\u7684\u8f93\u51fa\u76ee\u6807\u53d8\u91cf":75,"\u6307\u5b9a\u5
2a0\u8f7d\u7684\u65b9\u5f0f":103,"\u6307\u5b9a\u5728\u751f\u6210\u6027\u80fd\u5206\u6790\u6587\u4ef6\u4e4b\u540e":107,"\u6307\u5b9a\u5bf9\u8f93\u5165\u53d8\u91cf":75,"\u6307\u5b9a\u5c06\u5f53\u524d\u8def\u5f84":1,"\u6307\u5b9a\u68c0\u6d4b\u68af\u5ea6\u65f6\u80fd\u5bb9\u5fcd\u7684\u6700\u5927\u9519\u8bef\u503c":75,"\u6307\u5b9a\u7684\u5185\u5bb9\u5b58\u50a8\u5e93\u8fd0\u884c\u547d\u4ee4":77,"\u6307\u5b9a\u7684\u8f93\u5165\u4e0d\u4f1a\u88ab":113,"\u6307\u5b9a\u8981\u8f93\u51fa\u7684\u5b57\u6bb5\u8fdb\u884c\u8f93\u51fa":81,"\u6307\u5b9a\u9700\u8981\u4f7f\u7528\u7684\u5bb9\u5668":1,"\u6307\u5b9acudnn\u7684\u6700\u5927\u5de5\u4f5c\u7a7a\u95f4\u5bb9\u9650":103,"\u6307\u5bf9\u4e8e\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217\u8f93\u5165\u6570\u636e":111,"\u6307\u5f00\u542fhttp\u670d\u52a1":107,"\u6307\u6d88\u9664\u74f6\u9888":107,"\u6307\u6df1\u5ea6\u5b66\u4e60\u8bad\u7ec3\u4e4b\u540e\u5f97\u5230\u7684\u6240\u6709\u53c2\u6570":10,"\u6307\u793a\u4f7f\u7528\u54ea\u4e2agpu\u6838":103,"\u6307\u793a\u5728\u7b80\u5355\u7684recurrentlayer\u5c42\u7684\u8ba1\u7b97\u4e2d\u662f\u5426\u4f7f\u7528\u6279\u5904\u7406\u65b9\u6cd5":103,"\u6307\u793a\u5f53\u6307\u5b9a\u8f6e\u7684\u6d4b\u8bd5\u6a21\u578b\u4e0d\u5b58\u5728\u65f6":103,"\u6307\u793a\u662f\u5426\u4f7f\u7528\u591a\u7ebf\u7a0b\u6765\u8ba1\u7b97\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc":103,"\u6307\u793a\u662f\u5426\u5f00\u542f\u53c2\u6570\u670d\u52a1\u5668":103,"\u6307\u793a\u662f\u5426\u663e\u793a\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u7a00\u758f\u53c2\u6570\u5206\u5e03\u7684\u65e5\u5fd7\u7ec6\u8282":103,"\u6307\u793a\u662f\u5426\u68c0\u67e5\u6240\u6709\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u7a00\u758f\u53c2\u6570\u7684\u5206\u5e03\u662f\u5747\u5300\u7684":103,"\u6309\u542f\u53d1\u5f0f\u635f\u5931\u7684\u5927\u5c0f\u9012\u589e\u6392\u5e8f":103,"\u6309\u7167\u4e0b\u9762\u6b65\u9aa4\u5373\u53ef":97,"\u6309\u7167\u5177\u4f53\u5b9e\u73b0\u65b9\u5f0f\u53ef\u4ee5\u5f52\u7eb3\u4e3a2\u7c7b":82,"\u6309\u7167\u57fa\u672c\u6570\u636e\u7c7b\u578b\
u5728paddlepaddle\u5185\u90e8\u7684\u5b9a\u4e49\u548c\u5b9e\u73b0":89,"\u6309\u94ae":[63,72],"\u635f\u5931\u51fd\u6570\u5c42":90,"\u6392\u6210\u4e00\u5217\u7684\u591a\u4e2a\u5143\u7d20":110,"\u63a5\u4e0a\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42":84,"\u63a5\u4e0a\u5e73\u65b9\u8bef\u5dee\u5c42":84,"\u63a5\u4e0b\u6765":75,"\u63a5\u4e0b\u6765\u53ef\u4ee5\u8003\u8651\u4e0b\u65f6\u95f4\u7ebf\u7684\u5206\u6790":108,"\u63a5\u4e0b\u6765\u5c31\u53ef\u4ee5\u4f7f\u7528":108,"\u63a5\u4e0b\u6765\u6211\u4eec\u521b\u5efa\u4e00\u4e2a\u539f\u59cb":72,"\u63a5\u4e0b\u6765\u6211\u4eec\u53d6\u6d88\u5bf9":72,"\u63a5\u4e0b\u6765\u7b49\u5f85":72,"\u63a5\u53d7\u4e00\u4e2a\u8f93\u5165\u53c2\u6570":75,"\u63a5\u53e3":[45,46,75,76,90],"\u63a5\u53e3\u4f1a\u88ab\u8c03\u7528":76,"\u63a5\u53e3\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u524d\u5411\u8ba1\u7b97":90,"\u63a5\u53e3\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u53c2\u6570\u8fdb\u884c\u5e8f\u5217\u5316":90,"\u63a5\u53e3\u5c42\u505a\u8fc7\u591a\u5c01\u88c5":46,"\u63a5\u53e3\u662f":11,"\u63a5\u53e3\u6700\u7ec8\u4f1a\u8c03\u7528\u5bf9\u5e94":76,"\u63a5\u53e3\u6709\u4e00\u4e2a":81,"\u63a5\u53e3\u7684":81,"\u63a5\u53e3\u8bf4\u660e\u8bf7\u67e5\u770b":89,"\u63a5\u6536\u5230\u8db3\u591f\u7684gradient":91,"\u63a5\u6536\u5904\u7406pfsclient\u7aef\u7684\u6587\u4ef6\u7ba1\u7406\u8bf7\u6c42":27,"\u63a5\u7740\u5bf9\u6240\u6709\u53c2\u6570\u7684\u4f7f\u7528\u573a\u5408\u8fdb\u884c\u6982\u8ff0\u548c\u5206\u7c7b":104,"\u63a5\u7740\u7f16\u8bd1\u5373\u53ef":78,"\u63a7\u5236":103,"\u63a7\u5236\u662f\u5426\u4f7f\u7528mkl":42,"\u63a7\u5236\u662f\u5426\u4f7f\u7528mklml\u5e93":42,"\u63a7\u5236\u7528\u6237\u6743\u9650":11,"\u63a8\u5bfc\u8be5\u5c42\u524d\u5411\u548c\u540e\u5411\u4f20\u9012\u7684\u65b9\u7a0b":74,"\u63a8\u8350\u4f7f\u7528\u6b64\u65b9\u5f0f":87,"\u63a8\u8350\u4f7f\u7528centos\u7684devtools2":0,"\u63a8\u8350\u6e05\u7406\u6574\u4e2a\u7f16\u8bd1\u76ee\u5f55":0,"\u63a8\u8350\u8bbe\u7f6e\u4e3a":87,"\u63a8\u8350\u914d\u7f6e\u4e3a":87,"\
u63a8\u8350\u914d\u7f6e\u9009\u9879":87,"\u63a8\u9001\u5230\u8fdc\u7a0b\u4ed3\u5e93":72,"\u63cf\u8ff0\u7684\u9ed8\u8ba4\u5165\u53e3\u7a0b\u5e8f":0,"\u63cf\u8ff0\u8be5op\u7684\u8f93\u5165":75,"\u63cf\u8ff0\u95ee\u9898":72,"\u63d0\u4ea4\u65b9\u5f0f\u53c2\u89c1":77,"\u63d0\u4ea4pull":72,"\u63d0\u4f9b":93,"\u63d0\u4f9b\u4e03\u5c42\u534f\u8bae\u7684\u53cd\u5411\u4ee3\u7406":27,"\u63d0\u4f9b\u4e86\u4e00\u4e2a\u542f\u52a8\u811a\u672c":97,"\u63d0\u4f9b\u4e86\u547d\u4ee4\u6837\u4f8b\u6765\u8fd0\u884c":93,"\u63d0\u4f9b\u4e86\u65b9\u4fbf\u7684\u548c":107,"\u63d0\u4f9b\u4e86\u81ea\u52a8\u5316\u811a\u672c\u6765\u542f\u52a8\u4e0d\u540c\u8282\u70b9\u4e2d\u7684\u6240\u6709":93,"\u63d0\u4f9b\u51e0\u4e4e\u6240\u6709\u8bad\u7ec3\u7684\u5185\u90e8\u8f93\u51fa\u65e5\u5fd7":93,"\u63d0\u4f9b\u5e38\u7528\u7684\u547d\u4ee4\u884c\u7ba1\u7406\u547d\u4ee4\u7ba1\u7406\u6587\u4ef6\u548c\u76ee\u5f55":27,"\u63d0\u4f9b\u6269\u5c55\u7684\u957f\u5ea6\u4fe1\u606f":110,"\u63d0\u4f9b\u7528\u6237\u7ba1\u7406\u6587\u4ef6\u7684\u547d\u4ee4":27,"\u63d0\u4f9b\u7ed9paddle\u4f5c\u4e3a\u8bad\u7ec3\u6570\u636e":11,"\u63d0\u4f9b\u8bad\u7ec3\u8fc7\u7a0b\u7684":93,"\u63d0\u793a":78,"\u641c\u7d22\u4ee3\u7801\u5e93":77,"\u642d\u5efa\u795e\u7ecf\u7f51\u7edc\u5c31\u50cf\u4f7f\u7528\u79ef\u6728\u642d\u5efa\u5b9d\u5854\u4e00\u6837":84,"\u64cd\u4f5c":111,"\u64cd\u4f5c\u7cfb\u7edf":[0,3],"\u652f\u6301\u4e24\u79cd\u5e8f\u5217\u7c7b\u578b":89,"\u652f\u6301\u4ea4\u53c9\u7f16\u8bd1":118,"\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684layer":[112,113],"\u652f\u6301\u5927\u6587\u4ef6\u7684\u65ad\u70b9\u4e0a\u4f20":27,"\u652f\u6301\u5927\u89c4\u6a21\u96c6\u7fa4\u751f\u4ea7\u73af\u5883\u7684\u5b8c\u6574\u96c6\u7fa4\u65b9\u6848":94,"\u652f\u6301\u7684\u6700\u5c0f\u7684android":116,"\u652f\u6301\u7684\u6700\u5c0fandroid":116,"\u652f\u6301\u7ef4\u6570\u53ef\u53d8\u7684\u6570\u636e\u8f93\u5165":82,"\u652f\u6301\u7f16\u8bd1\u5668":116,"\u6539\u53d8\u7ef4\u5ea6\u987a\u5e8f":82,"\u653e\u5728\u8fd9\u4e2a\u76ee\u5f55\u91
cc\u7684\u6587\u4ef6\u5176\u5b9e\u662f\u4fdd\u5b58\u5230\u4e86mfs\u4e0a":97,"\u6545\u800c\u662f\u4e00\u4e2a\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":111,"\u6548\u679c\u5982\u4e0b":107,"\u6559\u7a0b":1,"\u6570":113,"\u6570\u5b66\u5e93":87,"\u6570\u5fc5\u987b\u4e25\u683c\u76f8\u7b49":113,"\u6570\u636e":[27,89,90],"\u6570\u636e\u4e2d0":83,"\u6570\u636e\u5206\u7247":92,"\u6570\u636e\u63d0\u4f9b\u5668":102,"\u6570\u636e\u8bfb\u53d6\u5747\u4ea4\u7531\u5176\u4ed6\u8bed\u8a00\u5b8c\u6210":45,"\u6570\u636e\u8f93\u5165":[89,113],"\u6570\u636e\u957f\u5ea6\u53ca\u6821\u9a8c\u503c\u7ec4\u6210":27,"\u6570\u636e\u96c6":90,"\u6570\u636e\u96c6\u9700\u8981\u9884\u5148\u88ab\u8f6c\u6362\u6210paddlepaddle\u5206\u5e03\u5f0f\u8bad\u7ec3\u4f7f\u7528\u7684\u5b58\u50a8\u683c":11,"\u6570\u636e\u9884\u5904\u7406\u4efb\u52a1":11,"\u6570\u76ee":105,"\u6574\u4e2a\u5b89\u88c5\u8fc7\u7a0b\u8017\u65f6\u8f83\u957f":2,"\u6574\u4f53\u4f7f\u7528\u6d41\u7a0b":90,"\u6574\u4f53\u6570\u636e\u548c\u539f\u59cb\u6570\u636e\u5b8c\u5168\u4e00\u6837":111,"\u6574\u4f53\u7684\u7ed3\u6784\u56fe\u5982\u4e0b":97,"\u6574\u578b\u6570\u7ec4":89,"\u6574\u6570":74,"\u6574\u6570\u6807\u7b7e":84,"\u6587\u4ef6":[45,72,75,89,96],"\u6587\u4ef6\u4e2d":[75,90,97],"\u6587\u4ef6\u4e2d\u6ce8\u518c\u524d\u5411":75,"\u6587\u4ef6\u4e2d\u6ce8\u518c\u8be5op\u548ckernel":75,"\u6587\u4ef6\u4e2d\u6ce8\u518ccuda":75,"\u6587\u4ef6\u4e3a":81,"\u6587\u4ef6\u4e4b\u5916":72,"\u6587\u4ef6\u4f20\u8f93\u7684\u7684\u5173\u952e\u5728\u4e8e\u9700\u8981pfsclient\u7aef\u5bf9\u6bd4source\u548cdestination\u7684\u6587\u4ef6chunks\u7684checksum\u662f\u5426\u4fdd\u6301\u4e00\u81f4":27,"\u6587\u4ef6\u5185\u5bb9\u4e3a":45,"\u6587\u4ef6\u540d":107,"\u6587\u4ef6\u540d\u4e3a\u4efb\u610f\u6587\u4ef6\u540d":91,"\u6587\u4ef6\u540d\u4e3a\u6b64uuid":10,"\u6587\u4ef6\u547d\u540d\u4ee5":75,"\u6587\u4ef6\u59390":97,"\u6587\u4ef6\u5bf9\u5e94\u7684data":11,"\u6587\u4ef6\u5de5\u5177\u662f\u4f7f\u7528docker":77,"\u6587\u4ef6\u7684\u4e0a\u4f20\u548c\u4e0b\u8f7d\u90fd\u662f\u901a\
u8fc7\u5bf9chunk\u7684\u64cd\u4f5c\u6765\u5b9e\u73b0\u7684":27,"\u6587\u4ef6\u7684\u6539\u53d8":72,"\u6587\u4ef6\u7684\u8def\u5f84\u6765\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":90,"\u6587\u4ef6model":105,"\u6587\u5b57\u7684\u4ea4\u4e92\u5f0f\u6587\u6863":1,"\u6587\u6863":78,"\u6587\u68631":76,"\u6587\u68632":76,"\u6587\u6863\u8f83\u5c11":76,"\u6587\u6863\u90fd\u662f\u901a\u8fc7":77,"\u6587\u7ae0":97,"\u65b0\u5efa\u4e00\u4e2a\u6743\u91cd":74,"\u65b0\u624b\u5165\u95e8":115,"\u65b0\u624b\u5165\u95e8\u7ae0\u8282":63,"\u65b0\u7248\u672c":42,"\u65b9\u4fbf\u4f7f\u7528":88,"\u65b9\u4fbf\u5206\u4eab\u8fd0\u884c\u65f6\u73af\u5883":2,"\u65b9\u4fbf\u6392\u67e5\u4ee5\u53ca\u5feb\u901f\u5b9a\u4f4d\u95ee\u9898":81,"\u65b9\u4fbf\u63d0\u4ea4\u96c6\u7fa4\u8bad\u7ec3\u4efb\u52a1":94,"\u65b9\u4fbf\u6d4b\u8bd5\u4eba\u5458\u6d4b\u8bd5paddlepaddle\u7684\u884c\u4e3a":63,"\u65b9\u4fbf\u7528\u6237\u4e0a\u4f20\u81ea\u5df1\u7684\u8bad\u7ec3\u6570\u636e\u4ee5\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"\u65b9\u4fbf\u7528\u6237\u5728python\u7aef\u9009\u62e9\u662f\u5426\u542f\u7528\u8fd9\u4e2a\u529f\u80fd":41,"\u65b9\u4fbf\u7528\u6237\u9009\u62e9\u4f7f\u7528mkl":42,"\u65b9\u5f0f1":81,"\u65b9\u5f0f2":81,"\u65b9\u5f0f\u5c06\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u53c2\u6570\u5e8f\u5217\u5316\u5230\u4e00\u4e2a\u6587\u4ef6":90,"\u65b9\u5f0f\u7c7b\u4f3c\u4e8e":42,"\u65b9\u6cd5\u4e00":105,"\u65b9\u6cd5\u4e09":105,"\u65b9\u6cd5\u4e8c":105,"\u65e0\u5ef6\u8fdf":103,"\u65e0\u6cd5\u505a\u5230\u5bf9\u4e8e\u5404\u79cd\u8bed\u8a00\u9519\u8bef\u5904\u7406\u65b9\u5f0f\u7684\u9002\u914d":45,"\u65e0\u8bba\u5728\u672c\u5730\u8fd8\u662f\u5728\u4e91\u7aef":11,"\u65e0\u8bba\u662f\u4ece":11,"\u65e0\u8bba\u662f\u5728\u672c\u5730\u6216\u662f\u4e91\u7aef\u8f6c\u6362":11,"\u65e0\u8bba\u662f\u91cd\u6784\u524d\u7684layer\u8fd8\u662f\u91cd\u6784\u540e\u7684op":42,"\u65e0\u9700\u5173\u5fc3\u548c\u5904\u7406\u5e8f\u5217\u4fe1\u606f":89,"\u65e0\u9700\u5355\u72ec\u5b89\u88c5\u7b2c\u4e09\u65b9\u4f9d\u8d5
6":2,"\u65e0\u9700\u63d0\u4f9b\u975e\u96f6\u5143\u7684\u503c":89,"\u65e0\u9700\u9644\u52a0\u5e8f\u5217\u4fe1\u606f":89,"\u65e0\u9ed8\u8ba4\u503c":[116,118],"\u65e5\u5fd7\u62a5\u9519\u4e3a\u7f51\u7edc\u901a\u4fe1\u7c7b\u9519\u8bef":79,"\u65f6":[10,41,42,74,81,83,87,89,97,103,110,114,116],"\u65f6\u4e00\u8d77\u66f4\u65b0":42,"\u65f6\u4f7f\u7528openblas\u6570\u5b66\u5e93":87,"\u65f6\u5982\u4f55\u7ec4\u7ec7\u8f93\u5165\u6570\u636e":89,"\u65f6\u6709\u6548":117,"\u65f6\u88ab\u8bad\u7ec3\u7684":74,"\u65f6\u8bbe\u5907id\u53f7\u7684\u5206\u914d":105,"\u65f6\u95f4":111,"\u65f6\u95f4\u6b65\u7684\u6982\u5ff5":111,"\u65f6\u987b\u4ece\u7b2c17\u5b57\u8282\u5f00\u59cb":83,"\u6613\u4e8e\u95ee\u9898\u7684\u590d\u73b0":2,"\u6620\u5c04\u4e3a":0,"\u6620\u5c04\u5230\u4e00\u4e2a\u7ef4\u5ea6\u4e3a":74,"\u662f":[3,27,42,78],"\u662f\u4e00\u4e2a\u51681\u7684\u5411\u91cf":74,"\u662f\u4e00\u4e2a\u5185\u7f6e\u7684\u5b9a\u65f6\u5668\u5c01\u88c5":108,"\u662f\u4e00\u4e2a\u52a8\u6001\u7a0b\u5e8f\u5206\u6790\u7684\u672f\u8bed":108,"\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u662f\u4e00\u4e2a\u53cc\u5c42\u7684\u5e8f\u5217":110,"\u662f\u4e00\u4e2a\u591a\u8bed\u8a00\u63a5\u53e3\u7684\u4ee3\u7801\u751f\u6210\u5668":45,"\u662f\u4e00\u4e2a\u5c01\u88c5\u5bf9\u8c61":108,"\u662f\u4e00\u4e2a\u5f88\u6709\u7528\u7684\u53c2\u6570":105,"\u662f\u4e00\u4e2a\u65b9\u4fbf\u7684\u7a0b\u5e8f\u90e8\u7f72\u548c\u7ba1\u7406\u5de5\u5177":94,"\u662f\u4e00\u4e2a\u7c7b\u578b\u7684\u6807\u5fd7":46,"\u662f\u4e00\u4e2a\u975e\u7ebf\u6027\u7684":74,"\u662f\u4e00\u4e2apython\u7684\u7b2c\u4e09\u65b9\u5e93":107,"\u662f\u4e00\u4e2aunbound":113,"\u662f\u4e00\u6761\u65f6\u95f4\u5e8f\u5217":84,"\u662f\u4e00\u6b21\u9884\u6d4b\u63a5\u53d7\u7684\u6837\u672c\u6570\u76ee":89,"\u662f\u4e00\u79cd\u4efb\u610f\u590d\u6742\u7684rnn\u5355\u5143":113,"\u662f\u4e0d\u5305\u62ec\u6e90\u7801\u7684":96,"\u662f\u4e0d\u5e38\u89c1\u7684\u505a\u6cd5":45,"\u662f\u4f7f\u5f97\u8981\u5171\u4eab\u7684\u53c2\u6570\u4f7f\u7528\u540c\u6837\u7684":83,"\u662f\u4f7
f\u7528mkl\u6570\u5b66\u5e93":87,"\u662f\u504f\u5dee":114,"\u662f\u5404\u4e2a\u5b9e\u73b0\u4e2d\u5171\u4eab\u7684\u5934\u6587\u4ef6":46,"\u662f\u5411\u91cf":74,"\u662f\u5426\u4e3a\u5f02\u6b65sgd\u66f4\u65b0\u6a21\u5f0f":91,"\u662f\u5426\u4ec5\u7f16\u8bd1capi":0,"\u662f\u5426\u4ee5\u9006\u5e8f\u5904\u7406\u8f93\u5165\u5e8f\u5217":113,"\u662f\u5426\u4f7f\u7528":117,"\u662f\u5426\u4f7f\u7528\u53cc\u7cbe\u5ea6\u6d6e\u70b9\u6570":0,"\u662f\u5426\u4f7f\u7528\u65e7\u7684remoteparameterupdat":103,"\u662f\u5426\u4f7f\u7528\u6743\u91cd":74,"\u662f\u5426\u4f7f\u7528arm\u6a21\u5f0f":116,"\u662f\u5426\u4f7f\u7528eigen\u5e93\u8fdb\u884c\u77e9\u9635\u8ba1\u7b97":[116,117],"\u662f\u5426\u4f7f\u7528mkl\u6570\u5b66\u5e93":0,"\u662f\u5426\u4f7f\u7528neon\u6307\u4ee4":[116,118],"\u662f\u5426\u4f7f\u80fd":117,"\u662f\u5426\u5185\u5d4cpython\u89e3\u91ca\u5668":0,"\u662f\u5426\u5219\u5171\u4eab\u540c\u4e00\u4e2a":75,"\u662f\u5426\u542f\u7528gpu\u8bad\u7ec3":91,"\u662f\u5426\u5c06\u5168\u5c40\u79cd\u5b50\u5e94\u7528\u4e8e\u672c\u5730\u7ebf\u7a0b\u7684\u968f\u673a\u6570":103,"\u662f\u5426\u5f00\u542f\u5355\u5143\u6d4b\u8bd5":0,"\u662f\u5426\u6253\u5370\u7248\u672c\u4fe1\u606f":103,"\u662f\u5426\u6253\u5f00":41,"\u662f\u5426\u652f\u6301gpu":0,"\u662f\u5426\u663e\u793a":103,"\u662f\u5426\u7a00\u758f":74,"\u662f\u5426\u7f16\u8bd1\u4e2d\u82f1\u6587\u6587\u6863":0,"\u662f\u5426\u7f16\u8bd1\u542b\u6709avx\u6307\u4ee4\u96c6\u7684paddlepaddle\u4e8c\u8fdb\u5236\u6587\u4ef6":0,"\u662f\u5426\u7f16\u8bd1\u65f6\u8fdb\u884c\u4ee3\u7801\u98ce\u683c\u68c0\u67e5":0,"\u662f\u5426\u7f16\u8bd1c":117,"\u662f\u5426\u7f16\u8bd1go\u8bed\u8a00\u7684\u53ef\u5bb9\u9519paramet":0,"\u662f\u5426\u7f16\u8bd1python\u7684swig\u63a5\u53e3":0,"\u662f\u5426\u8fd0\u884c\u65f6\u52a8\u6001\u52a0\u8f7dcuda\u52a8\u6001\u5e93":0,"\u662f\u5426\u9700\u8981\u7b49\u5f85\u8be5\u8f6e\u6a21\u578b\u53c2\u6570":103,"\u662f\u56e0\u4e3a\u8fd9\u4e2a\u6d41\u7a0b\u6bd4\u5176\u4ed6\u65b9\u6cd5\u90fd\u66f4\u7b80\u4fbf":0,"\u662f\u56e0\u4e3ac99\u65
2f\u6301":45,"\u662f\u5728paddlepaddle\u4e2d\u6784\u9020\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u65f6\u6700\u91cd\u8981\u7684\u6982\u5ff5":114,"\u662f\u5b58\u6709\u4e00\u7cfb\u5217\u53d8\u6362\u77e9\u9635\u7684\u6743\u91cd":74,"\u662f\u5b58\u6709\u504f\u7f6e\u5411\u91cf\u7684\u6743\u91cd":74,"\u662f\u5bf9\u7528\u6237\u6587\u4ef6\u5b58\u50a8\u7a7a\u95f4\u7684\u62bd\u8c61":27,"\u662f\u5bfb\u627e\u74f6\u9888\u7684\u5173\u952e\u6307\u6807":107,"\u662f\u5f00\u542favx\u7f16\u8bd1\u7684":1,"\u662f\u5f85\u6269\u5c55\u7684\u6570\u636e":110,"\u662f\u6210\u719f\u7684\u9ad8\u6027\u80fd\u5e76\u884c\u8ba1\u7b97\u6846\u67b6":94,"\u662f\u6211\u4eec":72,"\u662f\u6211\u4eec\u8981\u5206\u6790\u7684\u7a0b\u5e8f":107,"\u662f\u6307":46,"\u662f\u6307\u4e00\u7cfb\u5217\u7684\u7279\u5f81\u6570\u636e":111,"\u662f\u6307recurrent_group\u7684\u591a\u4e2a\u8f93\u5165\u5e8f\u5217":111,"\u662f\u6570\u636e\u8f93\u5165":114,"\u662f\u6709\u610f\u4e49\u7684":111,"\u662f\u6784\u5efa\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u6700\u91cd\u8981\u7684\u5de5\u5177":114,"\u662f\u6ca1\u6709\u540d\u5b57\u7684":0,"\u662f\u7528\u6237\u4f7f\u7528c":46,"\u662f\u7684":0,"\u662f\u77e9\u9635":74,"\u662f\u795e\u7ecf\u7f51\u7edc\u5b9a\u4e49\u65f6":89,"\u662f\u7f51\u7edc\u5c42\u5b9e\u4f8b\u7684\u540d\u5b57\u6807\u8bc6\u7b26":74,"\u662f\u7f51\u7edc\u5c42\u7684\u6807\u8bc6\u7b26":74,"\u662f\u7f51\u7edc\u5c42\u7684\u7c7b\u578b":74,"\u662f\u7f51\u7edc\u5c42\u8f93\u51fa\u7684\u5927\u5c0f":74,"\u662f\u8be5\u5c42\u7684\u6807\u8bc6\u7b26":74,"\u662f\u8be5\u5c42\u7684\u7c7b\u540d":74,"\u662f\u8be5\u7f51\u7edc\u5c42\u7684":74,"\u662f\u8f93\u5165":114,"\u662f\u8fd9\u4e00\u7c7b\u7684":82,"\u662f\u8fdb\u884c\u8ba1\u7b97\u7684\u57fa\u672c\u5355\u4f4d":89,"\u662f\u9700\u8981\u4e86\u89e3\u54ea\u4e9b\u6b65\u9aa4\u62d6\u6162\u4e86\u6574\u4f53":108,"\u662fc":46,"\u662fdecoder\u7684\u6570\u636e\u8f93\u5165":113,"\u662fgoogle\u5f00\u6e90\u7684\u5bb9\u5668\u96c6\u7fa4\u7684\u8c03\u5ea6\u6846\u67b6":94,"\u662fnvidia\u6027\u80fd\u5206\u6790\u5de5
\u5177":108,"\u662fpaddlepaddle\u4e2d\u5355\u5c42\u5e8f\u5217\u548c\u53cc\u5c42\u5e8f\u5217\u5b58\u50a8\u793a\u610f\u56fe":89,"\u662fpaddlepaddle\u652f\u6301\u7684\u4e00\u79cd\u4efb\u610f\u590d\u6742\u7684rnn\u5355\u5143":113,"\u662fpython\u5c01\u88c5\u7684\u7c7b\u540d":74,"\u662frnn\u72b6\u6001":114,"\u663e\u5f97\u76f8\u5bf9\u6765\u8bf4\u8f83\u4e3a\u8017\u65f6":41,"\u663e\u7136":107,"\u6682\u4e0d\u8003\u8651\u5728\u5185":81,"\u6682\u65e0":3,"\u6682\u65f6\u4e0d\u652f\u6301python3":3,"\u6682\u65f6\u4e0d\u8003\u8651\u591a\u4e2aparamet":10,"\u66b4\u9732\u8fd9\u4e2a\u6982\u5ff5\u5fc5\u8981\u51fd\u6570":46,"\u66f4\u522b\u63d0\u7b80\u5316\u95ee\u9898\u590d\u73b0\u5e26\u6765\u7684\u597d\u5904\u4e86":0,"\u66f4\u591a\u5173\u4e8edocker\u7684\u5b89\u88c5\u4e0e\u4f7f\u7528":78,"\u66f4\u591a\u7684\u8f6c\u6362\u65b9\u6cd5\u8bf7\u53c2\u8003eigen":76,"\u66f4\u597d\u5730\u5b8c\u6210\u4e00\u4e9b\u590d\u6742\u7684\u8bed\u8a00\u7406\u89e3\u4efb\u52a1":113,"\u66f4\u5feb":114,"\u66f4\u65b0":78,"\u66f4\u65b0\u53ef\u80fd\u5bfc\u81f4\u9700\u8981\u65b0\u7684\u5f00\u53d1\u5de5\u5177":0,"\u66f4\u65b0\u6a21\u5f0f":81,"\u66f4\u65b0\u7684\u6587\u6863\u4ee5pr\u7684\u5f62\u5f0f\u63d0\u4ea4\u5230github\u4e2d":77,"\u66f4\u65b0\u7f51\u7edc\u53c2\u6570\u65f6\u5e94\u7528":81,"\u66f4\u65b9\u4fbf\u7684\u8bbe\u7f6e\u65b9\u5f0f":83,"\u66f4\u8be6\u7ec6\u7684\u5b89\u88c5\u548c\u7f16\u8bd1\u65b9\u6cd5\u53c2\u8003":86,"\u66f4\u8fdb\u4e00\u6b65":113,"\u66f4\u9ad8":114,"\u66ff\u6211\u4eec\u5b8c\u6210\u4e86\u539f\u59cb\u8f93\u5165\u6570\u636e\u7684\u62c6\u5206":113,"\u6700\u4e3b\u8981\u7684\u5de5\u4f5c\u5c31\u662f\u89e3\u6790\u51fa":97,"\u6700\u540e":[1,72,74,91],"\u6700\u540e\u4e00\u4e2a":110,"\u6700\u540e\u4e00\u5c42cost\u4e2d\u8bb0\u5f55\u4e86\u795e\u7ecf\u7f51\u7edc\u7684\u6240\u6709\u62d3\u6251\u7ed3\u6784":84,"\u6700\u540e\u518d\u8c03\u7528mutabl":76,"\u6700\u540e\u5220\u9664":63,"\u6700\u540e\u6211\u4eec\u4f7f\u7528\u94fe\u5f0f\u6cd5\u5219\u8ba1\u7b97":74,"\u6700\u540e\u7684\u6267\u884c\u811a\u672c\u7684\u5
47d\u4ee4":0,"\u6700\u540e\u7ed9\u51fa\u7ec6\u8282\u63cf\u8ff0":104,"\u6700\u540e\u8ba1\u7b97softmax":82,"\u6700\u5c0f\u5316\u751f\u6210\u7684\u5e93\u7684\u5927\u5c0f":116,"\u6700\u5c0f\u7684ios\u90e8\u7f72\u7248\u672c":117,"\u6700\u5c11\u663e\u793a\u591a\u5c11\u4e2a\u8282\u70b9":103,"\u6700\u5e38\u89c1\u7684\u9519\u8bef\u5904\u7406\u65b9\u5f0f\u662fexcept":45,"\u6700\u65b0\u7684\u4ee3\u7801":72,"\u6700\u65b0\u7684paddlepaddl":[1,78],"\u6700\u7ec8":74,"\u6700\u7ec8\u5b9e\u73b0\u4e00\u4e2a\u5c42\u6b21\u5316\u7684\u590d\u6742rnn":113,"\u6700\u7ec8\u6211\u4eec\u53ef\u4ee5\u8c03\u7528trainer\u7684train\u65b9\u6cd5\u542f\u52a8\u8bad\u7ec3":84,"\u6700\u7ec8\u7684\u8f93\u51fa\u7ed3\u679c":113,"\u6709\u4e00\u4e9b\u5fc5\u987b\u914d\u7f6e\u7684\u53c2\u6570":[116,117,118],"\u6709\u4e24\u79cd\u65b9\u6cd5":0,"\u6709\u4e9b\u5c42\u53ef\u80fd\u9700\u8981\u9ad8\u7cbe\u5ea6\u6765\u4fdd\u8bc1\u68af\u5ea6\u68c0\u67e5\u5355\u6d4b\u6b63\u786e\u6267\u884c":74,"\u6709\u4e9b\u5c42\u6216\u8005\u6fc0\u6d3b\u9700\u8981\u505a\u5f52\u4e00\u5316\u4ee5\u4fdd\u8bc1\u5b83\u4eec\u7684\u8f93\u51fa\u7684\u548c\u662f\u4e00\u4e2a\u5e38\u6570":74,"\u6709\u4e9b\u7279\u5f81\u7684\u53d6\u503c\u8fbe\u5230\u6570\u767e\u4e07":81,"\u6709\u4eba\u7528\u865a\u62df\u673a\u6765\u7c7b\u6bd4":0,"\u6709\u4ee5\u4e0b\u5efa\u8bae":[116,117],"\u6709\u5173":111,"\u6709\u5173\u53c2\u6570\u914d\u7f6e\u7684\u8be6\u7ec6\u8bf4\u660e\u89c1":116,"\u6709\u5173\u7ebf\u6027\u56de\u5f52\u7684\u5b9e\u9645\u5e94\u7528":84,"\u6709\u52a9\u4e8e\u5728\u8bad\u7ec3\u65f6\u89c2\u5bdf\u5177\u4f53\u6570\u503c":81,"\u6709\u52a9\u4e8e\u8bca\u65ad\u5206\u5e03\u5f0f\u9519\u8bef":93,"\u6709\u591a\u96be":0,"\u6709\u6548\u63d0\u5347paddlepaddle\u5728\u82f1\u7279\u5c14\u67b6\u6784\u4e0a\u7684\u6027\u80fd":[41,42],"\u6709\u6807\u51c6\u7684":45,"\u6709\u7684\u65f6\u5019":45,"\u6709\u7684\u65f6\u5019\u7b80\u7b80\u5355\u5355\u7684\u6539\u53d8\u5c31\u80fd\u5728\u6027\u80fd\u4e0a\u4ea7\u751f\u660e\u663e\u7684\u4f18\u5316\u6548\u679c":108,"\u6709\u7684\u8bdd\u9
700\u8981\u5148\u5378\u8f7d":78,"\u6709\u975e\u5e38\u5927\u7684\u5dee\u522b":107,"\u670d\u52a1\u5668\u4e4b\u95f4\u53ef\u4ee5\u901a\u8fc7\u5c40\u57df\u7f51":101,"\u672a\u6307\u5b9a\u6309\u7167double\u7cbe\u5ea6\u7f16\u8bd1":83,"\u672a\u8bbe\u7f6e":117,"\u672c\u4f8b\u4e2d\u7684\u539f\u59cb\u6570\u636e\u4e00\u5171\u670910\u4e2a\u6837\u672c":111,"\u672c\u5217\u8868\u8bf4\u660epaddlepaddle\u53d1\u7248\u4e4b\u524d\u9700\u8981\u6d4b\u8bd5\u7684\u529f\u80fd\u70b9":63,"\u672c\u5730":[3,78],"\u672c\u5730\u6d4b\u8bd5":102,"\u672c\u5730\u8bad\u7ec3":[89,102],"\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b":80,"\u672c\u5730\u8bad\u7ec3\u7684\u5b9e\u9a8c":105,"\u672c\u6559\u7a0b\u4e3b\u8981\u4ecb\u7ecd\u5e26kernel\u7684op\u5982\u4f55\u5199":75,"\u672c\u6559\u7a0b\u5c06\u6307\u5bfc\u4f60\u5982\u4f55\u5728":114,"\u672c\u6587\u4e2d\u6240\u6709\u7684\u4f8b\u5b50":111,"\u672c\u6587\u4e2d\u7684\u4f8b\u5b50\u91cc":0,"\u672c\u6587\u4e2d\u793a\u4f8b\u6240\u4f7f\u7528\u7684\u5355\u5143\u6d4b\u8bd5\u6587\u4ef6\u662f":111,"\u672c\u6587\u4ee5paddlepaddle\u7684\u53cc\u5c42rnn\u5355\u5143\u6d4b\u8bd5\u4e3a\u793a\u4f8b":111,"\u672c\u6587\u5c06\u4ecb\u7ecd\u5728kubernetes\u5bb9\u5668\u7ba1\u7406\u5e73\u53f0\u4e0a\u5feb\u901f\u6784\u5efapaddlepaddle\u5bb9\u5668\u96c6\u7fa4":97,"\u672c\u6587\u6863\u5bf9\u5173\u4e8epaddlepaddle\u7684\u4e00\u4e9b\u5e38\u89c1\u95ee\u9898\u63d0\u4f9b\u4e86\u89e3\u7b54":80,"\u672c\u6587\u6863\u5c06\u4ee5linux":116,"\u672c\u6587\u6863\u63cf\u8ff0paddl":46,"\u672c\u6587\u7684\u5c06\u4ecb\u7ecd\u5728macos\u4e0a":117,"\u672c\u6b21\u8bad\u7ec3\u6587\u4ef6\u6240\u5728\u76ee\u5f55":97,"\u672c\u6b21\u8bad\u7ec3\u7684yaml\u6587\u4ef6\u53ef\u4ee5\u5199\u6210":97,"\u672c\u6b21\u8bad\u7ec3\u8981\u6c42\u67093\u4e2apaddlepaddle\u8282\u70b9":97,"\u672c\u793a\u4f8b\u4e2d\u4f7f\u7528\u7684\u539f\u59cb\u6570\u636e\u5982\u4e0b":111,"\u672c\u793a\u4f8b\u610f\u56fe\u4f7f\u7528\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u5b9e\u73b0\u4e24\u4e2a\u5b8c\u5168\u7b49\u4ef7\u7684\u5168\u8fde\u63a5rnn":111,"\u
672c\u8282\u5c06\u4ecb\u7ecd\u5982\u4f55\u4f7f\u7528paddlepaddle\u5728\u4e0d\u540c\u7684\u96c6\u7fa4\u6846\u67b6\u4e0b\u5b8c\u6210\u5206\u5e03\u5f0f\u8bad\u7ec3":92,"\u673a\u5668\u4e0a\u4ee5\u53ca":118,"\u673a\u5668\u7684\u8bbe\u5907":105,"\u673a\u5668\u7ffb\u8bd1":63,"\u6743\u91cd\u66f4\u65b0\u7684\u68af\u5ea6":103,"\u6765\u4e3a\u4e00\u4e2a":89,"\u6765\u4ee3\u66ff":72,"\u6765\u4f20\u8f93\u7f51\u7edc\u914d\u7f6e\u6587\u4ef6\u4e2d\u5b9a\u4e49\u7684\u7f51\u7edc\u7ed3\u6784\u548c\u76f8\u5173\u53c2\u6570":90,"\u6765\u4f7f\u7528dropout":82,"\u6765\u4f7f\u7528dropout\u7684":82,"\u6765\u4fdd\u8bc1\u8bad\u7ec3\u8fc7\u7a0b\u53ef\u4ee5\u4ece\u4e2d\u95f4\u72b6\u6001\u91cd\u65b0\u542f\u52a8":10,"\u6765\u505a\u68af\u5ea6\u68c0\u67e5":74,"\u6765\u51b3\u5b9a\u662f\u5426\u5f00\u542fmkl":41,"\u6765\u5206\u6790\u6267\u884c\u6587\u4ef6":108,"\u6765\u521d\u59cb\u5316\u53c2\u6570":83,"\u6765\u542f\u52a8\u548c":0,"\u6765\u5b58\u50a8":89,"\u6765\u5b58\u50a8\u6570\u636e":[89,90],"\u6765\u5b8c\u6210\u524d\u5411\u548c\u53cd\u5411\u8ba1\u7b97":90,"\u6765\u5b8c\u6210\u7f51\u7edc\u7684\u8bad\u7ec3":84,"\u6765\u5b9a\u4e49\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"\u6765\u5b9e\u73b0":42,"\u6765\u5b9e\u9645\u5b58\u50a8\u6570\u636e":[89,90],"\u6765\u5bf9\u6bd4\u5206\u6790\u4e24\u8005\u8bed\u4e49\u76f8\u540c\u7684\u539f\u56e0":111,"\u6765\u5f71\u54cdpaddlepaddle\u7684\u7f16\u8bd1\u8fc7\u7a0b":[116,117],"\u6765\u5f97\u5230\u67d0\u4e2a\u7279\u5b9a\u53c2\u6570\u7684\u68af\u5ea6\u77e9\u9635":74,"\u6765\u63cf\u8ff0\u7684":76,"\u6765\u63cf\u8ff0\u8be5op\u7684\u8f93\u5165":75,"\u6765\u63cf\u8ff0\u8f93\u5165":89,"\u6765\u642d\u5efa\u795e\u7ecf\u7f51\u7edc":84,"\u6765\u663e\u793a\u6027\u80fd\u5206\u6790\u7ed3\u679c":107,"\u6765\u67e5\u770b\u6027\u80fd\u5206\u6790\u7ed3\u679c":107,"\u6765\u6ce8\u518c\u8be5\u5c42":74,"\u6765\u6df7\u5408\u4f7f\u7528gpu\u548ccpu\u8ba1\u7b97\u7f51\u7edc\u5c42\u7684\u53c2\u6570":105,"\u6765\u6e05\u7406\u8fd9\u4e9b\u5185\u5bb9":0,"\u6765\u7279\u6307":89,"\u6765\u7279\u6307\u8c03\u7
528paddlepaddl":90,"\u6765\u7279\u6307paddlepaddl":90,"\u6765\u7279\u6307paddlepaddle\u4e2d\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":89,"\u6765\u7279\u6307paddlepaddle\u4e2d\u7684\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":89,"\u6765\u7279\u6307paddlepaddle\u4e2d\u795e\u7ecf\u7f51\u7edc\u8ba1\u7b97\u5c42\u4e00\u4e2a\u8f93\u5165":89,"\u6765\u786e\u4fdd\u628a":45,"\u6765\u786e\u5b9a\u7a00\u758f\u77e9\u9635\u7684\u5185\u5bb9":89,"\u6765\u83b7\u5f97\u8f93\u51fa\u7684\u68af\u5ea6":74,"\u6765\u8868\u793a":114,"\u6765\u8868\u793a\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"\u6765\u8868\u793apaddle\u5185\u90e8\u7c7b":45,"\u6765\u89e3\u51b3\u4e0a\u9762\u7684\u95ee\u9898":81,"\u6765\u8ba1\u7b97\u68af\u5ea6":74,"\u6765\u8bb0\u5f55\u8f93\u5165":89,"\u6765\u8bb2\u89e3\u5982\u4f55\u4f7f\u7528\u53cc\u5c42rnn":111,"\u6765\u8bbe\u7f6e":83,"\u6765\u8bbf\u95ee\u7528\u6237\u81ea\u5df1\u7684\u6570\u636e":11,"\u6765\u8bfb\u53d6\u4e00\u4e2a":89,"\u6765\u8c03\u6574":72,"\u6765\u8c03\u7528":0,"\u6765\u8fd0\u884c\u6027\u80fd\u5206\u6790\u548c\u8c03\u4f18":108,"\u6765\u8fd0\u884c\u955c\u50cf":1,"\u6765\u8fdb\u884c\u8ba8\u8bba":46,"\u6765\u9884\u6d4b\u8fd9\u4e2a\u4e2d\u95f4\u7684\u8bcd":81,"\u6784\u5efa":116,"\u6784\u5efa\u597d\u5f00\u53d1\u955c\u50cf\u540e":116,"\u6784\u5efa\u7684\u955c\u50cf":0,"\u6784\u5efa\u76ee\u6807\u4e3a":117,"\u6784\u6210\u4e00\u4e2a\u5e8f\u5217":89,"\u6784\u6210\u4e86\u8f93\u51fa\u53cc\u5c42\u5e8f\u5217\u7684\u7b2ci\u4e2a":110,"\u6784\u9020":97,"\u6784\u9020\u51fd\u6570\u542b\u67092\u4e2a\u53c2\u6570":75,"\u6784\u9020\u51fd\u6570\u91cc\u901a\u8fc7":75,"\u67b6\u6784\u7684\u6a21\u62df\u5668\u5e73\u53f0":117,"\u67b6\u6784\u7684iphone\u6216\u8005ipad\u7b49\u7269\u7406\u8bbe\u5907":117,"\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u4e00\u4e2a\u8f93\u5165\u4e3a\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7f51\u7edc\u4e2d\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u8f93\u51fa":111,"\u67d0\u4e9b\u53c2\u6570\u53ea\u53ef\u7528\u4e8e\u7279\u5b9a\u7684\u5c42\u4e2d":102,"\u67e5\
u627e\u7b54\u6848\u6216\u76f4\u63a5\u63d0":80,"\u67e5\u770b":72,"\u67e5\u770b\u5305\u7684\u5927\u5c0f":78,"\u67e5\u770b\u5f53\u524d\u72b6\u6001":72,"\u67e5\u770b\u5f53\u524d\u8fdc\u7a0b\u4ed3\u5e93\u7684\u540d\u5b57":72,"\u67e5\u770b\u6587\u4ef6\u5177\u4f53\u88ab\u4fee\u6539\u7684\u5185\u5bb9":72,"\u67e5\u770b\u662f\u5426\u662f\u5176\u4ed6\u9519\u8bef\u5f15\u53d1\u7684\u62a5\u9519":79,"\u67e5\u770bjob\u7684\u8be6\u7ec6\u60c5\u51b5":96,"\u67e5\u770blatest":63,"\u6807\u51c6":3,"\u6807\u51c6\u5dee\u4e3a":83,"\u6807\u51c6\u8868\u793apaddlepaddle\u7248\u672c\u53f7":63,"\u6807\u8bc6\u4e86\u4e00\u4e2a\u8f93\u51fa\u7684\u6587\u4ef6\u540d":107,"\u6807\u8bc6\u6027\u80fd\u5206\u6790\u7684\u7ed3\u679c\u6587\u4ef6":107,"\u6807\u8bc6\u662f\u5426\u4e3a\u8fde\u7eed\u7684batch\u8ba1\u7b97":103,"\u6807\u8bc6\u88ab\u6027\u80fd\u5206\u6790\u7684\u6e90\u6587\u4ef6":107,"\u6807\u8bc6http\u670d\u52a1\u7684\u7aef\u53e3":107,"\u6807\u8bc6http\u670d\u52a1\u7ed1\u5b9a\u7684ip":107,"\u6838\u4e00\u6837\u591a\u7684\u8fdb\u7a0b\u6765\u5e76\u884c\u7f16\u8bd1":0,"\u6838\u5fc3\u4ee3\u7801\u7f16\u8bd1\u6210\u94fe\u63a5\u5e93":87,"\u6839\u636e\u4e2a\u4eba\u7684\u9700\u6c42\u4fee\u6539\u5b9a\u5236docker\u5bb9\u5668\u6240\u6267\u884c\u7684\u811a\u672c":116,"\u6839\u636e\u4f60\u7684\u4efb\u52a1":105,"\u6839\u636e\u524d\u6587\u7684\u63cf\u8ff0":97,"\u6839\u636e\u7f51\u7edc\u914d\u7f6e\u4e2d\u7684":103,"\u6839\u636e\u8f93\u5165tensor\u7684\u5927\u5c0f\u6765\u8bbe\u7f6e\u8f93\u51fatensor\u7684\u5927\u5c0f":76,"\u6839\u636e\u8fd9\u4e9b\u53c2\u6570\u7684\u4f7f\u7528\u573a\u5408":102,"\u6839\u636e\u9ed8\u8ba4\u503c\u9012\u589e":103,"\u6839\u636e\u9ed8\u8ba4\u7aef\u53e3\u53f7\u9012\u589e":103,"\u6839\u636ejob\u5bf9\u5e94\u7684pod\u4fe1\u606f":96,"\u6839\u636eport":91,"\u683c\u5f0f":103,"\u683c\u5f0f\u4e0d\u5339\u914d\u65f6":42,"\u683c\u5f0f\u5b58\u50a8":89,"\u683c\u5f0f\u7684\u6587\u4ef6\u6765\u5b58\u653e":75,"\u6846\u67b6\u63d0\u4f9b\u7684blas\u51fd\u6570\u8fdb\u884c\u77e9\u9635\u8ba1\u7b97":117,"\u6846\u67b6\
u8fdb\u884cblas\u77e9\u9635\u8ba1\u7b97":117,"\u68af\u5ea6\u4f1a\u5c31\u5730":74,"\u68af\u5ea6\u4f1a\u6709\u566a\u58f0":92,"\u68af\u5ea6\u53c2\u6570\u7684\u5206\u5757\u6570\u76ee":103,"\u68af\u5ea6\u5c31\u53ef\u4ee5\u901a\u8fc7\u8fd9\u4e2a\u65b9\u7a0b\u8ba1\u7b97\u5f97\u5230":74,"\u68af\u5ea6\u670d\u52a1\u5668\u7684\u6570\u91cf":103,"\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5\u901a\u8fc7\u6709\u9650\u5dee\u5206\u6cd5\u6765\u9a8c\u8bc1\u4e00\u4e2a\u5c42\u7684\u68af\u5ea6":74,"\u68af\u5ea6\u68c0\u67e5\u7684\u8f93\u5165\u6570\u636e\u7684\u6279\u6b21\u5927\u5c0f":74,"\u68c0\u67e5\u70b9\u4fdd\u5b58\u7a0b\u5e8f\u6d41\u7a0b":10,"\u68c0\u67e5\u8f93\u5165\u6570\u636e\u7ef4\u5ea6":75,"\u6982\u5ff5\u4e0a":90,"\u6982\u5ff5\u4e0a\u53ef\u4ee5\u5c06":89,"\u6a21\u5757\u4e0b\u7684\u76f8\u5173":76,"\u6a21\u578b\u4e00\u76f4\u4e0d\u6536\u655b":81,"\u6a21\u578b\u4e2d\u6240\u6709\u53ef\u5b66\u4e60\u53c2\u6570\u4f1a\u88ab\u5b58\u4e3a\u4e00\u4e2a\u538b\u7f29\u6587\u4ef6":90,"\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9\u901a\u8fc7\u5b9a\u671f\u5411\u78c1\u76d8\u4e0a\u4fdd\u5b58\u4e00\u4efd\u5b58\u50a8\u5728paramet":10,"\u6a21\u578b\u6570\u636e\u68c0\u67e5\u70b9\u7684\u5b9e\u73b0":10,"\u6a21\u578b\u6587\u4ef6\u5c06\u88ab\u5199\u5165\u8282\u70b9":93,"\u6a21\u578b\u6765\u6307\u5bfc\u4f60\u5b8c\u6210\u8fd9\u4e9b\u6b65\u9aa4":114,"\u6a21\u578b\u6f14\u793a\u5982\u4f55\u914d\u7f6e\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b":114,"\u6a21\u578b\u7684\u4ee3\u7801\u53ef\u4ee5\u5728":114,"\u6a21\u578b\u7684\u7f16\u7801\u5668\u90e8\u5206\u5982\u4e0b\u6240\u793a":114,"\u6a21\u578b\u7ed3\u6784":104,"\u6a21\u578b\u8bad\u7ec3\u7b49\u4efb\u52a1":84,"\u6a21\u578b\u914d\u7f6e":80,"\u6a21\u578b\u914d\u7f6e\u89e3\u6790":45,"\u6a21\u578b\u9884\u6d4bsdk\u9700\u8981\u5355\u72ec\u8bbe\u8ba1":88,"\u6a21\u5f0f\u4e0b\u7684\u6027\u80fd\u6d4b\u8bd5\u662f\u6ca1\u6709\u610f\u4e49\u7684":107,"\u6a2a\u5411\u62fc\u63a5":81,"\u6b21\u8fed\u4ee3\u6267\u884c\u7684\u8f6c\u6362\u6b21\u6570\u4e3a":41,"\u6b
22\u8fce\u5411paddlepaddle\u793e\u533a\u53cd\u9988\u95ee\u9898":2,"\u6b22\u8fce\u901a\u8fc7":72,"\u6b63\u5728\u7b49\u5f85\u672a\u5b8c\u6210\u7684\u4efb\u52a1":78,"\u6b63\u5e38\u60c5\u51b5\u4e0b\u662f75m":78,"\u6b63\u786e\u7684\u89e3\u51b3\u65b9\u6cd5\u662f":78,"\u6b63\u8d1f\u5bf9\u9a8c\u8bc1":102,"\u6b64\u547d\u4ee4\u5c06\u5728":116,"\u6b64\u5904\u90fd\u4e3a2":111,"\u6b64\u5916":[0,72,82,116],"\u6b64\u6559\u7a0b\u4f1a\u4ecb\u7ecd\u5982\u4f55\u4f7f\u7528python\u7684cprofile\u5305":107,"\u6b64\u6559\u7a0b\u5c06\u5411\u60a8\u5206\u6b65\u4ecb\u7ecd\u5982\u4f55\u4f7f\u7528\u5185\u7f6e\u7684\u5b9a\u65f6\u5de5\u5177":108,"\u6b64\u65b9\u6cd5\u4e0d\u80fd\u83b7\u53d6":81,"\u6b64\u65f6\u53ef\u4ee5\u5728\u8c03\u7528infer\u63a5\u53e3\u65f6\u901a\u8fc7\u8bbe\u7f6e":81,"\u6b64\u65f6\u53ef\u4ee5\u8df3\u8fc7paddlepaddle\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u7684\u5934\u4fe1\u606f":83,"\u6b64\u65f6\u6bcf\u4e2a\u5c0f\u5206\u652f\u7684":42,"\u6b64\u65f6master\u5c06\u8d1f\u8d23\u542f\u52a8\u4e00\u4e2a\u65b0\u7684train":10,"\u6b64\u76ee\u5f55":90,"\u6b64\u793a\u4f8b":90,"\u6b64\u7c7b\u62a5\u9519\u901a\u5e38\u662f\u7531\u4e8e\u67d0\u4e00\u4e2a\u8282\u70b9\u7684\u9519\u8bef\u5bfc\u81f4\u8fd9\u4e2a\u8282\u70b9\u7684\u8bad\u7ec3\u8fdb\u7a0b\u9000\u51fa":79,"\u6b65\u9aa4":81,"\u6bb5\u843d\u53ef\u4ee5\u770b\u4f5c\u662f\u4e00\u4e2a\u5d4c\u5957\u7684\u53cc\u5c42\u7684\u5e8f\u5217":113,"\u6bb5\u843d\u662f\u7531\u53e5\u5b50\u6784\u6210\u7684\u5e8f\u5217":89,"\u6bcf\u4e00\u4e2a":[63,90],"\u6bcf\u4e00\u4e2a\u5916\u5c42\u5e8f\u5217\u53c8\u542b\u6709\u82e5\u5e72\u4e2a\u5185\u5c42\u5e8f\u5217":89,"\u6bcf\u4e00\u4e2a\u5e8f\u5217\u5728\u6574\u4e2a":89,"\u6bcf\u4e00\u4e2a\u6587\u4ef6\u662f\u6570\u636e\u96c6\u7684\u4e00\u4e2ashard":11,"\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65":111,"\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u95f4\u7684\u795e\u7ecf\u7f51\u7edc\u5177\u6709\u4e00\u5b9a\u7684\u76f8\u5173\u6027":111,"\u6bcf\u4e00\u4e2a\u8282\u70b9\u90fd\u6709\u76f8\u540c\u7684\u65e5\u5fd7\u7ed3\u6784":93,"\u6bcf\u4e
00\u4e2a\u8f93\u5165":[89,90],"\u6bcf\u4e00\u4e2alayer\u8f93\u51fa\u77e9\u9635\u7684\u9ad8\u5ea6":81,"\u6bcf\u4e00\u5217\u7684\u542b\u4e49\u662f":107,"\u6bcf\u4e00\u7ec4\u5185\u7684\u6240\u6709\u53e5\u5b50\u548clabel":111,"\u6bcf\u4e00\u884c\u5143\u7d20\u5728":89,"\u6bcf\u4e2a":42,"\u6bcf\u4e2a\u503c\u7684\u7c7b\u578b\u53ef\u4ee5\u662f\u6574\u5f62":11,"\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u6bcf\u4e2a\u5355\u5c42rnn":113,"\u6bcf\u4e2a\u53c2\u6570\u670d\u52a1\u5668\u53ea\u4fdd\u5b58\u6574\u4e2a\u795e\u7ecf\u7f51\u7edc\u6240\u6709\u53c2\u6570\u7684\u4e00\u90e8\u5206":92,"\u6bcf\u4e2a\u53e5\u5b50\u53c8\u662f\u5355\u8bcd\u7684\u6570\u7ec4":111,"\u6bcf\u4e2a\u53e5\u5b50\u90fd\u4ee5\u5f00\u59cb\u6807\u8bb0\u5f00\u5934":114,"\u6bcf\u4e2a\u53e5\u5b50\u90fd\u4ee5\u7ed3\u675f\u6807\u8bb0\u7ed3\u5c3e":114,"\u6bcf\u4e2a\u5b50\u5e8f\u5217\u957f\u5ea6\u53ef\u4ee5\u4e0d\u4e00\u81f4":111,"\u6bcf\u4e2a\u5c42\u5728\u5176":74,"\u6bcf\u4e2a\u6279\u6b21\u6570\u636e":103,"\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u7684\u8fd0\u7b97\u662f\u72ec\u7acb\u7684":113,"\u6bcf\u4e2a\u65f6\u95f4\u6b65\u90fd\u7528\u4e86\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u7ed3\u679c":111,"\u6bcf\u4e2a\u6743\u91cd\u5bf9\u5e94\u4e00\u4e2a\u8f93\u5165":74,"\u6bcf\u4e2a\u6837\u672c\u7531\u4e24\u90e8\u5206\u7ec4\u6210":111,"\u6bcf\u4e2a\u6837\u672c\u95f4\u7528\u7a7a\u884c\u5206\u5f00":111,"\u6bcf\u4e2a\u6d4b\u8bd5\u4f1a\u5bf9\u6bd4paddlepaddle\u4e2dcpu\u7b97\u51fa\u7684\u7ed3\u679c\u4e0emkl":42,"\u6bcf\u4e2a\u72b6\u6001":113,"\u6bcf\u4e2a\u7ebf\u7a0b":103,"\u6bcf\u4e2a\u7ebf\u7a0b\u5206\u914d\u5230128\u4e2a\u6837\u672c\u7528\u4e8e\u8bad\u7ec3":103,"\u6bcf\u4e2a\u8bad\u7ec3\u8282\u70b9\u5fc5\u987b\u6307\u5b9a\u4e00\u4e2a\u552f\u4e00\u7684id\u53f7":103,"\u6bcf\u4e2a\u8f93\u5165\u90fd\u662f\u4e00\u4e2a":74,"\u6bcf\u4e2a\u8f93\u51fa\u8282\u70b9\u90fd\u8fde\u63a5\u5230\u6240\u6709\u7684\u8f93\u5165\u8282
\u70b9\u4e0a":74,"\u6bcf\u4e2a\u90e8\u5206\u5206\u522b\u7ed9\u6bcf\u4e2atrainer\u4f7f\u7528":92,"\u6bcf\u4e2acommit\u53ea\u505a\u4e86\u5c11\u91cf\u7684\u4fee\u6539":72,"\u6bcf\u4e2adata":11,"\u6bcf\u4e2amkldnnlayer\u90fd\u5305\u542b\u7528\u4e8e\u5185\u90e8\u5b58\u50a8\u548c\u5916\u90e8\u5b58\u50a8\u7684\u4e00\u7cfb\u5217mkldnnmatrix":42,"\u6bcf\u4e2aparamet":10,"\u6bcf\u4e2apod\u5305\u542b\u4e00\u4e2apaddlepaddle\u5bb9\u5668":97,"\u6bcf\u4e2ashard\u5206\u522b\u5b58\u50a8\u5728\u5176\u4e2d\u4e00\u53f0paramet":10,"\u6bcf\u4e2atrainer\u542f\u52a8\u540e\u8bfb\u53d6\u5207\u5206\u597d\u7684\u4e00\u90e8\u5206\u6570\u636e":92,"\u6bcf\u4e2atrainer\u7684\u552f\u4e00id":91,"\u6bcf\u4e2atrainer\u8fdb\u7a0b\u9700\u8981\u80fd\u591f\u8bfb\u53d6\u5c5e\u4e8e\u81ea\u5df1\u7684\u4e00\u4efd\u6570\u636e":91,"\u6bcf\u53f0\u670d\u52a1\u5668\u5177\u6709\u96c6\u7fa4\u4e2d\u552f\u4e00\u7684ip\u5730\u5740":101,"\u6bcf\u5c42\u4e0a\u53ea\u80fd\u4fdd\u5b58\u56fa\u5b9a\u6570\u76ee\u4e2a\u6700\u597d\u7684\u72b6\u6001":103,"\u6bcf\u5c42\u4f7f\u7528\u7684gpu\u53f7\u4f9d\u8d56\u4e8e\u53c2\u6570train":105,"\u6bcf\u6279\u6b21":103,"\u6bcf\u6b21\u63d0\u4ea4\u4ee3\u7801":72,"\u6bcf\u6b21\u63d0\u4ea4\u65f6":72,"\u6bcf\u6b21\u8c03\u7528\u65f6\u5bf9\u539f\u6570\u636e\u7684\u91cd\u590dpacking\u4fbf\u6210\u4e3a\u4e86\u5197\u4f59":41,"\u6bcf\u6b21\u8c03\u7528\u7684\u8017\u65f6\u4e5f\u5f88\u957f":107,"\u6bcf\u6b21\u8f93\u51fa\u4e00\u4e2adata":11,"\u6bcf\u884c\u8868\u793a\u4e00\u4e2a\u6279\u6b21\u4e2d\u7684\u5355\u4e2a\u8f93\u5165":74,"\u6bcf\u8f6e\u4f1a\u5c06\u6570\u636e\u96c6\u4e2d\u7684\u6240\u6709\u8bad\u7ec3\u6837\u672c\u4f7f\u7528\u4e00\u6b21":103,"\u6bcf\u8f6e\u7ed3\u675f\u65f6\u5bf9\u6240\u6709\u6d4b\u8bd5\u6570\u636e\u8fdb\u884c\u6d4b\u8bd5":103,"\u6bcf\u8f6e\u90fd\u4f1a\u4fdd\u5b58\u9884\u6d4b\u7ed3\u679c":103,"\u6bcf\u8fd0\u884c\u591a\u5c11\u4e2a\u6279\u6b21\u6267\u884c\u4e00\u6b21\u7a00\u758f\u53c2\u6570\u5206\u5e03\u7684\u68c0\u67e5":103,"\u6bcf\u969410\u5206\u949f":10,"\u6bcfdot":103,"\u6bcflog":10
3,"\u6bcfsave":103,"\u6bcftest":103,"\u6bd4\u5982":[0,1,11,42,72,79,81],"\u6bd4\u5982\u4e00\u53e5\u8bdd\u4e2d\u7684\u6bcf\u4e00\u4e2a\u5355\u8bcd":111,"\u6bd4\u5982\u53ef\u80fd\u4f1a\u7528openmp\u6539\u8fdbsgd\u7684\u66f4\u65b0\u6027\u80fd":42,"\u6bd4\u5982\u5728":1,"\u6bd4\u5982\u5982\u679c\u8981build\u4e00\u4e2a\u4e0d\u4f9d\u8d56gpu":72,"\u6bd4\u5982\u5c06":63,"\u6bd4\u5982\u5e0c\u671b\u6700\u5c0f\u5316\u751f\u6210\u5e93\u7684\u5927\u5c0f":117,"\u6bd4\u5982\u5e0c\u671b\u6700\u5c0f\u5316\u751f\u6210\u7684\u5e93\u7684\u5927\u5c0f":118,"\u6bd4\u5982\u6bcf\u969410\u5206\u949f\u6700\u65b0\u7684\u5feb\u7167":10,"\u6bd4\u5982\u6d41\u5f0f\u6570\u636e\u5904\u7406":11,"\u6bd4\u5982\u8bbe\u7f6e\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u7684\u53c2\u6570\u521d\u59cb\u5316\u65b9\u5f0f\u548cbias\u521d\u59cb\u5316\u65b9\u5f0f":83,"\u6bd4\u5982cento":3,"\u6bd4\u5982fpe":79,"\u6bd4\u5982ide\u914d\u7f6e\u91cc":72,"\u6bd4\u5982imagenet\u8fd9\u4e2a\u6570\u636e\u96c6\u53ef\u80fd\u88ab\u5206\u62101000\u4e2ashard":11,"\u6bd4\u5982pil\u5e93\u7b49":91,"\u6bd5\u7adf\u5355\u7ebf\u7a0b\u8c03\u8bd5\u66f4\u5bb9\u6613":107,"\u6ca1\u6709\u5fc5\u8981\u5728\u6bcf\u6b21\u524d\u5411\u4e2d\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u8ba1\u7b97\u65f6\u5bf9\u6743\u91cd\u8fdb\u884c\u91cd\u590d\u7684packing\u64cd\u4f5c":41,"\u6ca1\u6709\u627e\u5230\u548c\u5f53\u524d\u7cfb\u7edf\u5339\u914d\u7684paddlepaddle\u5b89\u88c5\u5305":[3,78],"\u6ca1\u6709\u8bbe\u7f6e":[116,118],"\u6ce8":[0,1,10,63,89],"\u6ce8\u518c":75,"\u6ce8\u518ccpu":75,"\u6ce8\u518clayer\u7684\u65f6\u5019\u4fdd\u8bc1":[41,42],"\u6ce8\u518cop":75,"\u6ce8\u518cop\u65f6\u7684\u7c7b\u578b\u540d":75,"\u6ce8\u610f":[0,42,74,77,81,84,90,91,97,107,114,116,117,118],"\u6ce8\u610f\u4e0a\u8ff0\u547d\u4ee4\u4e2d":97,"\u6ce8\u610f\u4e8b\u9879":89,"\u6ce8\u610f\u5230\u6211\u4eec\u5df2\u7ecf\u5047\u8bbe\u673a\u5668\u4e0a\u67094\u4e2agpu":105,"\u6ce8\u610fnode":97,"\u6ce8\u91ca":75,"\u6d41\u7a0b\u6765\u63d0\u4ea4\u4ee3\u7801":72,"\u6d4b\u8bd5":72,"\u6d4b\u8bd5\u5206\u4e3a
\u6bcf\u4e2alayer":42,"\u6d4b\u8bd5\u65f6\u6307\u5b9a\u7684\u5b58\u50a8\u6a21\u578b\u5217\u8868\u7684\u6587\u4ef6":103,"\u6d4b\u8bd5\u662f":72,"\u6d4b\u8bd5\u672c\u6b21release\u7684\u6b63\u786e\u6027":63,"\u6d4b\u8bd5\u7684\u6027\u80fd\u5bf9\u6bd4\u7ed3\u679c\u4f1a\u5728":42,"\u6d4b\u8bd5\u7684\u6a21\u578b\u5305\u62ec\u4ece\u7b2cm\u8f6e\u5230\u7b2cn":105,"\u6d4b\u8bd5model_list":102,"\u6d4b\u8bd5oper":75,"\u6d4b\u8bd5save_dir":102,"\u6d6e\u70b9\u578b\u6570\u636e":11,"\u6d6e\u70b9\u578b\u7a00\u758f\u77e9\u9635":89,"\u6d6e\u70b9\u578b\u7a20\u5bc6\u77e9\u9635":89,"\u6d6e\u70b9\u5f02\u5e38\u901a\u5e38\u7684\u539f\u56e0\u662f\u6d6e\u70b9\u6570\u6ea2\u51fa":81,"\u6d6e\u70b9\u6570":89,"\u6d6e\u70b9\u6570\u5411\u91cf\u7b49":89,"\u6d6e\u70b9\u7a00\u758f\u6570\u636e":74,"\u6df1\u5165paddlepaddl":42,"\u6df1\u5ea6\u5b66\u4e60\u7b97\u6cd5\u7684\u5b9e\u73b0\u6709\u7740\u591a\u6837\u5316\u7684\u7279\u70b9":104,"\u6df7\u5408\u4ee3\u7801\u7684\u6027\u80fd\u5206\u6790\u6765\u8fdb\u884c\u8c03\u4f18":107,"\u6df7\u5408\u4ee3\u7801\u7684\u6027\u80fd\u74f6\u9888\u4e5f\u662f\u8981\u770b":107,"\u6df7\u5408\u5f53\u524d\u8bcd\u5411\u91cf\u548cattention\u52a0\u6743\u7f16\u7801\u5411\u91cf":114,"\u6dfb\u52a0":41,"\u6dfb\u52a0\u4e86\u4e00\u4e2a\u8f93\u51fa":75,"\u6dfb\u52a0\u542f\u52a8\u811a\u672c":97,"\u6dfb\u52a0\u5e8f\u5217\u4fe1\u606f":89,"\u6dfb\u52a0\u7684\u76f8\u5173\u6587\u4ef6\u548c\u76ee\u5f55\u7ed3\u6784\u5982\u4e0b":[41,42],"\u6dfb\u52a0\u8f93\u5165\u53c2\u6570":75,"\u6dfb\u52a0\u8f93\u51fa\u53c2\u6570":75,"\u6dfb\u52a0op\u7684\u6ce8\u91ca":75,"\u6e05\u7406\u548c\u7ed3\u675f":90,"\u6e05\u7406\u6389\u8001\u65e7\u7684paddlepaddle\u5b89\u88c5\u5305":78,"\u6e90\u4ee3\u7801\u683c\u5f0f":72,"\u6e90\u5e8f\u5217":114,"\u6e90\u7801\u4e2d\u6784\u5efa\u7528\u4e8e\u7f16\u8bd1paddlepaddle\u7684docker\u955c\u50cf":0,"\u6e90\u7801\u6811\u6839\u76ee\u5f55":0,"\u6f5c\u5728\u4f1a\u5f15\u8d77\u672a\u5b9a\u4e49\u884c\u4e3a":89,"\u6fc0\u6d3b":74,"\u6fc0\u6d3b\u51fd\u6570\u662f\u72ec\u7acb\u4e8e":42,"\u6f
c0\u6d3b\u65b9\u7a0b":74,"\u6fc0\u6d3b\u7684\u7c7b\u578b":74,"\u70b9\u51fb":[3,63],"\u70b9\u51fb\u8fd9\u91cc":77,"\u7136\u540e":[93,108],"\u7136\u540e\u4e0b\u8f7d\u4f18\u5316\u66f4\u65b0\u540e\u7684\u795e\u7ecf\u7f51\u7edc\u53c2\u6570":92,"\u7136\u540e\u4ea4\u7ed9step\u51fd\u6570":113,"\u7136\u540e\u4f7f\u7528":117,"\u7136\u540e\u4f7f\u7528resize\u63a5\u53e3\u8bbe\u7f6etensor\u7684\u5927\u5c0f":76,"\u7136\u540e\u5355\u51fb":72,"\u7136\u540e\u53ef\u4ee5\u4ecehead\u8282\u70b9ssh\u65e0\u5bc6\u7801\u767b\u5f55\u5230openmpi\u7684\u6bcf\u4e2a\u8282\u70b9\u4e0a":98,"\u7136\u540e\u53ef\u4ee5\u4f7f\u7528\u547d\u4ee4\u884c\u5de5\u5177\u521b\u5efajob":97,"\u7136\u540e\u5728\u4e0b\u4e00\u4e2a\u65f6\u95f4\u6b65\u8f93\u5165\u7ed9\u53e6\u4e00\u4e2a\u795e\u7ecf\u5143":111,"\u7136\u540e\u5728\u524d\u5411":41,"\u7136\u540e\u5728\u6d4f\u89c8\u5668\u4e2d\u8f93\u5165\u4ee5\u4e0b\u7f51\u5740":1,"\u7136\u540e\u5728dataprovider\u91cc\u9762\u6839\u636e\u8be5\u5730\u5740\u52a0\u8f7d\u5b57\u5178":83,"\u7136\u540e\u5728etcd\u7684":10,"\u7136\u540e\u5b89\u88c5paddle\u7684python\u73af\u5883":78,"\u7136\u540e\u5b9a\u4e49":114,"\u7136\u540e\u5c06\u6784\u5efa\u6210\u529f\u7684\u955c\u50cf\u4e0a\u4f20\u5230\u955c\u50cf\u4ed3\u5e93":97,"\u7136\u540e\u5c06\u8fd9\u4e9blayer\u7684\u53c2\u6570":82,"\u7136\u540e\u5c31\u53ef\u4ee5\u5e76\u53d1\u5199\u5165\u591a\u4e2achunk":27,"\u7136\u540e\u6240\u6709\u7528":72,"\u7136\u540e\u624d\u80fd\u4f7f\u7528pfsclient":27,"\u7136\u540e\u6253\u5370\u8f93\u51fa":84,"\u7136\u540e\u6309\u7167\u4e0a\u8ff0\u7684\u65b9\u6cd5":63,"\u7136\u540e\u63d0\u4ea4\u65b0\u6dfb\u52a0\u7684":72,"\u7136\u540e\u70b9\u51fb":[63,72],"\u7136\u540e\u7533\u660e\u4e00\u4e2a\u5b58\u50a8\u5377":97,"\u7136\u540e\u89c2\u5bdf\u5230\u8f93\u51fa\u7684\u53d8\u5316\u4e3a":74,"\u7136\u540e\u901a\u8fc7\u51fd\u6570":97,"\u7136\u540e\u901a\u8fc7\u81ea\u8eab\u7684ip\u5730\u5740\u5728":97,"\u7136\u540e\u91cd\u65b0cmake\u5373\u53ef":78,"\u7136\u800c":[103,114],"\u7248\u672c":[0,3,117],"\u7248\u672c\u4e3acpu_avx
_mkl":1,"\u7248\u672c\u4e3acpu_avx_openbla":[3,86],"\u7248\u672c\u5206\u652f":63,"\u7248\u672c\u53f7":63,"\u7248\u672c\u53f7\u5bf9\u5e94\u7684tag\u5373\u53ef":63,"\u7248\u672c\u53f7rc":63,"\u7248\u672c\u5728":72,"\u7248\u672c\u8bf4\u660e":3,"\u7248\u672cfork\u51fa\u81ea\u5df1\u7684\u529f\u80fd\u5206\u652f":63,"\u7279\u522b\u662f\u5728lstm\u7b49rnn\u4e2d":81,"\u7279\u6307":90,"\u7279\u6709\u7684\u8bbe\u5907id":42,"\u72ec\u7acb\u5de5\u5177\u94fe":116,"\u72ec\u7acb\u5de5\u5177\u94fe\u6240\u5728\u7684\u7edd\u5bf9\u8def\u5f84":116,"\u73af\u5883\u51c6\u5907":92,"\u73af\u5883\u53d8\u91cf":91,"\u73af\u5883\u53d8\u91cf\u4e2d":87,"\u73af\u5883\u53d8\u91cf\u6765\u6307\u5b9a\u7279\u5b9a\u7684gpu":81,"\u73b0\u9636\u6bb5\u7684\u4f18\u5316\u4e3b\u8981\u9488\u5bf9":41,"\u73b0\u9636\u6bb5paddle\u6709\u4e00\u4e2a\u95ee\u9898\u662f":45,"\u7406\u89e3":0,"\u7406\u89e3\u4e3a\u4e00\u4e2a\u4e00\u7ef4\u7684\u6574\u578b\u6570\u7ec4":89,"\u751a\u81f3\u80fd\u89e3\u91ca\u4e3a\u4ec0\u4e48\u67d0\u4e2a\u64cd\u4f5c\u82b1\u4e86\u5f88\u957f\u65f6\u95f4":108,"\u751f\u4ea7\u73af\u5883\u4e2d\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u901a\u5e38\u4f53\u79ef\u5f88\u5927":11,"\u751f\u4ea7\u73af\u5883\u7684\u65e5\u5fd7\u6570\u636e\u4f1a\u901a\u8fc7\u5b9e\u65f6\u6d41\u7684\u65b9\u5f0f":11,"\u751f\u4ea7\u955c\u50cf":72,"\u751f\u6210":97,"\u751f\u6210\u5404\u79cd\u8bed\u8a00\u7684\u7ed1\u5b9a\u4ee3\u7801":45,"\u751f\u6210\u540e\u7684\u6587\u6863\u5206\u522b\u5b58\u50a8\u5728\u7f16\u8bd1\u76ee\u5f55\u7684":77,"\u751f\u6210\u5e8f\u5217\u7684\u6700\u5927\u957f\u5ea6":114,"\u751f\u6210\u6587\u6863":45,"\u751f\u6210\u7684":11,"\u751f\u6210\u7684\u6027\u80fd\u5206\u6790\u6587\u4ef6\u4e3a":107,"\u751f\u6210\u7684\u6570\u636e\u5c06\u4f1a\u5b58\u50a8\u5728\u8fd9\u4e2avolume\u4e0b":97,"\u751f\u6210\u7684\u6570\u636e\u7f13\u5b58\u5728\u5185\u5b58\u91cc":81,"\u751f\u6210\u7ed9\u5b9a":11,"\u751f\u6210\u7f51\u7edc\u5c42\u914d\u7f6e":74,"\u751f\u6210\u81ea\u5df1\u76ee\u5f55\u4e0b\u7684\u4ed3\u5e93":72,"\u751f\u6210\u8c03\u8bd5\u4f
e1\u606f":107,"\u751f\u6210\u968f\u673a\u7684\u8f93\u5165\u6570\u636e":75,"\u751f\u6210api\u6587\u6863":45,"\u751f\u6210pfsclient\u548cpfsserver\u7684\u6846\u67b6\u90e8\u5206":27,"\u751f\u6210python\u6027\u80fd\u5206\u6790\u7684\u547d\u4ee4\u5982\u4e0b":107,"\u7528":27,"\u7528\u4e8e\u521d\u59cb\u5316\u53c2\u6570\u548c\u8bbe\u7f6e":74,"\u7528\u4e8e\u5c06\u53c2\u6570\u4f20\u9012\u7ed9\u7f51\u7edc\u914d\u7f6e":105,"\u7528\u4e8e\u6307\u5b9a\u5176\u8981\u5173\u8054\u7684layer":82,"\u7528\u4e8e\u6307\u5b9a\u7f51\u7edc\u914d\u7f6e\u6587\u4ef6":103,"\u7528\u4e8e\u6ce8\u518c\u6ca1\u6709\u53cd\u5411\u7684op":75,"\u7528\u4e8e\u6d4b\u8bd5\u548c\u5bf9\u6bd4\u5728\u4f7f\u7528mkl":42,"\u7528\u4e8e\u7a00\u758f\u7c7b\u578b\u53c2\u6570\u901a\u4fe1\u7684\u7aef\u53e3\u4e2a\u6570":91,"\u7528\u4e8e\u7a00\u758f\u8bad\u7ec3\u4e2d":103,"\u7528\u4e8e\u7ba1\u7406mkl":42,"\u7528\u4e8e\u83b7\u53d6\u7279\u5b9alayer\u4e0a\u4e00\u65f6\u95f4\u6b65\u7684\u8f93\u51fa":82,"\u7528\u4e8e\u89e3\u51b3\u4e0a\u8ff0\u95ee\u9898":88,"\u7528\u4e8e\u8ba1\u7b97\u7f16\u7801\u5411\u91cf\u7684\u52a0\u6743\u548c":114,"\u7528\u4e8e\u8bad\u7ec3\u795e\u7ecf\u7f51\u7edc\u7684\u6570\u636e":92,"\u7528\u4e8e\u9009\u62e9\u662f\u5426\u4f7f\u7528\u76f8\u5173\u529f\u80fd":41,"\u7528\u4e8e\u9009\u62e9\u662f\u5426\u4f7f\u7528mkl":42,"\u7528\u4e8emkl":[41,42],"\u7528\u53cc\u5411\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7f16\u7801":114,"\u7528\u591a\u5bf9\u6548\u679c\u5b8c\u5168\u76f8\u540c\u7684":111,"\u7528\u6237\u4e00\u822c\u901a\u8fc7\u8c03\u7528":107,"\u7528\u6237\u4e0a\u4f20\u6570\u636e\u540e":11,"\u7528\u6237\u4e5f\u53ef\u4ee5\u4e0a\u4f20label":11,"\u7528\u6237\u4e5f\u53ef\u4ee5\u4f7f\u7528paddlepaddle\u63d0\u4f9b\u7684\u5b98\u65b9\u5f00\u53d1\u955c\u50cf":116,"\u7528\u6237\u4ea6\u53ef\u4ee5\u901a\u8fc7\u624b\u52a8\u8bbe\u7f6e":116,"\u7528\u6237\u4eceapp":117,"\u7528\u6237\u53ea\u9700\u5b9a\u4e49rnn\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u5b8c\u6210\u7684\u8ba1\u7b97":113,"\u7528\u6237\u53ef\u4ee5\u5206\u522b\u67
e5\u770b\u6700\u65b0\u7684":77,"\u7528\u6237\u53ef\u4ee5\u53c2\u8003\u4e0b\u6587":116,"\u7528\u6237\u53ef\u4ee5\u53c2\u8003sphinx\u6559\u7a0b\u8fdb\u884c\u4e66\u5199":77,"\u7528\u6237\u53ef\u4ee5\u5b89\u5168\u7684\u91ca\u653e\u67d0\u4e2ac":46,"\u7528\u6237\u53ef\u4ee5\u628a\u81ea\u5df1\u7684\u6570\u636e\u5206\u4eab\u7ed9\u522b\u4eba":11,"\u7528\u6237\u53ef\u4ee5\u76f4\u63a5\u4f7f\u7528\u8fd9\u4e2a\u52a8\u6001\u5e93\u6765\u5f15\u5165paddl":46,"\u7528\u6237\u53ef\u4ee5\u81ea\u5b9a\u4e49beam":103,"\u7528\u6237\u53ef\u4ee5\u8bbe\u7f6e":105,"\u7528\u6237\u53ef\u5728\u81ea\u5df1\u719f\u6089\u7684\u5f00\u53d1\u5e73\u53f0\u4e0a\u7f16\u8bd1android\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddlepaddle\u5e93":116,"\u7528\u6237\u53ef\u5728\u8c03\u7528cmake\u7684\u65f6\u5019\u8bbe\u7f6e\u5b83\u4eec":0,"\u7528\u6237\u53ef\u5c06":116,"\u7528\u6237\u53ef\u5c06\u5408\u6210\u7684fat\u5e93\u7528\u4e8e\u6df1\u5ea6\u5b66\u4e60\u76f8\u5173\u7684io":117,"\u7528\u6237\u53ef\u6839\u636e\u81ea\u5df1\u7684\u7f16\u8bd1\u76ee\u6807\u67b6\u6784":116,"\u7528\u6237\u53ef\u81ea\u884c\u524d\u5f80\u4e0b\u8f7d\u9884\u7f16\u8bd1\u597d\u7684\u7248\u672c":116,"\u7528\u6237\u53ef\u901a\u8fc7\u5982\u4e0b\u4e24\u79cd\u65b9\u5f0f":116,"\u7528\u6237\u5728\u4f7f\u7528\u8fd9\u4e00\u7c7brecurr":82,"\u7528\u6237\u5728\u4f7f\u7528paddlepaddl":78,"\u7528\u6237\u5728\u672c\u5730\u8f6c\u6362\u597d\u518d\u4e0a\u4f20":11,"\u7528\u6237\u5c06\u53c2\u6570\u8f7d\u5165":83,"\u7528\u6237\u5c06\u914d\u7f6e\u4e0e\u8bad\u7ec3\u6570\u636e\u5207\u5206\u597d\u653e\u5728\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u9884\u5148\u5206\u914d\u597d\u7684\u76ee\u5f55\u4e2d":97,"\u7528\u6237\u5f3a\u5236\u6307\u5b9a\u7279\u5b9a\u7684python\u7248\u672c":78,"\u7528\u6237\u6587\u4ef6\u53ef\u80fd\u662f\u6bd4\u8f83\u5927\u7684":27,"\u7528\u6237\u7684\u96c6\u7fa4\u73af\u5883\u4e0d\u5c3d\u76f8\u540c":94,"\u7528\u6237\u8fd8\u53ef\u6839\u636e\u81ea\u5df1\u7684\u9700\u6c42\u8bbe\u7f6e\u5176\u4ed6\u7f16\u8bd1\u53c2\u6570":[116,117,118],"\u7528\u6237\u901a\u8
fc7\u53c2\u6570":[82,83],"\u7528\u6237\u901a\u8fc7c":46,"\u7528\u6237\u9700\u8981\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u6307\u5b9a":105,"\u7528\u6237\u9700\u8981\u5728cmake\u65f6\u624b\u52a8\u8bbe\u7f6e\u8fd9\u4e9b\u503c":[116,118],"\u7528\u6237\u9700\u8981\u6307\u5b9a\u672c\u673a\u4e0apython\u7684\u8def\u5f84":78,"\u7528\u6237\u9700\u8981\u63d0\u524d\u51c6\u5907\u597d\u4ea4\u53c9\u7f16\u8bd1\u73af\u5883":116,"\u7528\u6765\u4ece\u53c2\u6570\u670d\u52a1\u5668\u9884\u53d6\u53c2\u6570\u77e9\u9635\u76f8\u5e94\u7684\u884c":74,"\u7528\u6765\u5b58\u50a8\u672c\u6b21\u6027\u80fd\u5206\u6790\u7684\u7ed3\u679c":107,"\u7528\u8fd9\u4e2a\u955c\u50cf\u521b\u5efa\u7684\u5bb9\u5668\u9700\u8981\u6709\u4ee5\u4e0b\u4e24\u4e2a\u529f\u80fd":97,"\u7528web\u6d4f\u89c8\u5668\u8bbf\u95ee\u5bf9\u5e94\u7f51\u5740":107,"\u7531":[82,89,113],"\u7531\u4e8e":87,"\u7531\u4e8e\u5728\u73b0\u6709\u7684\u67d0\u4e9b\u60c5\u51b5\u4e0b":41,"\u7531\u4e8e\u5b83\u5185\u90e8\u5305\u542b\u4e86\u6bcf\u7ec4\u6570\u636e\u4e2d\u7684\u6240\u6709\u53e5\u5b50":111,"\u7531\u4e8e\u5bf9parameters\u7684\u66f4\u65b0\u9700\u8981\u83b7\u53d6parameters\u5185\u5b58\u7684":10,"\u7531\u4e8e\u6211\u4eec\u60f3\u8981\u7684\u53d8\u6362\u662f\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":111,"\u7531\u4e8e\u6211\u4eec\u652f\u6301\u8bad\u7ec3\u6570\u636e\u6709\u4e0d\u540c\u7684\u6279\u6b21\u5927\u5c0f":74,"\u7531\u4e8e\u96c6\u7fa4\u4e2d\u540c\u65f6\u5b58\u5728\u4e24\u53f0\u673a\u5668\u6545\u969c\u7684\u6982\u7387\u6781\u4f4e":10,"\u7531\u4e8earm64\u67b6\u6784\u8981\u6c42android":116,"\u7531\u4e8ec":45,"\u7531\u4e8echunk\u6bd4\u8f83\u5c0f":27,"\u7531\u4e8eeigen":76,"\u7531\u4e8emkl":42,"\u7531\u4e8epypi":63,"\u7531\u4e8estep":113,"\u7531\u4e8etensor\u7684rank\u662f\u6a21\u677f\u53c2\u6570":76,"\u7531\u5206\u652f\u5904\u7684layer\u8d1f\u8d23\u6c42\u548c":42,"\u7531\u8bcd\u8bed\u6784\u6210\u7684\u53e5\u5b50":110,"\u7531\u94fe\u63a5\u65b9\u5f0f\u51b3\u5b9a":87,"\u7533\u8bf7\u7528\u6237\u7a7a\u95f4":27,"\u767b\u5f55\u5230head\u8282\u70b9":98,"\u7684"
:[3,72,76,89,90,91,93,97,117],"\u7684\u4e00\u4e2a\u5b50\u96c6":42,"\u7684\u4e00\u4e2a\u7b80\u5355\u8c03\u7528\u5982\u4e0b":113,"\u7684\u4e3a0":103,"\u7684\u4efb\u4e00\u4e00\u79cd":81,"\u7684\u4f5c\u7528\u662f\u5ef6\u8fdf\u5206\u914d\u5185\u5b58":76,"\u7684\u4f7f\u7528\u793a\u4f8b\u5982\u4e0b":110,"\u7684\u4fe1\u606f":42,"\u7684\u503c":[116,117,118],"\u7684\u503c\u81ea\u52a8\u63a8\u5bfc\u5f97\u5230":116,"\u7684\u504f\u7f6e\u5411\u91cf":74,"\u7684\u5171\u4eab\u5df2\u7ecf\u52a0\u8f7d\u7684\u9884\u6d4b\u6a21\u578b":90,"\u7684\u5177\u4f53\u8ba1\u7b97\u903b\u8f91":75,"\u7684\u5185\u5b58":81,"\u7684\u5185\u5bb9\u6765\u5b9a\u5236imag":97,"\u7684\u5185\u6838block\u4f7f\u7528\u60c5\u51b5":108,"\u7684\u5355\u5143\u6d4b\u8bd5":75,"\u7684\u5355\u5143\u6d4b\u8bd5\u548c\u7b80\u5355\u7f51\u7edc\u7684\u6574\u4f53\u6d4b\u8bd5":42,"\u7684\u53c2\u6570\u4f7f\u4e4b\u652f\u6301\u5f02\u6b65sgd\u66f4\u65b0":91,"\u7684\u53cd\u5411\u4f20\u64ad\u5c06\u4f1a\u6253\u5370\u65e5\u5fd7\u4fe1\u606f":103,"\u7684\u53d8\u6362\u77e9\u9635":74,"\u7684\u540d\u79f0\u76f8\u540c":114,"\u7684\u5411\u91cf":74,"\u7684\u542f\u52a8\u53c2\u6570":97,"\u7684\u542f\u52a8\u53c2\u6570\u5e76\u6267\u884c\u8fdb\u7a0b":97,"\u7684\u547d\u4ee4\u548c\u4e00\u822c\u7684":107,"\u7684\u547d\u540d\u98ce\u683c\u5e76\u4e0d\u80fd\u9002\u5e94\u5176\u4ed6\u7b2c\u4e09\u65b9\u8bed\u8a00":45,"\u7684\u5730\u65b9":72,"\u7684\u5747\u5300\u5206\u5e03":83,"\u7684\u57fa\u672c\u903b\u8f91":42,"\u7684\u591a\u79cd\u5b89\u88c5\u65b9\u5f0f":101,"\u7684\u5934\u6587\u4ef6":45,"\u7684\u5b50\u7c7b\u53ea\u9700\u8981\u4f7f\u7528\u5185\u90e8\u5b58\u50a8\u5c31\u53ef\u4ee5\u4e86":42,"\u7684\u5b9e\u73b0":75,"\u7684\u5de5\u4f5c\u6d41\u7a0b\u5982\u56fe1\u6240\u793a":90,"\u7684\u5e73\u5747\u503c":110,"\u7684\u5e8f\u5217":89,"\u7684\u5e8f\u5217\u5f62\u72b6\u4e00\u81f4":111,"\u7684\u5f00\u53d1\u5de5\u4f5c\u90fd\u5e94\u8be5\u5728\u4e00\u4e2a\u65b0\u7684\u5206\u652f\u4e0a\u5b8c\u6210":72,"\u7684\u5f00\u53d1\u6d41\u7a0b":0,"\u7684\u5f00\u59cb\u8bf7\u52a0\u4e0a\u5b8f\u
5b9a\u4e49":75,"\u7684\u6027\u80fd\u5206\u6790\u4e0e\u8c03\u4f18\u5206\u4e3a\u4e24\u4e2a\u90e8\u5206":107,"\u7684\u6027\u80fd\u5206\u6790\u5de5\u5177\u975e\u5e38\u591a":107,"\u7684\u6027\u80fd\u6709\u95ee\u9898":107,"\u7684\u60c5\u51b5\u4e0b":41,"\u7684\u63a5\u53e3\u6837\u5f0f":45,"\u7684\u63a5\u53e3\u8bf7\u67e5\u770b":89,"\u7684\u63cf\u8ff0\u8bf4\u660e\u4e2d":72,"\u7684\u64cd\u4f5c":76,"\u7684\u6570\u636e\u6d41\u56fe":11,"\u7684\u6570\u76ee\u4e00\u81f4":110,"\u7684\u6587\u4ef6\u4e5f\u5e26\u5230\u65b0\u5206\u652f\u4e0a":72,"\u7684\u65b9\u7a0b":74,"\u7684\u65f6\u5019":42,"\u7684\u65f6\u95f4\u6b65\u4fe1\u606f\u6210\u6b63\u6bd4":81,"\u7684\u66f4\u8be6\u7ec6\u51c6\u786e\u7684\u5b9a\u4e49":111,"\u7684\u6700\u5c0f\u503c":103,"\u7684\u6700\u65b0\u4ee3\u7801\u5e76\u66f4\u65b0\u5f53\u524d\u5206\u652f":72,"\u7684\u6784\u9020\u51fd\u6570":75,"\u7684\u67b6\u6784\u7684\u793a\u4f8b":114,"\u7684\u6837\u5f0f":72,"\u7684\u6838\u5fc3\u662f\u8bbe\u8ba1step\u51fd\u6570\u7684\u8ba1\u7b97\u903b\u8f91":113,"\u7684\u6839\u76ee\u5f55":117,"\u7684\u683c\u5f0f\u59cb\u7ec8\u662f":42,"\u7684\u683c\u5f0f\u5b58\u50a8":42,"\u7684\u6982\u5ff5":42,"\u7684\u6bb5\u843d\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":113,"\u7684\u6bcf\u4e2a\u8fdb\u7a0b\u90fd\u53ef\u4ee5\u4ececeph\u8bfb\u53d6\u6570\u636e":96,"\u7684\u6e90\u7801\u4ee5\u53ca\u751f\u6210\u6587\u6863\u9700\u8981\u591a\u79cd\u5f00\u53d1\u5de5\u5177":72,"\u7684\u6e90\u7801\u91cc\u4f7f\u7528\u4e86":45,"\u7684\u7248\u672c":[63,118],"\u7684\u72b6\u6001":113,"\u7684\u72ec\u7acb\u5de5\u5177\u94fe":116,"\u7684\u77e9\u9635":[74,81],"\u7684\u7a20\u5bc6\u5411\u91cf\u4f5c\u4e3a\u8f93\u5165":74,"\u7684\u7a20\u5bc6\u77e9\u9635":89,"\u7684\u7a20\u5bc6\u77e9\u9635\u662f\u4e00\u4e2a\u7531":89,"\u7684\u7b2c\u4e00\u4e2a\u53c2\u6570":90,"\u7684\u7b2ci\u4e2a\u503c":74,"\u7684\u7b2cj\u4e2a\u503c":74,"\u7684\u7cfb\u7edf":0,"\u7684\u7ed3\u679c":41,"\u7684\u7f16\u5199":91,"\u7684\u7f16\u8bd1\u5de5\u5177\u94fe":116,"\u7684\u7f29\u5199":27,"\u7684\u7f51\u7edc\
u6a21\u578b":41,"\u7684\u89c4\u8303":45,"\u7684\u89d2\u5ea6":11,"\u7684\u8ba1\u7b97\u4ee3\u7801":76,"\u7684\u8ba1\u7b97\u8fc7\u7a0b\u4e66\u5199\u66f4\u52a0\u7b80\u5355":75,"\u7684\u8bdd":81,"\u7684\u8be6\u7ec6\u4fe1\u606f":107,"\u7684\u8f93\u5165":113,"\u7684\u8f93\u51fa":[81,108],"\u7684\u8f93\u51fa\u4fe1\u606f\u5165\u624b\u662f\u4e2a\u4e0d\u9519\u7684\u9009\u62e9":108,"\u7684\u8f93\u51fa\u51fd\u6570\u8fd4\u56de\u7684\u662f\u4e0b\u4e00\u4e2a\u65f6\u523b\u8f93\u51fa\u8bcd\u7684":114,"\u7684\u8f93\u51fa\u683c\u5f0f":111,"\u7684\u8f93\u51fa\u88ab\u7528\u4f5c":114,"\u7684\u8f93\u51fab\u662f\u4e00\u4e2a":81,"\u7684\u8fd0\u884c\u73af\u5883":0,"\u7684\u8fdc\u7a0b\u4ed3\u5e93\u7684\u540d\u5b57":72,"\u7684\u914d\u7f6e\u5199\u5230\u914d\u7f6e\u6587\u4ef6\u4e2d":11,"\u7684\u96c6\u88c5\u7bb1\u6280\u672f":0,"\u7684\u9875\u9762\u5220\u9664\u8fdc\u7a0b\u4ed3\u5e93\u7684\u5206\u652f":72,"\u7684cpu":75,"\u7684docker\u955c\u50cf":1,"\u7684flag":[41,42],"\u7684linux\u670d\u52a1\u5668\u7ec4\u6210":101,"\u7684paddlepaddle\u5e93":116,"\u7684vanilla":41,"\u76d1\u542c\u7684\u7aef\u53e3\u4e2a\u6570":91,"\u76ee\u524d":113,"\u76ee\u524d\u4f7f\u7528":72,"\u76ee\u524d\u53ea\u8003\u8651":42,"\u76ee\u524d\u53ea\u8003\u8651\u52a8\u6001\u6269\u5bb9trainer\u6570\u91cf":10,"\u76ee\u524d\u5728paddlepaddle\u4e2d":42,"\u76ee\u524d\u5728paddlepaddle\u4e2d\u6570\u636e\u90fd\u662f\u4ee5":42,"\u76ee\u524d\u5d4c\u5165python\u89e3\u91ca\u5668":45,"\u76ee\u524d\u5fc5\u987b\u8bbe\u7f6e\u6210":118,"\u76ee\u524d\u6211\u4eec\u7528cephfs\u6765\u642d\u5efa":27,"\u76ee\u524d\u63d0\u4f9b\u4e09\u79cd\u94fe\u63a5\u65b9\u5f0f":87,"\u76ee\u524d\u652f\u6301":116,"\u76ee\u524d\u652f\u6301\u4e24\u79cd":110,"\u76ee\u524d\u652f\u6301cento":86,"\u76ee\u524d\u652f\u6301fail":103,"\u76ee\u524d\u7684\u4f18\u5316":42,"\u76ee\u524d\u8be5\u53c2\u6570\u4ec5\u7528\u4e8eaucvalidationlayer\u548cpnpairvalidationlayer\u5c42":103,"\u76ee\u524d\u8fd8\u672a\u652f\u6301":113,"\u76ee\u524dpaddle\u7684\u8fdb\u7a0b\u6a21\u578b\u662fc":45,"\u76ee
\u524dpaddlepaddle\u7684develop\u5206\u652f\u7684\u6587\u6863\u662f\u81ea\u52a8\u89e6\u53d1\u66f4\u65b0\u7684":77,"\u76ee\u524dpaddlepaddle\u91c7\u7528\u4e86":41,"\u76ee\u5f55":[0,1,93,96,97,116,117,118],"\u76ee\u5f55\u4e0b":[46,74,93],"\u76ee\u5f55\u4e0b\u5bf9\u5e94\u7684\u5730\u65b9":42,"\u76ee\u5f55\u4e0b\u65b0\u589e\u7684":75,"\u76ee\u5f55\u4e0b\u6700\u65b0\u7684":117,"\u76ee\u5f55\u4e0b\u7684\u4ee3\u7801\u793a\u4f8b":90,"\u76ee\u5f55\u4e0b\u7684\u751f\u6210\u6587\u4ef6\u7528\u4e8e\u6df1\u5ea6\u5b66\u4e60\u76f8\u5173android":116,"\u76ee\u5f55\u4e0b\u7684python\u5305":78,"\u76ee\u5f55\u4e2d":[87,90,93],"\u76ee\u5f55\u4e2d\u4f1a\u5305\u542b":[116,118],"\u76ee\u5f55\u4e2d\u4f1a\u5305\u542b\u4ee5\u4e0b\u5185\u5bb9":117,"\u76ee\u5f55\u4e2d\u7684":108,"\u76ee\u5f55\u4e2dpaddl":97,"\u76ee\u5f55\u548c":[116,117,118],"\u76ee\u5f55\u5c31\u6210\u4e3a\u4e86\u5171\u4eab\u5b58\u50a8":97,"\u76ee\u5f55\u751f\u6210\u4e00\u5957\u72ec\u7acb\u7f16\u8bd1\u5de5\u5177\u94fe":116,"\u76ee\u5f55\u91cc\u627e\u5230\u4ea4\u53c9\u7f16\u8bd1\u5668":118,"\u76ee\u6807\u5411\u91cf":114,"\u76ee\u6807\u5de5\u5177\u94fe":116,"\u76ee\u6807\u673a\u7248protobuf\u5e93":118,"\u76ee\u6807\u67b6\u6784":117,"\u76ee\u6807\u67b6\u6784abi":116,"\u76f4\u5230\u8bad\u7ec3\u6536\u655b\u4e3a\u6b62":83,"\u76f4\u63a5\u4f7f\u7528c\u8bed\u8a00\u7684":45,"\u76f4\u63a5\u5220\u9664\u8fd9\u4e2a\u53c2\u6570\u5373\u53ef":46,"\u76f4\u63a5\u5347\u7ea7\u5230\u66f4\u65b0\u7684\u7248\u672c":0,"\u76f4\u63a5\u5bfc\u51fa\u5230c\u7684\u63a5\u53e3\u6bd4\u8f83\u56f0\u96be":45,"\u76f4\u63a5\u8c03\u7528\u76f8\u5e94\u63a5\u53e3\u5373\u53ef":75,"\u76f4\u63a5\u8fd0\u884c":1,"\u76f8\u5173\u5c42":41,"\u76f8\u540c\u540d\u5b57\u7684\u53c2\u6570":83,"\u76f8\u6bd4":75,"\u76f8\u6bd4\u4e8e\u6a21\u578b\u8bad\u7ec3":88,"\u770b\u5f53\u524dmpi\u96c6\u7fa4\u662f\u5426\u652f\u6301resourc":79,"\u77a7":86,"\u77e9\u9635":102,"\u77e9\u9635\u4e2d\u6bcf\u4e2a\u5143\u7d20\u7684\u503c\u968f\u673a\u751f\u6210":89,"\u77e9\u9635\u4e58\u6cd5\u7684\u516c\u5f0f":75,"
\u77e9\u9635\u5927\u5c0f\u662f":41,"\u77e9\u9635\u662f\u5426\u662f\u4e00\u4e2a\u5e8f\u5217":89,"\u77e9\u9635\u7684\u9ad8\u5ea6":89,"\u77e9\u9635\u91cc\u7684\u5143\u7d20\u662f\u6d6e\u70b9\u6570":89,"\u786e\u4fdd\u7f16\u8bd1\u5668\u9009\u9879":72,"\u78c1\u76d8\u4e0d\u591f":0,"\u78c1\u76d8\u7a7a\u95f4\u4e0d\u8db3\u7b49":79,"\u793a\u4f8b":[81,83,90],"\u793a\u4f8b3\u5bf9\u4e8e\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u6570\u636e\u5b8c\u5168\u76f8\u540c":111,"\u793a\u4f8b3\u7684\u914d\u7f6e\u4f7f\u7528\u4e86\u5355\u5c42rnn\u548c\u53cc\u5c42rnn":111,"\u793a\u4f8b3\u7684\u914d\u7f6e\u5206\u522b\u4e3a":111,"\u793a\u4f8b\u4ee3\u7801\u5982\u4e0b":[81,90],"\u793a\u4f8b\u5982\u4e0b":83,"\u793a\u4f8b\u7a0b\u5e8f":91,"\u793e\u533a\u53c2\u4e0e\u56f0\u96be":45,"\u793e\u533a\u8d21\u732e\u4ee3\u7801\u5b66\u4e60\u6210\u672c\u9ad8":45,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u5165":89,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u5165\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":90,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u7684\u53c2\u6570":10,"\u795e\u7ecf\u7f51\u7edc\u4e5f\u9700\u8981\u4e00\u4e9b\u7279\u5b9a\u7684layer\u4f5c\u4e3a\u8f93\u5165\u63a5\u53e3":84,"\u795e\u7ecf\u7f51\u7edc\u53c2\u6570\u4ee5\u53ca\u8fed\u4ee3\u65b9\u7a0b":84,"\u795e\u7ecf\u7f51\u7edc\u5728\u8bad\u7ec3\u7684\u65f6\u5019":81,"\u795e\u7ecf\u7f51\u7edc\u672c\u8d28\u4e0a\u662f\u4e00\u4e2a\u8ba1\u7b97\u56fe":76,"\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u5c06\u88ab\u5e8f\u5217\u5316\u5408\u5e76\u5165\u4e00\u4e2a\u6587\u4ef6":90,"\u795e\u7ecf\u7f51\u7edc\u7684\u7f51\u7edc\u7ed3\u6784\u4e2d\u5177\u6709\u6709\u5411\u73af\u7ed3\u6784":111,"\u795e\u7ecf\u7f51\u7edc\u7684\u8bad\u7ec3\u672c\u8eab\u662f\u4e00\u4e2a\u975e\u5e38\u6d88\u8017\u5185\u5b58\u548c\u663e\u5b58\u7684\u5de5\u4f5c":81,"\u79bb\u7ebf\u6279\u5904\u7406":11,"\u79f0\u4e3a":[72,114],"\u79f0\u4e3a\u5f00\u53d1\u955c\u50cf":116,"\u79f0\u4e4b\u4e3a":89,"\u79f0\
u4e4b\u4e3a\u53cc\u5c42\u5e8f\u5217\u7684\u4e00\u4e2a\u5b50\u5e8f\u5217":110,"\u79f0\u4e4b\u4e3a\u96c6\u675f\u5927\u5c0f":103,"\u79f0\u4f5c\u6709kernel":75,"\u79f0\u4f5ckernel":75,"\u79fb\u52a8\u7aef\u9884\u6d4b":89,"\u7a00\u758f\u6570\u636e\u7684\u683c\u5f0f":74,"\u7a00\u758f\u66f4\u65b0\u7684\u7aef\u53e3\u6570\u91cf":97,"\u7a00\u758f\u768401\u5411\u91cf":84,"\u7a00\u758f\u7684\u5411\u91cf":84,"\u7a00\u758f\u77e9\u9635":89,"\u7a00\u758f\u77e9\u9635\u4f7f\u7528":89,"\u7a00\u758f\u77e9\u9635\u53ca\u76f8\u5173\u7684\u63a5\u53e3":89,"\u7a00\u758f\u77e9\u9635\u5b58\u50a8\u793a\u610f\u56fe":89,"\u7a00\u758f\u77e9\u9635\u7684\u4e58\u79ef\u5e94\u7528\u4e8e\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b":105,"\u7a0b\u5e8f\u4ece\u6b64\u76ee\u5f55\u62f7\u8d1d\u6587\u4ef6\u5230\u5bb9\u5668\u5185\u8fdb\u884c\u8bad\u7ec3":97,"\u7a0b\u5e8f\u4f9d\u8d56":91,"\u7a0b\u5e8f\u505c\u6b62":103,"\u7a0b\u5e8f\u662f\u4e00\u6837\u7684":107,"\u7a0b\u5e8f\u76f4\u63a5\u9000\u51fa":103,"\u7a20\u5bc6\u5411\u91cf":74,"\u7a20\u5bc6\u66f4\u65b0\u7684\u7aef\u53e3\u6570\u91cf":97,"\u7a20\u5bc6\u7684\u6d6e\u70b9\u6570\u5411\u91cf":84,"\u7a20\u5bc6\u77e9\u9635":89,"\u7acb\u523b\u9000\u51fa":81,"\u7acb\u5373\u6267\u884c\u5355\u5143\u6d4b\u8bd5":0,"\u7ae0\u8282":116,"\u7aef\u53e3":79,"\u7aef\u6570\u636e\u7c7b\u578b":89,"\u7aef\u7684":107,"\u7aef\u8bfb\u53d6\u6570\u636e":81,"\u7b2c\u4e00\u4e2a":72,"\u7b2c\u4e00\u4e2a\u53c2\u6570":75,"\u7b2c\u4e00\u4e2a\u6837\u672c\u540c\u65f6encode\u4e24\u6761\u6570\u636e\u6210\u4e24\u4e2a\u5411\u91cf":111,"\u7b2c\u4e00\u4e2atag\u4e3a":63,"\u7b2c\u4e00\u6b65\u9700\u8c03\u7528":90,"\u7b2c\u4e00\u7ae0\u8282":84,"\u7b2c\u4e09\u4e2a\u53c2\u6570":75,"\u7b2c\u4e09\u65b9\u4f9d\u8d56\u5e93\u9700\u8981\u6309\u7167\u4e0e\u65b9\u5f0f2\u540c\u6837\u65b9\u6cd5\u663e\u793a\u5730\u8fdb\u884c\u94fe\u63a5":87,"\u7b2c\u4e09\u65b9\u94fe\u63a5\u5e93\u548c\u5934\u6587\u4ef6":87,"\u7b2c\u4e09\u6b65\u5b8c\u6210\u540e":63,"\u7b2c\u4e8c\u4e2a":81,"\u7b2c\u4e8c\u4e2a\u4e3a":63,"\u7b2c\u4e8c\u4e2a\u53c2\u6570"
:75,"\u7b2c\u4e8c\u7c7b":82,"\u7b2ci\u884c\u7b2cj\u5217\u7684\u6570\u503c":74,"\u7b49":[42,46,75,79,90],"\u7b49\u4e8e\u6837\u672c\u6570":81,"\u7b49\u5168\u90e8\u9759\u6001\u5e93\u4e2d\u7684\u76ee\u6807\u6587\u4ef6\u5168\u90e8\u6253\u5305\u540e\u4ea7\u751f\u7684\u6587\u4ef6":46,"\u7b49\u53c2\u6570":97,"\u7b49\u5f85\u7f16\u8bd1\u5b8c\u6210\u540e":63,"\u7b49\u6587\u4ef6":46,"\u7b49\u7b2c\u4e09\u65b9\u5e93":87,"\u7b80\u5199":75,"\u7b80\u5355\u4ecb\u7ecd\u9700\u8981\u7528\u5230\u57fa\u7c7b":75,"\u7b80\u5355\u603b\u7ed3op\u9700\u8981\u5305\u542b\u7684\u5185\u5bb9\u5982\u4e0b":75,"\u7b80\u5355\u6765\u8bf4":108,"\u7b80\u5355\u7684\u5168\u8fde\u63a5\u7f51\u7edc":83,"\u7b80\u5355\u7684\u6027\u80fd\u5206\u6790":108,"\u7b80\u5355\u7684yaml\u6587\u4ef6\u5982\u4e0b":96,"\u7b97\u6cd5":[81,114],"\u7b97\u6cd5\u4e2d\u7684beam\u5927\u5c0f":114,"\u7ba1\u7406\u4e86\u6bcf\u4e2a\u8ba1\u7b97\u5c42\u8f93\u51fa\u7684\u5b58\u50a8\u7a7a\u95f4":90,"\u7ba1\u7406\u7684\u65b9\u6cd5":94,"\u7c7b\u4f3c":[46,110],"\u7c7b\u4f5c\u4e3a\u53c2\u6570\u7684\u62bd\u8c61":74,"\u7c7b\u522b\u4e2d\u7684\u53c2\u6570\u53ef\u7528\u4e8e\u6240\u6709\u573a\u5408":102,"\u7c7b\u522b\u6807\u7b7e\u4e4b\u4e00":90,"\u7c7b\u522b\u6807\u7b7e\u5c42":90,"\u7c7b\u540d\u548cc":45,"\u7c7b\u578b":[45,75,89,103],"\u7c7b\u578b\u4e3a":75,"\u7c7b\u578b\u4ecd\u7136\u4e3aeigenvector":76,"\u7c7b\u578b\u53ef\u4ee5\u662fpaddlepaddle\u652f\u6301\u7684\u4efb\u610f\u8f93\u5165\u6570\u636e\u7c7b\u578b":110,"\u7c7b\u578b\u540d\u4e3a":75,"\u7c7b\u578b\u662fnumpy\u7684ndarrai":81,"\u7c7b\u578b\u662fsparse_binary_vector":84,"\u7c7b\u578b\u662fsparse_float_vector":84,"\u7c7b\u578b\u7684":111,"\u7c7b\u578b\u7b49\u662f\u5426\u5408\u6cd5":75,"\u7c7b\u578b\u8fd8\u662f":89,"\u7c7b\u7684\u5b9a\u4e49\u5199\u5728":75,"\u7c7b\u7684\u5bf9\u8c61":90,"\u7c7b\u7684\u6784\u9020\u51fd\u6570\u548c\u6790\u6784\u51fd\u6570":74,"\u7c7b\u91cd\u5199":75,"\u7c7b\u9700\u8981\u5b9e\u73b0\u521d\u59cb\u5316":74,"\u7cfb\u6570":75,"\u7cfb\u7edf\u4e2d\u7684\u74f6\u9888\u53ef\u80
fd\u548c\u7a0b\u5e8f\u5458\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u60f3\u8c61\u7684\u74f6\u9888\u76f8\u53bb\u751a\u8fdc":107,"\u7cfb\u7edf\u4f1a\u5bf9\u65b0\u589e\u7684op\u81ea\u52a8\u7ed1\u5b9apython":75,"\u7cfb\u7edf\u4f1a\u63d0\u4f9b\u4e00\u4e2a\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1":91,"\u7cfb\u7edf\u4f1a\u6839\u636e\u6587\u4ef6\u540d\u81ea\u52a8\u6784\u5efaop\u548c\u5176\u5bf9\u5e94\u7684python\u6269\u5c55":75,"\u7ebf\u7a0bid\u53f7":105,"\u7ec6\u8282\u63cf\u8ff0":104,"\u7ecf\u5e38\u4f1a\u6d88\u8017\u657010gb\u7684\u5185\u5b58\u548c\u6570gb\u7684\u663e\u5b58":81,"\u7ecf\u8fc7\u6a21\u578b\u5904\u7406\u4e4b\u540e":88,"\u7ed3\u5c3e":75,"\u7ed3\u675f\u6807\u8bb0":114,"\u7ed3\u675f\u9884\u6d4b\u4e4b\u540e":90,"\u7ed3\u6784\u4f53":[89,90],"\u7ed3\u679c\u4f1a\u5199\u5165\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":90,"\u7ed3\u679c\u5982\u4e0b\u56fe\u6240\u793a":107,"\u7ed3\u8bba":45,"\u7ed9\u4e2a\u7b80\u5355\u7684":72,"\u7ed9\u5b9aencoder\u8f93\u51fa\u548c\u5f53\u524d\u8bcd":113,"\u7edf\u4e00\u7528":11,"\u7ee7\u627f\u81ea":75,"\u7ee7\u627f\u81eaoperatorbas":75,"\u7ef4\u57fa\u767e\u79d1\u4e2d\u6587\u9875\u9762":111,"\u7ef4\u57fa\u767e\u79d1\u9875\u9762":111,"\u7ef4\u7a7a\u95f4":114,"\u7ef4\u7a7a\u95f4\u5b8c\u6210":114,"\u7f13\u5b58\u6c60\u7684\u51cf\u5c0f":81,"\u7f16\u5199":1,"\u7f16\u5199\u4e86\u4e00\u4e2apaddlepaddle\u7684\u7a0b\u5e8f":1,"\u7f16\u5199\u5b8cyaml\u6587\u4ef6\u540e":97,"\u7f16\u5199\u672c\u6b21\u8bad\u7ec3\u7684yaml\u6587\u4ef6":97,"\u7f16\u5199\u6df1\u5ea6\u5b66\u4e60\u7a0b\u5e8f":107,"\u7f16\u5199\u7684\u90e8\u5206":3,"\u7f16\u53f7\u4ece0\u5f00\u59cb":81,"\u7f16\u7801\u5411\u91cf":114,"\u7f16\u7801\u5668\u8f93\u51fa":114,"\u7f16\u7801\u6e90\u5e8f\u5217":114,"\u7f16\u8bd1":[1,72,116],"\u7f16\u8bd1\u51fa\u7684paddlepaddle\u9884\u6d4b\u5e93\u548c\u5934\u6587\u4ef6":87,"\u7f16\u8bd1\u53ca\u5b89\u88c5":2,"\u7f16\u8bd1\u540e\u7684\u6587\u4ef6\u5c06\u88ab\u5b58\u50a8\u5728\u5de5\u4f5c\u76ee\u5f55":77,"\u7f16\u8bd1\u548c\u5b89\u88c5paddlepaddl":118,"\u7f16\u8b
d1\u548c\u5b89\u88c5paddlepaddle\u9884\u6d4b\u5e93":[116,117],"\u7f16\u8bd1\u5668":[116,117,118],"\u7f16\u8bd1\u5668\u6ca1\u6709":45,"\u7f16\u8bd1\u5668\u8981\u6c42\u7cfb\u7edf\u652f\u6301":116,"\u7f16\u8bd1\u578b\u8bed\u8a00":45,"\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5":80,"\u7f16\u8bd1\u5b89\u88c5\u7ed3\u675f\u4e4b\u540e":116,"\u7f16\u8bd1\u5b8c\u6210\u4e4b\u540e":77,"\u7f16\u8bd1\u5b8c\u6210\u540e\u4f1a\u5728build":0,"\u7f16\u8bd1\u5de5\u5177\u94fe":116,"\u7f16\u8bd1\u5de5\u5177\u94fe\u6240\u5728\u7684\u7edd\u5bf9\u8def\u5f84":118,"\u7f16\u8bd1\u6027\u80fd\u4f1a\u548c":107,"\u7f16\u8bd1\u6210\u529f\u540e":75,"\u7f16\u8bd1\u6210\u529f\u540e\u5728":87,"\u7f16\u8bd1\u6210\u52a8\u6001\u5e93":103,"\u7f16\u8bd1\u65f6\u4e00\u5b9a\u8981\u5f00\u542f\u4f18\u5316":107,"\u7f16\u8bd1\u65f6\u4f1a\u628a\u5bf9\u5e94\u7684\u5934\u6587\u4ef6\u548c\u5e93\u653e\u5728":42,"\u7f16\u8bd1\u65f6\u53ef\u80fd\u4f1a\u53bb\u6389\u8c03\u8bd5\u4fe1\u606f":107,"\u7f16\u8bd1\u65f6\u6307\u5b9a":107,"\u7f16\u8bd1\u751f\u6210":77,"\u7f16\u8bd1\u8fd9\u4e2a\u7248\u672c\u7684docker\u53d1\u884c\u955c\u50cf":63,"\u7f16\u8bd1\u8fd9\u4e2a\u7248\u672c\u7684python":63,"\u7f16\u8bd1c":46,"\u7f16\u8bd1paddlepaddl":0,"\u7f51\u7edc\u5c42\u53ef\u4ee5\u6709\u591a\u4e2a\u8f93\u5165":74,"\u7f51\u7edc\u5c42\u7684\u6807\u8bc6\u7b26\u4e3a":74,"\u7f51\u7edc\u5c42\u7684\u7c7b\u578b":74,"\u7f51\u7edc\u5c42\u7684\u7ec6\u8282\u53ef\u4ee5\u901a\u8fc7\u4e0b\u9762\u8fd9\u4e9b\u4ee3\u7801\u7247\u6bb5\u6765\u6307\u5b9a":74,"\u7f51\u7edc\u5c42\u7684\u8f93\u51fa\u662f\u7ecf\u8fc7\u6fc0\u6d3b\u51fd\u6570\u4e4b\u540e\u7684\u503c":103,"\u7f51\u7edc\u5c42\u914d\u7f6e\u5305\u542b\u4ee5\u4e0b\u51e0\u9879":74,"\u7f51\u7edc\u63a5\u53d7\u4e00\u5e45\u56fe\u7247\u4f5c\u4e3a\u8f93\u5165":90,"\u7f51\u7edc\u7ed3\u6784\u7684\u5e8f\u5217\u5316\u7ed3\u679c\u548c\u6a21\u578b\u53c2\u6570\u5b58\u50a8\u76ee\u5f55":90,"\u7f51\u7edc\u901a\u4fe1":74,"\u7f51\u901f\u6216ssl\u94fe\u63a5\u539f\u56e0":78,"\u800c":[82,107,114],"\u800c\u4e0d\u4f
1a\u6539\u53d8\u539f\u6709tensor\u7684shape\u4fe1\u606f":76,"\u800c\u4e0d\u5fc5\u5728\u610fpaddl":46,"\u800c\u4e0d\u652f\u6301pypy\u89e3\u91ca\u5668":45,"\u800c\u4e0d\u662f\u5728layer\u91cc\u5b9e\u73b0":82,"\u800c\u4e0d\u662f\u6e90\u7801\u76ee\u5f55\u91cc":78,"\u800c\u4e0d\u662f\u7279\u5f81\u7684\u96c6\u5408":111,"\u800c\u4e0d\u662f\u76f8\u5bf9":89,"\u800c\u4e0d\u662fc":89,"\u800c\u4e0d\u66b4\u9732\u6982\u5ff5\u7684\u5b9e\u73b0":46,"\u800c\u4e14\u4e2a\u6570\u5e76\u4e0d\u786e\u5b9a":91,"\u800c\u4e14\u5305\u542b\u4e86c":3,"\u800c\u4e14\u5728\u4f20\u8f93\u7684\u8fc7\u7a0b\u4e2d\u4e5f\u53ef\u80fd\u51fa\u73b0\u7f51\u7edc\u4e0d\u7a33\u5b9a\u7684\u60c5\u51b5":27,"\u800c\u4e14cento":3,"\u800c\u4e4b\u524d\u7684\u53c2\u6570\u5c06\u4f1a\u88ab\u5220\u9664":103,"\u800c\u4ece\u5e94\u7528\u7684\u89d2\u5ea6":108,"\u800c\u4f18\u5316\u6027\u80fd\u7684\u9996\u8981\u4efb\u52a1":108,"\u800c\u5176\u4ed6\u5c42\u4f7f\u7528cpu\u8ba1\u7b97":105,"\u800c\u51fa\u73b0\u9636\u6bb5\u6027\u7684\u8fd0\u884c\u505c\u6ede":10,"\u800c\u53cc\u5c42rnn\u662f\u53ef\u4ee5\u5904\u7406\u8fd9\u79cd\u8f93\u5165\u6570\u636e\u7684\u7f51\u7edc\u7ed3\u6784":111,"\u800c\u53cd\u5411\u6d4b\u8bd5\u4e2d":75,"\u800c\u53ea\u9700\u8981\u83b7\u5f97recurr":82,"\u800c\u5728\u8ba1\u7b97\u7ed3\u675f\u4e4b\u540e":76,"\u800c\u5728cpp\u91cc\u9762\u5b9e\u73b0\u8fd9\u4e2ac\u7684\u63a5\u53e3":45,"\u800c\u591a\u8bed\u8a00\u63a5\u53e3\u9700\u8981\u76f4\u63a5\u8bfb\u53d6\u751f\u6210\u7684\u4e8c\u8fdb\u5236":45,"\u800c\u5b89\u88c5\u5305":[3,78],"\u800c\u5b89\u88c5\u5305\u662f":[3,78],"\u800c\u5bf9\u4e8e\u53cc\u5c42\u5e8f\u5217":111,"\u800c\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u5185\u5c42\u7279\u5f81\u6570\u636e\u800c\u8a00":111,"\u800c\u5bf9\u4e8egolang":45,"\u800c\u5bf9\u4e8egolang\u9519\u8bef\u5904\u7406\u5e94\u8be5\u4f7f\u7528\u8fd4\u56de\u503c":45,"\u800c\u5c06\u8fd9\u4e2a\u6bb5\u843d\u7684\u6bcf\u4e00\u53e5\u8bdd\u7528lstm\u7f16\u7801\u6210\u4e00\u4e2a\u5411\u91cf":111,"\u800c\u5f53\u524d\u5df2\u7ecf\u67095":108,"\u800c\u662f\u5c06\u8f93\u5
165":[89,90],"\u800c\u662f\u76f4\u63a5\u4ece\u5185\u5b58\u7684\u7f13\u5b58\u91cc\u8bfb\u53d6\u6570\u636e":81,"\u800c\u662f\u76f4\u63a5\u4fee\u6539paddl":46,"\u800c\u662f\u76f4\u63a5\u7528api\u7684\u63a5\u53e3\u8fdc\u7a0b\u8bbf\u95ee":11,"\u800c\u66f4\u6df1\u5165\u7684\u5206\u6790":108,"\u800c\u6709\u4e9b\u53c2\u6570\u9700\u8981\u5728\u96c6\u7fa4\u591a\u673a\u8bad\u7ec3\u4e2d\u4f7f\u7528\u7b49":102,"\u800c\u6e90\u5e8f\u5217\u7684\u7f16\u7801\u5411\u91cf\u53ef\u4ee5\u88ab\u65e0\u8fb9\u754c\u7684memory\u8bbf\u95ee":114,"\u800c\u795e\u7ecf\u7f51\u7edc\u662f\u6211\u4eec\u8981\u642d\u5efa\u7684\u5b9d\u5854":84,"\u800c\u7a00\u758f\u66f4\u65b0\u5728\u53cd\u5411\u4f20\u64ad\u4e4b\u540e\u7684\u6743\u91cd\u66f4\u65b0\u65f6\u8fdb\u884c":105,"\u800c\u8ba1\u7b97\u8fc7\u7a0b\u662f\u7531":76,"\u800c\u8fd9\u4e00\u53e5\u8bdd\u5c31\u53ef\u4ee5\u8868\u793a\u6210\u8fd9\u4e9b\u4f4d\u7f6e\u7684\u6570\u7ec4":111,"\u800c\u8fd9\u6bcf\u4e00\u4e2a\u6570\u7ec4\u5143\u7d20":111,"\u800c\u975e\u76f4\u63a5\u56de\u590d\u7684\u65b9\u5f0f":72,"\u800c\u975e\u9759\u6001\u52a0\u8f7dcuda\u52a8\u6001\u5e93":0,"\u800ceigenvector":76,"\u800crnn\u662f\u6700\u6d41\u884c\u7684\u9009\u62e9":113,"\u800cswig\u53ea\u80fd\u7b80\u5355\u7684\u66b4\u9732c":45,"\u800ctrainer\u9700\u8981\u8bfb\u53d6\u8bad\u7ec3\u6570\u636e\u8fdb\u884c\u8bad\u7ec3":84,"\u800cy_predict\u662f\u63a5\u6536x\u4f5c\u4e3a\u8f93\u5165":84,"\u8054\u901a":101,"\u80fd\u591f\u5904\u7406\u53cc\u5c42\u5e8f\u5217":113,"\u80fd\u591f\u5bf9\u53cc\u5411\u5e8f\u5217\u8fdb\u884c\u5904\u7406\u7684\u6709":113,"\u80fd\u591f\u8bb0\u5f55\u4e0a\u4e00\u4e2asubseq":113,"\u80fd\u591f\u9488\u5bf9cpu\u548cgpu\u7684\u8ba1\u7b97\u505a\u66f4\u591a\u4f18\u5316":82,"\u80fd\u83b7\u53d6":93,"\u811a\u672c":[0,90,116],"\u811a\u672c\u5f00\u59cb\u65f6":97,"\u811a\u672c\u96c6\u6210\u4e86\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u7684\u8fc7\u7a0b":90,"\u81ea\u52a8\u5173\u95ed\u5bf9\u5e94\u7684":72,"\u81ea\u52a8\u5730\u5c06\u8fd9\u4e9b\u9009\u9879\u5e94\u7528\u5230":93,"
\u81ea\u52a8\u5b8c\u6210\u8fd9\u4e00\u8fc7\u7a0b":113,"\u81ea\u52a8\u6302\u8f7d\u5206\u5e03\u5f0f\u5b58\u50a8\u76ee\u5f55":10,"\u81ea\u52a8\u6784\u5efa\u72ec\u7acb\u5de5\u5177\u94fe":116,"\u81ea\u52a8\u751f\u6210":77,"\u81ea\u52a8\u83b7\u53d6\u4e0a\u4e00\u4e2a\u751f\u6210\u7684\u8bcd":114,"\u81ea\u52a8\u9009\u62e9":117,"\u81ea\u6b64":[116,117],"\u81ea\u7136\u4e5f\u5c31\u6709\u7ba1\u7406\u5458\u6743\u9650":0,"\u81ea\u7136\u8bed\u8a00\u4e2d\u7684\u53e5\u5b50\u662f\u4e00\u4e2a\u5e8f\u5217":89,"\u81ea\u7136\u8bed\u8a00\u4e2d\u7684\u6bb5\u843d\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":89,"\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7b49":105,"\u81f3\u4e8e\u4e3a\u4ec0\u4e48\u9700\u8981c":46,"\u81f3\u5c11\u5305\u542bgcc_3":3,"\u81f3\u5c11\u5305\u542bglibcxx_3":3,"\u81f3\u6b64":[72,111],"\u826f\u597d\u7684\u6587\u6863":45,"\u8282\u70b9":[98,101],"\u8282\u7701\u4e86\u4e0d\u5fc5\u8981\u7684\u64cd\u4f5c":42,"\u82e5":74,"\u82e5\u5728paddlepaddle\u7f16\u8bd1\u65f6":83,"\u82e5\u5e0c\u671b\u5f97\u5230\u6700\u5feb\u7684\u6267\u884c\u901f\u5ea6":117,"\u82e5\u5e0c\u671b\u6700\u5feb\u7684\u6267\u884c\u901f\u5ea6":118,"\u82e5\u5e72\u4e2a\u53e5\u5b50\u6784\u6210\u4e00\u4e2a\u6bb5\u843d":110,"\u82e5\u6709\u4e0d\u4e00\u81f4\u4e4b\u5904":108,"\u82e5\u6709\u5fc5\u8981":74,"\u82e5\u672a\u663e\u5f0f\u6307\u5b9a":117,"\u82e5\u6ca1\u6709\u663e\u5f0f\u8bbe\u7f6e":116,"\u82e5\u73af\u5883\u53d8\u91cf":[116,117,118],"\u82e5\u8981\u5bf9\u8fd9\u51e0\u4e2alayer\u4f7f\u7528dropout":82,"\u82e5\u8f93\u51fa\u662f\u5355\u5c42\u5e8f\u5217":110,"\u82e5\u8f93\u51fa\u662f\u53cc\u5c42\u5e8f\u5217":110,"\u82f1\u6587\u6587\u6863":77,"\u82f1\u6587\u6587\u6863\u76ee\u5f55":77,"\u8303\u56f4":105,"\u83b7\u53d6":72,"\u83b7\u53d6\u53ef\u9009\u7684tag":1,"\u83b7\u53d6\u5f53\u524d\u7cfb\u7edf\u652f\u6301\u7684\u5b89\u88c5\u5305\u683c\u5f0f":3,"\u83b7\u53d6\u5f53\u524d\u7cfb\u7edf\u652f\u6301\u7684python\u5305\u7684\u540e\u7f00":78,"\u83b7\u53d6\u6700\u65b0\u7684\u68c0\u67e5\u70b9\u7684\u6587\u4ef6uuid":10,"\u83b7\u53d6\u6e90\
u7801":0,"\u83b7\u53d6\u8f93\u51fa\u65f6":90,"\u83b7\u53d6trainer":97,"\u83b7\u5f97\u53c2\u6570\u5c3a\u5bf8":74,"\u83b7\u5f97\u5728\u6a21\u578b\u914d\u7f6e\u4e2d\u67d0\u4e00\u5c42\u7684name":81,"\u83b7\u5f97\u57fa\u672c\u7684docker\u5b89\u88c5\u548c\u4f7f\u7528\u65b9\u6cd5":1,"\u83b7\u5f97\u5f53\u524dmini":81,"\u83b7\u5f97\u6700\u5feb\u7684\u6267\u884c\u901f\u5ea6":116,"\u83b7\u5f97\u7684\u503c\u7c7b\u578b\u5747\u4e3a":81,"\u83b7\u5f97\u8ba1\u7b97\u7ed3\u679c":90,"\u83b7\u5f97\u8fd9\u4e9b\u8282\u70b9\u7684ip\u5730\u5740":93,"\u83b7\u5f97head\u548cnode\u8282\u70b9\u7684ip\u5730\u5740":98,"\u865a\u62df\u673a\u4e0a":0,"\u867d\u7136\u4e0d\u9f13\u52b1\u8fd9\u6837":46,"\u867d\u7136\u5f02\u6b65sgd\u65b9\u5f0f\u4f1a\u63d0\u9ad8\u53c2\u6570\u66f4\u65b0\u5e76\u884c\u5ea6":92,"\u867d\u7136paddle\u770b\u8d77\u6765\u5305\u542b\u4e86\u4f17\u591a\u53c2\u6570":102,"\u884c":89,"\u884c\u504f\u79fb":89,"\u884c\u53f7":107,"\u8865\u5145\u4e0a\u6b21\u7684commit":72,"\u8868\u660e\u4e86\u8fd9\u4e9b\u884c\u7684\u6807\u53f7":74,"\u8868\u660e\u8fd9\u4e2a\u5c42\u7684\u4e00\u4e2a\u5b9e\u4f8b\u662f\u5426\u9700\u8981\u504f\u7f6e":74,"\u8868\u793a":75,"\u8868\u793a\u4e3adeviceid":105,"\u8868\u793a\u5bf9\u8f93\u5165\u6570\u636e":42,"\u8868\u793a\u5c06\u5916\u5c42\u7684outer_mem\u4f5c\u4e3a\u5185\u5c42memory\u7684\u521d\u59cb\u72b6\u6001":111,"\u8868\u793a\u5f53\u524d\u96c6\u7fa4\u4f5c\u4e1a\u7684\u8282\u70b9":93,"\u8868\u793a\u6570\u636e\u7c7b\u578b":75,"\u8868\u793a\u7684\u504f\u79fb\u662f\u4ee5":89,"\u8868\u793a\u8bbe\u5907\u7c7b\u578b":75,"\u8868\u793a\u8bcd\u8bed\u5728\u8bcd\u5178\u4e2d\u7684\u5e8f\u53f7":89,"\u8868\u793a\u8bfb\u8005\u6240\u4f7f\u7528\u7684docker\u955c\u50cf\u4ed3\u5e93\u5730\u5740":97,"\u8868\u793a\u8fd9\u4e2ajob\u7684\u540d\u5b57":97,"\u8868\u793a\u9700\u8981\u6784\u5efa\u63a8\u7406\u5e93":118,"\u88ab":72,"\u88ab\u5207\u5206\u6210\u591a\u4e2a\u90e8\u5206":92,"\u88ab\u6269\u5c55\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"\u88ab\u653e\u5728":74,"\u88ab\u79f0\u4e3a":114,"\u
8981\u4f7f\u7528\u547d\u4ee4\u884c\u5206\u6790\u5de5\u5177":108,"\u8981\u5728\u5df2\u6709\u7684kubernetes\u96c6\u7fa4\u4e0a\u8fdb\u884cpaddlepaddle\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3":97,"\u8981\u6c42\u5355\u5c42\u5e8f\u5217\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee":110,"\u8981\u751f\u6210\u7684\u76ee\u6807\u5e8f\u5217":113,"\u8981\u8c03\u7528":74,"\u89c6\u9891\u7b49":89,"\u89e3\u51b3\u529e\u6cd5\u662f":78,"\u89e3\u51b3\u65b9\u6848\u662f":83,"\u89e3\u6790\u73af\u5883\u53d8\u91cf\u5f97\u5230":97,"\u89e3\u7801\u5668\u4f7f\u7528":114,"\u89e3\u7801\u5668\u57fa\u4e8e\u7f16\u7801\u6e90\u5e8f\u5217\u548c\u6700\u540e\u751f\u6210\u7684\u76ee\u6807\u8bcd\u9884\u6d4b\u4e0b\u4e00\u76ee\u6807\u8bcd":114,"\u89e3\u7801\u5668\u662f\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"\u89e3\u91ca\u578b\u8bed\u8a00\u53ea\u80fd\u8c03\u7528\u52a8\u6001\u5e93":45,"\u89e3\u91ca\u6027\u8bed\u8a00\u5b9e\u9645\u8fd0\u884c\u7684\u4e8c\u8fdb\u5236\u662f\u89e3\u91ca\u5668\u672c\u8eab":45,"\u8ba1\u5212\u5728":[41,42],"\u8ba1\u7b97":[92,114],"\u8ba1\u7b97\u504f\u7f6e\u7684\u68af\u5ea6":74,"\u8ba1\u7b97\u53cd\u5411rnn\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b":114,"\u8ba1\u7b97\u53d8\u6362\u77e9\u9635\u7684\u5927\u5c0f\u548c\u683c\u5f0f":74,"\u8ba1\u7b97\u5f53\u524d\u5c42\u6743\u91cd\u7684\u68af\u5ea6":74,"\u8ba1\u7b97\u6548\u7387\u66f4\u9ad8":82,"\u8ba1\u7b97\u6bcf\u4e2a\u8bcd\u7684\u8bcd\u5411\u91cf":114,"\u8ba1\u7b97\u6fc0\u6d3b\u51fd\u6570\u7684\u68af\u5ea6":74,"\u8ba1\u7b97\u7684\u7ec6\u8282\u5c06\u5728\u4e0b\u9762\u7684\u5c0f\u8282\u7ed9\u51fa":74,"\u8ba1\u7b97\u8282\u70b9":92,"\u8ba1\u7b97\u8282\u70b9\u4e4b\u95f4\u4e5f\u4e0d\u4f1a\u76f8\u4e92\u4f9d\u8d56":92,"\u8ba1\u7b97\u8f6c\u6362\u77e9\u9635\u548c\u8f93\u5165\u7684\u68af\u5ea6":74,"\u8ba1\u7b97\u8f93\u5165\u548c\u53c2\u6570\u7684\u68af\u5ea6":74,"\u8ba1\u7b97\u8f93\u5165\u5c42\u7684\u504f\u5dee":74,"\u8ba1\u7b97\u8f93\u51fa":74,"\u8ba1\u7b97\u8fd9\u4e2a\u6587\u4ef6\u7684md5":10,"\u8ba1\u7b97\u96c6\u7fa4\u901a\u5e38\u7531\u4e00\u7ec4":101
,"\u8ba1\u7b97\u9700\u8981\u7684\u6570\u636e\u5b58\u653e\u5728":76,"\u8ba9paddle\u6838\u5fc3\u4e2d":46,"\u8bad\u7ec3":102,"\u8bad\u7ec3\u4efb\u52a1\u7684\u8fd0\u884c\u53ef\u80fd\u4f1a\u5360\u6ee1trainer\u548cparamet":10,"\u8bad\u7ec3\u548c\u7eaf\u4f7f\u7528":63,"\u8bad\u7ec3\u5931\u8d25\u65f6\u53ef\u4ee5\u68c0\u67e5\u9519\u8bef\u65e5\u5fd7":93,"\u8bad\u7ec3\u597d\u4e00\u4e2a\u6df1\u5c42\u795e\u7ecf\u7f51\u7edc\u901a\u5e38\u8981\u8017\u8d39\u975e\u5e38\u957f\u7684\u65f6\u95f4":108,"\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u9ed8\u8ba4\u4fdd\u5b58\u5728\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":90,"\u8bad\u7ec3\u6570\u636e\u6709\u95ee\u9898":81,"\u8bad\u7ec3\u6570\u636e\u683c\u5f0f\u548c\u8bad\u7ec3\u7a0b\u5e8f\u7684":91,"\u8bad\u7ec3\u65f6":97,"\u8bad\u7ec3\u6a21\u578b\u540e":114,"\u8bad\u7ec3\u6a21\u578b\u6b63\u786e\u6027":63,"\u8bad\u7ec3\u7a0b\u5e8f":91,"\u8bad\u7ec3\u7b56\u7565\u7b49\u7b49\u8fd9\u4e9b\u90fd\u662f\u5e38\u89c1\u7684\u53d8\u5316\u56e0\u7d20":104,"\u8bad\u7ec3\u7ed3\u675f\u540e\u67e5\u770b\u8f93\u51fa\u7ed3\u679c":97,"\u8bad\u7ec3\u8282\u70b9\u6570\u91cf":97,"\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8ddd\u79bb":81,"\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u53c2\u6570\u6216\u8005\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u68af\u5ea6\u5c3a\u5ea6\u8fc7\u5927":81,"\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6d4b\u8bd5test_period":102,"\u8bad\u7ec3\u8fc7\u7a0b\u662f\u5426\u4e3a\u672c\u5730\u6a21\u5f0f":103,"\u8bad\u7ec3\u8fc7\u7a0b\u662f\u5426\u4f7f\u7528gpu":103,"\u8bad\u7ec3\u914d\u7f6e\u4e2d\u7684\u8bbe\u5907\u5c5e\u6027\u5c06\u4f1a\u65e0\u6548":103,"\u8bad\u7ec3dot_period":102,"\u8bb0\u5f55\u4e0b\u6240\u6709\u5931\u8d25\u7684\u4f8b\u5b50":63,"\u8bb0\u5fc6\u6a21\u5757":114,"\u8bbe\u4e3a\u5df2\u90e8\u7f72\u7684\u5de5\u4f5c\u7a7a\u95f4\u76ee\u5f55":93,"\u8bbe\u4e3a\u672c\u5730":93,"\u8bbe\u5b9a":82,"\u8bbe\u7f6e":[0,46,81,82,91,116,117],"\u8bbe\u7f6e\u4e3a":74,"\u8bbe\u7f6e\u4e3a\u4e0d\u540c\u7684\u503c":82,"\u8bbe\u7f6e\u4e3atrue\u4f7f\u7528\u672c\u5730\u8bad\u7ec3\u6216\u8005\u4f7
f\u7528\u96c6\u7fa4\u4e0a\u7684\u4e00\u4e2a\u8282\u70b9":103,"\u8bbe\u7f6e\u4e3atrue\u4f7f\u7528gpu\u6a21\u5f0f":103,"\u8bbe\u7f6e\u4e86\u76f8\u540c\u7684\u53d6\u503c":82,"\u8bbe\u7f6e\u5176\u53c2\u6570\u5c5e\u6027":83,"\u8bbe\u7f6e\u53c2\u6570\u7684\u540d\u5b57":83,"\u8bbe\u7f6e\u547d\u4ee4\u884c\u53c2\u6570":81,"\u8bbe\u7f6e\u5b66\u4e60\u7387\u8870\u51cf\u56e0\u5b50\u5206\u6bb5\u51fd\u6570":83,"\u8bbe\u7f6e\u5e8f\u5217\u4fe1\u606f\u7684\u63a5\u53e3":89,"\u8bbe\u7f6e\u6210":83,"\u8bbe\u7f6e\u6210\u4e00\u4e2a\u5c0f\u4e00\u4e9b\u7684\u503c":81,"\u8bbe\u7f6e\u8f93\u51fa\u7684\u5c3a\u5bf8":74,"\u8bbe\u7f6e\u8f93\u51fatensor\u7684\u5f62\u72b6":75,"\u8bbe\u7f6e\u9ed8\u8ba4\u8bbe\u5907\u53f7\u4e3a0":105,"\u8bbe\u7f6egpu":103,"\u8bbf\u95ee\u5bf9\u5e94\u7684\u7f51\u5740":107,"\u8bbf\u95eekubernetes\u7684\u63a5\u53e3\u6765\u67e5\u8be2\u6b64job\u5bf9\u5e94\u7684\u6240\u6709pod\u4fe1\u606f":97,"\u8bc4\u5ba1\u4eba\u4e00\u822c\u4e0d\u505a\u8bc4\u5ba1":72,"\u8bc4\u5ba1\u4eba\u7684\u6bcf\u4e2a\u610f\u89c1\u90fd\u5fc5\u987b\u56de\u590d":72,"\u8bc4\u5ba1\u4eba\u9700\u8981\u9010\u4e00\u67e5\u770b\u6bcf\u4e2acommit\u624d\u80fd\u77e5\u9053\u505a\u4e86\u54ea\u4e9b\u4fee\u6539":72,"\u8bc4\u8bba\u6846\u4e2d\u52a0\u4e0a":72,"\u8bc6\u522b\u6570\u5b57":63,"\u8bcd\u5411\u91cf":63,"\u8bd5\u7740\u8ba9\u8f93\u51fa\u7684\u5206\u6790\u6570\u636e\u548c\u7406\u8bba\u503c\u5bf9\u5e94":108,"\u8be5\u53c2\u6570\u5728\u7f51\u7edc\u914d\u7f6e\u7684output":103,"\u8be5\u53c2\u6570\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u81ea\u52a8\u8bbe\u7f6e":103,"\u8be5\u53c2\u6570\u5df2\u7ecf\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u5b8c\u6210\u8bbe\u7f6e":103,"\u8be5\u53c2\u6570\u5fc5\u987b\u80fd\u88abflag":103,"\u8be5\u53c2\u6570\u6307\u793a\u662f\u5426\u6253\u5370\u65e5\u5fd7\u622a\u65ad\u4fe1\u606f":103,"\u8be5\u53c2\u6570\u6307\u793a\u662f\u5426\u6253\u5370\u9519\u8bef\u622a\u65ad\u65e5\u5fd7":103,"\u8be5\u53c2\u6570\u7528\u4e8e\u6307\u5b9a\u52a8\u6001\u5e93\u8def\u5f84":103,"\u8be5\u53c2\u6570\u768
4\u610f\u601d\u662f\u8bad\u7ec3num":103,"\u8be5\u53c2\u6570\u9ed8\u8ba4\u4e3anull":103,"\u8be5\u5c42\u4ec5\u9700\u8981\u8fd9\u4e9b\u975e\u96f6\u6837\u672c\u4f4d\u7f6e\u6240\u5bf9\u5e94\u7684\u53d8\u6362\u77e9\u9635\u7684\u90a3\u4e9b\u884c":74,"\u8be5\u622a\u65ad\u4f1a\u5f71\u54cd":103,"\u8be5\u6279\u6b21\u7684\u8f93\u5165\u4e2d\u4ec5\u6709\u4e00\u4e2a\u5b50\u96c6\u662f\u975e\u96f6\u7684":74,"\u8be5\u63a5\u53e3\u53ef\u7528\u4e8e\u9884\u6d4b\u548c\u5b9a\u5236\u5316\u8bad\u7ec3":0,"\u8be5\u63a5\u53e3\u63a5\u53d7\u4e24\u4e2a\u53c2\u6570":90,"\u8be5\u6570\u76ee\u662f\u63d0\u524d\u5b9a\u4e49\u597d\u7684":103,"\u8be5\u6587\u4ef6\u5bf9\u76f8\u5173gemm":41,"\u8be5\u65b9\u5f0f\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf":2,"\u8be5\u65f6\u95f4\u53bb\u9664\u6389\u672c\u51fd\u6570\u8c03\u7528\u5176\u4ed6\u51fd\u6570\u7684\u65f6\u95f4":107,"\u8be5\u6a21\u578b\u7684\u8bf4\u660e\u5982\u4e0b\u56fe\u6240\u793a":114,"\u8be5\u7c7b\u7684":75,"\u8be5\u7c7b\u7684\u5b9e\u73b0\u7ec6\u8282\u5728":74,"\u8be5\u7c7b\u7ee7\u627f\u4e8epaddlepaddle\u7684\u57fa\u7c7b":42,"\u8be5\u811a\u672c\u4e2d\u8bb0\u5f55\u4e86\u4ea4\u53c9\u7f16\u8bd1android\u7248paddlepaddle\u5e93\u5e38\u7528\u7684cmake\u914d\u7f6e":116,"\u8be5\u8bed\u53e5\u4f1a\u4e3a\u6bcf\u4e2a\u5c42\u521d\u59cb\u5316\u5176\u6240\u9700\u8981\u7684\u53d8\u91cf\u548c\u8fde\u63a5":74,"\u8be5layer\u662f\u901a\u8fc7\u53c2\u6570":82,"\u8be6\u7ec6\u4ecb\u7ecd\u53ef\u4ee5\u53c2\u8003":111,"\u8be6\u7ec6\u4ecb\u7ecd\u8bf7\u53c2\u8003\u8bbe\u8ba1\u6587\u6863":75,"\u8be6\u7ec6\u4fe1\u606f\u8bf7\u68c0\u67e5":93,"\u8be6\u7ec6\u53c2\u8003":0,"\u8be6\u7ec6\u53ef\u53c2\u8003":72,"\u8be6\u7ec6\u6587\u6863\u53c2\u8003":81,"\u8be6\u7ec6\u7684cmake\u4f7f\u7528\u65b9\u6cd5\u53ef\u4ee5\u53c2\u8003":0,"\u8be6\u7ec6\u89c1":110,"\u8be6\u7ec6\u89e3\u91ca\u8fd9\u4e9b\u53c2\u6570\u7684\u5c5e\u6027\u548c\u610f\u4e49":104,"\u8be6\u7ec6\u8bbe\u8ba1":27,"\u8be6\u7ec6\u8bf7\u4e86\u89e3":94,"\u8bed\u610f\u89d2\u8272\u6807\u6ce8":63,"\u8bed\u8a00\u91cd\u6784\u540e\u7684":107,"\u8bf4\u6
60e":[0,3,10,89],"\u8bf4\u660e\u63d0\u4ea4\u7684\u4ee3\u7801\u5b58\u5728\u95ee\u9898":72,"\u8bf4\u660e\u8fd9\u4e2a\u5c42\u7684\u8f93\u5165":74,"\u8bf7\u4e0d\u8981\u521b\u5efa\u7a7a\u7684":75,"\u8bf7\u4e0d\u8981\u5fd8\u8bb0\u63d0\u524d\u5728\u7269\u7406\u673a\u4e0a\u5b89\u88c5gpu\u6700\u65b0\u9a71\u52a8":1,"\u8bf7\u4fdd\u8bc1travi":72,"\u8bf7\u5148\u4f7f\u7528":[116,117,118],"\u8bf7\u5148\u5c1d\u8bd5\u5728\u4e0b\u9762\u7684\u9875\u9762\u5bfb\u627e\u7b54\u6848":2,"\u8bf7\u53c2\u7167\u4ee5\u4e0b\u6559\u7a0b":2,"\u8bf7\u53c2\u7167\u7f51\u7edc\u914d\u7f6e\u7684\u6587\u6863\u4e86\u89e3\u66f4\u8be6\u7ec6\u7684\u4fe1\u606f":105,"\u8bf7\u53c2\u8003":[46,74,75,78,81,84,90,111],"\u8bf7\u53c2\u8003\u6b64":90,"\u8bf7\u53c2\u89c1":72,"\u8bf7\u53c2\u9605":114,"\u8bf7\u5728\u8be5pull":72,"\u8bf7\u5728\u8f93\u5165\u65f6\u8fdb\u884c\u5408\u6cd5\u6027\u68c0\u67e5":89,"\u8bf7\u60a8\u5230":80,"\u8bf7\u60a8\u6bcf\u6b21\u63d0\u4ea4\u4ee3\u7801\u65f6":72,"\u8bf7\u60a8\u9075\u5b88\u4ee5\u4e0b\u7ea6\u5b9a":72,"\u8bf7\u6307\u5b9a\u7684paddlepaddle\u5de5\u4f5c\u76ee\u5f55\u7ed9\u73af\u5883\u53d8\u91cf":77,"\u8bf7\u6307\u5b9a\u8be5\u76ee\u5f55":103,"\u8bf7\u663e\u793a\u5730\u8c03\u7528":75,"\u8bf7\u6839\u636e\u673a\u5668\u914d\u7f6e\u548c\u7cfb\u7edf\u9009\u62e9\u5bf9\u5e94\u7684\u5b89\u88c5\u5305":2,"\u8bf7\u68c0\u67e5python\u7248\u672c\u662f\u5426\u4e3a2":3,"\u8bf7\u6ce8\u610f":[75,96,114],"\u8bf7\u6ce8\u610f\u662f\u5426\u9700\u8981\u4fee\u6539\u7f51\u7edc\u7ed3\u6784":90,"\u8bf7\u6ce8\u610f\u6bcf\u4e2acommit\u7684\u540d\u79f0":72,"\u8bf7\u6ce8\u610fcommit\u7684\u6570\u91cf":72,"\u8bf7\u76f4\u63a5\u586b\u51450":83,"\u8bf7\u770b\u4e0b\u9762\u7684\u4f8b\u5b50":105,"\u8bf7\u786e\u4fdd":72,"\u8bf7\u7ed9\u51fa\u603b\u4f53\u7684\u4fee\u6539\u60c5\u51b5":72,"\u8bf7\u7ed9\u51fa\u60a8\u81ea\u5df1\u7684\u53cd\u9a73\u7406\u7531":72,"\u8bf7\u9009\u62e9\u5408\u9002\u7684\u8bcd\u6c47":72,"\u8bf7\u9009\u62e9\u6b63\u786e\u7684\u7248\u672c":78,"\u8bf7\u9075\u5b88":72,"\u8bf7\u91c7\u7528":72,"\u8bf7\u9605\u8bf
b\u4ee5\u4e0b\u6307\u5357":94,"\u8bf8\u5982\u56fe\u50cf\u5206\u7c7b":105,"\u8bfb\u53d6\u9700\u8981\u7684\u7ed3\u679c\u5373\u53ef":89,"\u8bfb\u53d6volume\u4e2d\u7684\u6570\u636e\u8fdb\u884c\u8fd9\u6b21\u5206\u5e03\u5f0f\u8bad\u7ec3":97,"\u8bfb\u8005\u53ef\u4ee5\u67e5\u770b":97,"\u8bfb\u8005\u9700\u8981\u66ff\u6362\u6210\u81ea\u5df1\u4f7f\u7528\u7684\u4ed3\u5e93\u5730\u5740":97,"\u8c03\u7528":[74,90,117],"\u8c03\u7528\u5bf9\u5e94":76,"\u8c03\u7528\u65b9\u6cd5\u89c1c":[116,117],"\u8c03\u7528\u7528":107,"\u8c03\u7528\u7684\u4e00\u4e9b\u7528\u6237\u5b9a\u4e49\u7684\u5e93\u51fd\u6570":91,"\u8c03\u7528\u7684\u51fd\u6570\u662f\u5426\u652f\u6301\u4e0d\u540c\u8bbe\u5907":75,"\u8c03\u7528\u8be5\u51fd\u6570\u540e":74,"\u8c03\u7528c":[89,90],"\u8d21\u732e\u6587\u6863":77,"\u8d77\u59cb\u5b58\u50a8\u5730\u5740\u4ee5\u6570\u636e\u7684\u5b58\u50a8\u5927\u5c0f\u4e3a\u5355\u4f4d\u7684\u504f\u79fb":89,"\u8df3\u8f6c\u5230":72,"\u8df3\u8fc7":81,"\u8f6c\u5316\u65b9\u6cd5\u5728\u76f8\u5e94\u7684\u9886\u57df\u90fd\u6709\u901a\u7528\u89e3\u51b3\u65b9\u6848":89,"\u8f6c\u6362\u5185\u5b58\u7684\u5de5\u4f5c":42,"\u8f6c\u6362\u5197\u4f59":41,"\u8f6c\u6362\u51fd\u6570":42,"\u8f6c\u6362\u751f\u6210\u7684\u6587\u4ef6\u540d\u4f1a\u662f\u4ee5\u4e0b\u683c\u5f0f":11,"\u8f6c\u6362\u8017\u65f6":41,"\u8f93\u5165":[88,90,110,114],"\u8f93\u5165\u4e86\u6027\u80fd\u5206\u6790\u7ed3\u679c":107,"\u8f93\u5165\u548c\u8f93\u51fa\u90fd\u662f\u5355\u5c42\u5e8f\u5217":113,"\u8f93\u5165\u548c\u8f93\u51fa\u90fd\u662f\u53cc\u5c42\u5e8f\u5217":113,"\u8f93\u5165\u5e8f\u5217\u4e2d\u5143\u7d20\u7684\u603b\u6570":81,"\u8f93\u5165\u6570\u636e\u4e3a\u4e00\u4e2a\u5b8c\u6574\u7684\u65f6\u95f4\u5e8f\u5217":111,"\u8f93\u5165\u6570\u636e\u4e3a\u5728\u5355\u5c42rnn\u6570\u636e\u91cc\u9762":111,"\u8f93\u5165\u6570\u636e\u53ef\u5206\u4e3a":89,"\u8f93\u5165\u6570\u636e\u6574\u4f53\u4e0a\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":111,"\u8f93\u5165\u6570\u636e\u7684\u5b57\u5178\u7ef4\u6570\u662f1\u767e\u4e07":105,"\u8f93\u5165\u6570\u636e
\u7c7b\u578b":89,"\u8f93\u5165\u662f\u5426\u662f\u8f6c\u7f6e\u7684":74,"\u8f93\u5165\u662f\u7531\u4e00\u4e2alist\u4e2d\u7684\u7f51\u7edc\u5c42\u5b9e\u4f8b\u7684\u540d\u5b57\u7ec4\u6210\u7684":74,"\u8f93\u5165\u68af\u5ea6":42,"\u8f93\u5165\u7684\u540d\u5b57":74,"\u8f93\u5165\u7684\u5927\u5c0f":74,"\u8f93\u5165\u7684\u7c7b\u578b":74,"\u8f93\u5165\u9700\u8981\u9884\u6d4b\u7684\u5411\u91cf\u7ec4":84,"\u8f93\u51fa":[75,90,110,114],"\u8f93\u51fa\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":113,"\u8f93\u51fa\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":113,"\u8f93\u51fa\u4fe1\u606f\u6709\u673a\u5730\u7ec4\u7ec7\u5728\u4e00\u8d77":89,"\u8f93\u51fa\u51fd\u6570":114,"\u8f93\u51fa\u521b\u5efa":[89,90],"\u8f93\u51fa\u5e8f\u5217\u7684\u7c7b\u578b":110,"\u8f93\u51fa\u5e8f\u5217\u7684\u8bcd\u8bed\u6570\u548c\u8f93\u5165\u5e8f\u5217\u4e00\u81f4":113,"\u8f93\u51fa\u6240\u643a\u5e26\u7684\u5e8f\u5217\u4fe1\u606f":89,"\u8f93\u51fa\u6570\u636e\u548c\u8f93\u51fa\u68af\u5ea6":42,"\u8f93\u51fa\u6570\u636e\u548c\u8f93\u51fa\u68af\u5ea6\u7684\u8f6c\u6362":42,"\u8f93\u51fa\u6570\u636e\u662f\u5728\u4e0a\u6587\u4ecb\u7ecd\u7684":89,"\u8f93\u51fa\u6570\u636e\u6709\u673a\u5730\u7ec4\u7ec7\u5728\u4e00\u8d77":90,"\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7":[88,90],"\u8f93\u51fa\u7531":89,"\u8f93\u51fa\u7684\u5e8f\u5217\u4fe1\u606f":89,"\u8f93\u51fa\u7684\u68af\u5ea6":103,"\u8f93\u51fa\u7ed3\u679c\u53ef\u80fd\u4f1a\u968f\u7740\u5bb9\u5668\u7684\u6d88\u8017\u800c\u88ab\u5220\u9664":96,"\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a":89,"\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":89,"\u8f93\u51fa\u90fd\u4f1a\u5bf9\u5e94\u6709\u81ea\u5df1\u7684":[89,90],"\u8fbe\u5230\u5bb9\u707e\u7684\u76ee\u7684":10,"\u8fc7\u4e86\u4e00\u4e2a\u5f88\u7b80\u5355\u7684recurrent_group":111,"\u8fc7\u5b8c\u6240\u6709\u8bad\u7ec3\u6570\u636e\u5373\u4e3a\u4e00\u4e2apass":81,"\u8fc7\u7a0b\u4e2d\u6240\u6709\u65f6\u95f4\u6b65":41,"\u8fd0\u884c":90,"\u8fd0\u884c\u4e00\u4e2a":0,"\u8fd0\u884c\u4e0b\u9762\u547d\u4ee4\u53ef\u4ee5\u8fdb\u884c\u7f16\u8bd1":75,
"\u8fd0\u884c\u5355\u5143\u6d4b\u8bd5\u6d4b\u65f6\u9700\u8981\u7f16\u8bd1\u6574\u4e2a\u5de5\u7a0b":75,"\u8fd0\u884c\u5931\u8d25":105,"\u8fd0\u884c\u5b8c\u6210\u540e":93,"\u8fd0\u884c\u5b8c\u6bd5\u540e\u8f93\u51fa":107,"\u8fd0\u884c\u6027\u80fd\u5206\u6790\u7684\u65f6\u5019":107,"\u8fd0\u884c\u65e5\u5fd7":93,"\u8fd0\u884c\u65f6\u4e5f\u53ef\u80fd\u56e0\u4e3a\u591a\u7ebf\u7a0b\u4ea7\u751f\u6df7\u4e71\u4e0d\u53ef\u8bfb\u7684\u6027\u80fd\u5206\u6790\u7ed3\u679c":107,"\u8fd0\u884c\u65f6\u4f1a\u81ea\u52a8\u627e\u5230\u7cfb\u7edf\u4e2d\u5b89\u88c5\u7684cuda\u548ccudnn\u5e93\u8fdb\u884c\u7f16\u8bd1\u548c\u6267\u884c":0,"\u8fd0\u884c\u65f6c":90,"\u8fd0\u884c\u73af\u5883":104,"\u8fd0\u884c\u7684\u4e00\u4e9b\u53c2\u6570\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u4f20\u9012\u5230\u5bb9\u5668\u5185":97,"\u8fd0\u884c\u8be5\u7f16\u8bd1\u5de5\u5177\u94fe\u9700\u8981\u4e00\u53f0":118,"\u8fd0\u884c\u9636\u6bb5":104,"\u8fd1\u671f\u76ee\u6807":42,"\u8fd4\u56de\u7684\u662f":84,"\u8fd4\u56de\u7b2c\u4e8c\u6b65":63,"\u8fd4\u56de\u7b2ci\u4e2a\u8f93\u5165\u77e9\u9635":74,"\u8fd4\u56depython\u7aef\u7684\u8ba1\u7b97\u7ed3\u679c":75,"\u8fd8\u4f1a\u4e0b\u8f7dmkl":0,"\u8fd8\u4f1a\u5f3a\u5236\u8bbe\u7f6e\u4e00\u4e9bpaddlepaddle\u53c2\u6570\u7684\u503c":116,"\u8fd8\u4f1a\u8f93\u51fa\u4e00\u4e2a":72,"\u8fd8\u53ef\u4ee5\u901a\u8fc7\u51cf\u5c0f\u5b66\u4e60\u7387\u6216\u8005\u5bf9\u6570\u636e\u8fdb\u884c\u5f52\u4e00\u5316\u5904\u7406\u6765\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898":81,"\u8fd8\u662f\u4ece":11,"\u8fd8\u662f\u865a\u62df\u673a":0,"\u8fd8\u9700\u8981\u5728\u8282\u70b9\u4e0a\u5b89\u88c5\u5bf9\u5e94\u7684gpu\u9a71\u52a8\u4ee5\u53cacuda":101,"\u8fd8\u9700\u8981\u91cd\u5199":75,"\u8fd9":81,"\u8fd98\u79cdlearning_rate_schedule\u53ca\u5176\u5bf9\u5e94\u5b66\u4e60\u7387\u8ba1\u7b97\u65b9\u5f0f\u5982\u4e0b":83,"\u8fd9\u4e00\u4e2a\u5e93":87,"\u8fd9\u4e00\u5757\u7684\u8017\u65f6\u6bd4\u4f8b\u771f\u7684\u592a\u9ad8":108,"\u8fd9\u4e00\u5c42\u8fdb\u884c\u5c01\u88c5":46,"\u8fd9\u4e00\u6570\u636e\u683c\u5f0f\u7684\u8f
6c\u6362\u64cd\u4f5c":41,"\u8fd9\u4e00\u6982\u5ff5\u4e0d\u518d\u7410\u788e":46,"\u8fd9\u4e00\u8282\u5bf9\u56fe1\u4e2d\u9884\u6d4b\u4ee3\u7801\u7f16\u5199\u76845\u4e2a\u6b65\u9aa4\u8fdb\u884c\u4ecb\u7ecd\u548c\u8bf4\u660e":90,"\u8fd9\u4e00\u8ba1\u7b97\u5355\u5143":82,"\u8fd9\u4e00\u8fc7\u7a0b\u5bf9\u7528\u6237\u662f\u5b8c\u5168\u900f\u660e\u7684":113,"\u8fd9\u4e09\u4e2a\u5206\u652f":63,"\u8fd9\u4e24\u4e2a\u6307\u6807\u4ee3\u8868\u4e86\u67d0\u4e00\u4e2a\u51fd\u6570\u771f\u5b9e\u7684\u8fd0\u884c\u65f6\u95f4":107,"\u8fd9\u4e2a":3,"\u8fd9\u4e2a\u4efb\u52a1\u7684\u914d\u7f6e\u4e3a":81,"\u8fd9\u4e2a\u4efb\u52a1\u7684dataprovider\u4e3a":81,"\u8fd9\u4e2a\u4f8b\u5b50\u6709\u4e24\u5904\u4e0d\u540c":75,"\u8fd9\u4e2a\u51fd\u6570\u672c\u8eab\u4f1a\u5728\u8ba1\u7b97\u524d\u5c06\u539f\u6570\u636e\u8f6c\u6362\u4e3a\u66f4\u9002\u5408\u82f1\u7279\u5c14\u5e73\u53f0\u7684\u5185\u90e8\u683c\u5f0f":41,"\u8fd9\u4e2a\u51fd\u6570\u7684":114,"\u8fd9\u4e2a\u51fd\u6570\u8fdb\u884c\u53d8\u6362":111,"\u8fd9\u4e2a\u51fd\u6570\u9700\u8981\u8bbe\u7f6e":114,"\u8fd9\u4e2a\u52a8\u6001\u5e93\u7684\u8fde\u63a5\u53c2\u6570\u4e0epaddle\u7684\u5176\u4ed6\u4e8c\u8fdb\u5236":46,"\u8fd9\u4e2a\u53c2\u6570\u4e5f\u4e0d\u4f1a\u4e00\u5e76\u5220\u9664":46,"\u8fd9\u4e2a\u5730\u5740\u6765\u8868\u793a\u6b64\u6b65\u9aa4\u6240\u6784\u5efa\u51fa\u7684\u955c\u50cf":97,"\u8fd9\u4e2a\u57fa\u7c7b":74,"\u8fd9\u4e2a\u5934\u6587\u4ef6\u4e0d\u5047\u8bbe\u5176\u4ed6\u6587\u4ef6\u7684\u5f15\u7528\u987a\u5e8f":46,"\u8fd9\u4e2a\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20\u53c8\u662f\u4e00\u4e2a\u5e8f\u5217":113,"\u8fd9\u4e2a\u60c5\u51b5\u4e0b\u6240\u6709\u7684\u6587\u4ef6\u4f1a\u5b58\u5728\u6574\u7406\u8fc7\u7684\u7684\u6587\u4ef6\u76ee\u5f55":77,"\u8fd9\u4e2a\u63a5\u53e3\u9700\u8981\u505a\u5230":45,"\u8fd9\u4e2a\u6570\u636e\u4e5f\u88ab\u5355\u5c42rnn\u7f51\u7edc\u76f4\u63a5\u4f7f\u7528":111,"\u8fd9\u4e2a\u6587\u4ef6\u5177\u6709\u72ec\u7279\u7684\u8bed\u6cd5":45,"\u8fd9\u4e2a\u662f\u76ee\u524d\u63a8\u8350\u7684\u4f7f\u7528\u65b9\u6cd5"
:77,"\u8fd9\u4e2a\u73af\u5883\u53d8\u91cf\u5173\u95edopenmp\u4f18\u5316":107,"\u8fd9\u4e2a\u76ee\u5f55\u4e2d\u9664\u4e86":46,"\u8fd9\u4e2a\u793a\u4f8b":90,"\u8fd9\u4e2a\u795e\u7ecf\u7f51\u7edc\u5355\u5143\u5c31\u53ebmemori":111,"\u8fd9\u4e2a\u7c7b\u7684\u53c2\u6570\u5305\u62ec":74,"\u8fd9\u4e2a\u7c7b\u9700\u8981\u7ee7\u627f":74,"\u8fd9\u4e2a\u7ed3\u6784\u4f53\u4e2d\u7684\u53e6\u4e00\u4e2a\u9879\u76ee\u662f":46,"\u8fd9\u4e2a\u7ed3\u6784\u4f53\u5305\u542b\u4e24\u4e2a\u9879\u76ee":46,"\u8fd9\u4e2a\u811a\u672c\u8c03\u7528":0,"\u8fd9\u4e2a\u8f93\u5165\u4e0d\u53c2\u4e0e":75,"\u8fd9\u4e2a\u8fc7\u7a0b\u5bf9\u7528\u6237\u4e5f\u662f\u900f\u660e\u7684":113,"\u8fd9\u4e2a\u8fc7\u7a0b\u9664\u4e86\u7f16\u8bd1paddlepaddle\u4e3a":72,"\u8fd9\u4e2a\u9009\u62e9":[41,42],"\u8fd9\u4e2a\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u751f\u6210\u4e00\u7cfb\u5217\u6743\u91cd":114,"\u8fd9\u4e2a\u9759\u6001\u5e93\u5305\u542b\u4e86paddle\u7684\u5168\u90e8\u7b26\u53f7":46,"\u8fd9\u4e2ainstance\u53ef\u4ee5\u662f\u5355\u4e2a\u503c":11,"\u8fd9\u4e2aissu":0,"\u8fd9\u4e2ajob\u624d\u7b97\u6210\u529f\u7ed3\u675f":97,"\u8fd9\u4e2alayer\u7684\u8f93\u51fa\u4f1a\u4f5c\u4e3a\u6574\u4e2a":113,"\u8fd9\u4e5f\u4f1a\u6781\u5927\u51cf\u5c11\u6570\u636e\u8bfb\u5165\u7684\u8017\u65f6":81,"\u8fd9\u4e9b\u4f1a\u5728":[41,42],"\u8fd9\u4e9b\u51fd\u6570\u4f1a\u5c06\u5bf9\u5e94\u5185\u5bb9\u6dfb\u52a0\u5230":75,"\u8fd9\u4e9b\u51fd\u6570\u4f1a\u6839\u636e\u8f93\u5165\u53c2\u6570\u91cd\u65b0\u8bbe\u7f6e\u5185\u90e8\u548c\u5916\u90e8\u5b58\u50a8":42,"\u8fd9\u4e9b\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1\u901a\u5e38\u4f1a\u628a\u6570\u636e\u5207\u5272\u6210\u591a\u4e2a\u5206\u7247\u5206\u5e03\u5f0f\u7684\u5b58\u50a8\u5728\u591a\u4e2a\u8282\u70b9\u4e4b\u4e0a":11,"\u8fd9\u4e9b\u53c2\u6570\u53ef\u4ee5\u901a\u8fc7":91,"\u8fd9\u4e9b\u53c2\u6570\u7684\u5177\u4f53\u63cf\u8ff0":97,"\u8fd9\u4e9b\u540d\u5b57\u5fc5\u987b\u8981\u5199\u5bf9":74,"\u8fd9\u4e9b\u6570\u636e\u4f1a\u88ab\u7528\u6765\u66f4\u65b0\u53c2\u6570":81,"\u8fd9\u4e9b
\u6570\u636e\u4f7f\u7528\u7684\u5185\u5b58\u4e3b\u8981\u548c\u4e24\u4e2a\u53c2\u6570\u6709\u5173\u7cfb":81,"\u8fd9\u4e9b\u7279\u5f81\u6570\u636e\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u6709\u610f\u4e49\u7684":111,"\u8fd9\u4e9b\u955c\u50cf\u4e5f\u53ef\u4ee5\u4ece":63,"\u8fd9\u4efd\u6559\u7a0b\u5c55\u793a\u4e86\u5982\u4f55\u5728paddlepaddle\u4e2d\u5b9e\u73b0\u4e00\u4e2a\u81ea\u5b9a\u4e49\u7684\u7f51\u7edc\u5c42":74,"\u8fd9\u4f1a\u63d0\u793a\u5f53\u524d\u76ee\u5f55\u7684\u4e00\u4e9b\u53d8\u5316":72,"\u8fd9\u4f1a\u7ed9\u8bc4\u5ba1\u4eba\u5e26\u6765\u5f88\u5927\u56f0\u6270":72,"\u8fd9\u4f1a\u81ea\u52a8\u8fdb\u884c\u7f51\u7edc\u914d\u7f6e\u4e2d\u58f0\u660e\u7684\u6fc0\u6d3b\u64cd\u4f5c":74,"\u8fd9\u4fbf\u662f\u4e00\u79cd\u53cc\u5c42rnn\u7684\u8f93\u5165\u6570\u636e":111,"\u8fd9\u51e0\u4e2a\u7f16\u8bd1\u9009\u9879\u7684\u8bbe\u7f6e":0,"\u8fd9\u53e5\u8868\u793a\u4f7f\u7528\u57fa\u7c7b":75,"\u8fd9\u53ef\u4ee5\u5e2e\u60a8\u7701\u6389\u82b1\u4e00\u5c0f\u65f6\u5b89\u88c5\u548c\u914d\u7f6e\u5404\u79cd\u5f00\u53d1\u5de5\u5177":0,"\u8fd9\u53ef\u4ee5\u8ba9\u5176\u4ed6\u4eba\u77e5\u9053\u8fd9\u6b21\u63d0\u4ea4\u505a\u4e86\u54ea\u4e9b\u6539\u53d8":72,"\u8fd9\u53ef\u4ee5\u901a\u8fc7":72,"\u8fd9\u548c\u5355\u5c42rnn\u7684\u914d\u7f6e\u662f\u7b49\u4ef7\u7684":111,"\u8fd9\u56db\u4e2a\u5e8f\u5217\u53c8\u5206\u522b\u542b\u67093":89,"\u8fd9\u56db\u6761\u6570\u636e\u540c\u65f6\u5904\u7406\u7684\u53e5\u5b50\u6570\u91cf\u4e3a":111,"\u8fd9\u5728\u6784\u9020\u975e\u5e38\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u65f6\u662f\u6709\u7528\u7684":114,"\u8fd9\u5bf9\u4e8e\u901a\u5e38\u7684java\u7684\u5f00\u53d1\u8005\u6765\u8bf4":45,"\u8fd9\u5c06\u4f1a\u5bfc\u81f4\u5355\u5143\u6d4b\u8bd5\u51fa\u9519":75,"\u8fd9\u5c06\u4f1a\u5bfc\u81f4\u7f16\u8bd1\u51fa\u9519":75,"\u8fd9\u610f\u5473\u7740":114,"\u8fd9\u610f\u5473\u7740\u9664\u4e86\u6307\u5b9adevic":105,"\u8fd9\u65f6":[81,90],"\u8fd9\u65f6\u5728\u4f7f\u7528":83,"\u8fd9\u65f6\u7684":89,"\u8fd9\u65f6\u7684\u9700\u8981\u540c\u65f6\u63d0\u4f9b":89,"
\u8fd9\u65f6\u884c\u504f\u79fb\u548c\u5217\u53f7\u6307\u5b9a\u7684\u5143\u7d20\u9ed8\u8ba4\u5176\u503c\u4e3a1":89,"\u8fd9\u65f6\u8fdb\u884c\u77e9\u9635\u4e58\u6cd5\u8fd0\u7b97\u5c31\u53ef\u80fd\u5bfc\u81f4\u6d6e\u70b9\u6570\u6ea2\u51fa":81,"\u8fd9\u65f6\u9700\u8981\u8c03\u7528\u521b\u5efa\u5e8f\u5217\u4fe1\u606f\u548c\u4e3a":89,"\u8fd9\u662f\u4e00\u79cd\u6309\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5206\u6bb5\u53d6\u503c\u7684\u5b66\u4e60\u7387\u9000\u706b\u65b9\u6cd5":83,"\u8fd9\u662f\u4e00\u79cd\u6309\u5df2\u8bad\u7ec3pass\u6570\u5206\u6bb5\u53d6\u503c\u7684\u5b66\u4e60\u7387\u9000\u706b\u65b9\u6cd5":83,"\u8fd9\u662f\u4e00\u79cd\u975e\u5e38\u7075\u6d3b\u7684\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f":110,"\u8fd9\u662f\u56e0\u4e3a":45,"\u8fd9\u662f\u5f00\u6e90\u793e\u533a\u7684\u57fa\u672c\u793c\u8c8c":72,"\u8fd9\u662f\u666e\u901a\u7684\u5355\u5c42\u65f6\u95f4\u5e8f\u5217\u7684dataprovider\u4ee3\u7801":111,"\u8fd9\u662f\u6700\u4fbf\u6377\u7684\u5b89\u88c5\u65b9\u5f0f":2,"\u8fd9\u662f\u76ee\u524dcmake\u5bfb\u627epython\u7684\u903b\u8f91\u5b58\u5728\u7f3a\u9677":78,"\u8fd9\u6837":[46,92],"\u8fd9\u6837\u4e0b\u4e00\u4e2acpu":42,"\u8fd9\u6837\u4fdd\u5b58\u5728\u5206\u5e03\u5f0f\u5b58\u50a8\u4e2d\u7684\u6570\u636e\u53ef\u4ee5\u88ab\u96c6\u7fa4\u4e2d\u7684\u6bcf\u4e2a\u8282\u70b9\u8bfb\u53d6\u5230":91,"\u8fd9\u6837\u4fdd\u8bc1":63,"\u8fd9\u6837\u4fdd\u8bc1\u8fd0\u884c\u7ed3\u675f\u4e4b\u540e\u7684":0,"\u8fd9\u6837\u505a\u53ef\u4ee5\u6781\u5927\u7684\u51cf\u5c11\u5185\u5b58\u5360\u7528":81,"\u8fd9\u6837\u53ef\u4ee5\u514d\u53bb\u5355\u72ec\u5b89\u88c5\u7f16\u8bd1\u4f9d\u8d56\u7684\u6b65\u9aa4":0,"\u8fd9\u6837\u53ef\u4ee5\u51cf\u5c0fgpu\u5185\u5b58":105,"\u8fd9\u6837\u5982\u679c\u9047\u5230\u95ee\u9898":0,"\u8fd9\u6837\u5bb9\u5668\u7684":97,"\u8fd9\u6837\u5c31\u53ef\u4ee5\u5728\u4e91\u7aef\u6267\u884c\u591a\u79cd\u6570\u636e\u7c7b\u8ba1\u7b97\u4efb\u52a1":11,"\u8fd9\u6837\u5df2\u7ecf\u4f20\u8f93\u6210\u529f\u7684\u90e8\u5206\u5c31\u4e0d\u7528\u91cd\u65b0\u4f20\u8f93\u4e86":27,"\u8fd9
\u6837\u5e26\u6765\u7684\u597d\u5904\u5c31\u662f\u4e0d\u9700\u8981\u4e00\u76f4\u6e05\u7a7amemori":42,"\u8fd9\u6837\u5f53\u8be5pull":72,"\u8fd9\u6837\u65e2\u4f7f\u5f97\u6700\u7ec8\u4fdd\u5b58\u7684\u53c2\u6570\u683c\u5f0f\u4e0epaddlepaddle\u4e00\u81f4":42,"\u8fd9\u6837\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u8ba1\u7b97\u7684\u5e76\u884c\u6027":92,"\u8fd9\u6837\u7684\u88c5\u9970\u5668":74,"\u8fd9\u6837\u7684\u8bdd":96,"\u8fd9\u6837\u8bad\u7ec3\u6587\u4ef6\u7684\u4e2a\u6570\u4f1a\u6bd4\u8f83\u591a":91,"\u8fd9\u6b63\u662f\u5b83\u4eec\u901f\u5ea6\u5feb\u7684\u539f\u56e0":108,"\u8fd9\u7528\u4e8e\u5728\u591a\u7ebf\u7a0b\u548c\u591a\u673a\u4e0a\u66f4\u65b0\u53c2\u6570":74,"\u8fd9\u79cd\u521d\u59cb\u5316\u65b9\u5f0f\u5728\u4e00\u822c\u60c5\u51b5\u4e0b\u4e0d\u4f1a\u4ea7\u751f\u5f88\u5dee\u7684\u7ed3\u679c":83,"\u8fd9\u79cd\u5b89\u88c5\u65b9\u5f0f\u4f1a\u6d89\u53ca\u5230\u4e00\u4e9b\u7b2c\u4e09\u65b9\u5e93\u7684\u4e0b\u8f7d":2,"\u8fd9\u79cd\u60c5\u51b5\u4e0b\u4e0d\u9700\u8981\u91cd\u5199\u8be5\u51fd\u6570":74,"\u8fd9\u79cd\u60c5\u51b5\u591a\u51fa\u73b0\u5728\u4f7f\u7528\u591a\u7ebf\u7a0b\u9884\u6d4b\u65f6":90,"\u8fd9\u79cd\u60c5\u51b5\u5e38\u5e38\u53d1\u751f\u5728":81,"\u8fd9\u79cd\u65b9\u5f0f\u5bf9\u5185\u5b58\u6d88\u8017\u8f83\u5927":82,"\u8fd9\u79cd\u65b9\u5f0f\u5fc5\u987b\u4f7f\u7528paddle\u5b58\u50a8\u7684\u6a21\u578b\u8def\u5f84\u683c\u5f0f":105,"\u8fd9\u79cd\u65b9\u5f0f\u6700\u4e3a\u7b80\u4fbf":87,"\u8fd9\u79cd\u751f\u6210\u6280\u672f\u53ea\u7528\u4e8e\u7c7b\u4f3c\u89e3\u7801\u5668\u7684\u751f\u6210\u8fc7\u7a0b":114,"\u8fd9\u79cd\u7c7b\u578b\u7684\u8f93\u5165\u5fc5\u987b\u901a\u8fc7":113,"\u8fd9\u79cd\u94fe\u63a5\u65b9\u5f0f\u4e3b\u8981\u7528\u4e8e\u79fb\u52a8\u7aef\u9884\u6d4b":87,"\u8fd9\u79cd\u96c6\u7fa4\u8282\u70b9\u7ba1\u7406\u65b9\u5f0f\u4f1a\u5728\u5c06\u6765\u4f7f\u7528":97,"\u8fd9\u7bc7":1,"\u8fd9\u7bc7\u6587\u6863":72,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u4f1a\u4f7f\u7528":90,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u4f1a\u7edf\u4e00\u4f7f\u
7528":89,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u5c06\u4f1a\u7edf\u4e00\u4f7f\u7528":89,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd":90,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd\u5728":118,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd\u5728\u4f7f\u7528":89,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd\u57fa\u4e8e":0,"\u8fd9\u7bc7\u6587\u6863\u7684\u4e4b\u540e\u90e8\u5206\u4f1a\u4f7f\u7528":90,"\u8fd9\u7bc7\u6587\u7ae0":0,"\u8fd9\u7ec4\u8bed\u4e49\u76f8\u540c\u7684\u793a\u4f8b\u914d\u7f6e\u5982\u4e0b":111,"\u8fd9\u884c\u547d\u4ee4\u4e2d":107,"\u8fd9\u901a\u8fc7\u83b7\u5f97\u53cd\u5411\u5faa\u73af\u7f51\u7edc\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b":114,"\u8fd9\u90fd\u9700\u8981\u8fd9\u4e2a\u63a5\u53e3\u6309\u7167\u7ea6\u5b9a\u4fd7\u6210\u7684\u89c4\u5219\u6765\u6ce8\u91ca\u5b8c\u5907":45,"\u8fd9\u91cc":[0,42,72,83,97,114],"\u8fd9\u91cc\u4e0d\u518d\u8d58\u8ff0":75,"\u8fd9\u91cc\u4ecb\u7ecdc":90,"\u8fd9\u91cc\u4f7f\u7528\u4e86\u7528":107,"\u8fd9\u91cc\u4f7f\u7528\u4e86paddlepaddle\u9884\u5b9a\u4e49\u597d\u7684rnn\u5904\u7406\u51fd\u6570":111,"\u8fd9\u91cc\u4f7f\u7528\u7b80\u5355\u7684":81,"\u8fd9\u91cc\u5c06\u4ecb\u7ecdpaddlepaddle\u7684\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":84,"\u8fd9\u91cc\u6211\u4eec\u5c55\u793a\u4e00\u4efd\u7b80\u5316\u8fc7\u7684\u4ee3\u7801":74,"\u8fd9\u91cc\u6211\u4eec\u901a\u8fc7\u5728kubernetes\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2ajob\u6765\u4e0b\u8f7d\u5e76\u5207\u5272\u6570\u636e":97,"\u8fd9\u91cc\u6709\u4e24\u79cd\u6709\u6548\u7684\u89e3\u51b3\u65b9\u6cd5":81,"\u8fd9\u91cc\u68c0\u9a8c\u8fd0\u884c\u65f6\u95f4\u6a21\u578b\u7684\u6536\u655b":93,"\u8fd9\u91cc\u7684dockerimage\u4f5c\u4e3a\u7f16\u8bd1\u73af\u5883\u4ee5\u652f\u6301\u66f4\u591a\u7684linux":63,"\u8fd9\u91cc\u7684eigentensor\u4e4b\u95f4\u7684\u8fd0\u7b97\u53ea\u662f\u6539\u53d8\u4e86\u539f\u6709tensor\u4e2d\u7684\u6570\u636e":76,"\u8fd9\u91cc\u9009\u62e90":63,"\u8fd9\u91cc\u9700\u8981\u7528\u6237\u989d\u5916\u6ce8\u610f":10,"\u8fd9\u9700\u8981\u8054\u5408\u6211\u4eec\u7b2c\u4e8c\u8282":107,"\u8fdb\u4
e00\u6b65\u4f18\u5316":42,"\u8fdb\u4e3b\u4ed3\u5e93\u540e":72,"\u8fdb\u5165":63,"\u8fdb\u5165\u5bb9\u5668":96,"\u8fdb\u5165\u5bf9\u5e94\u7684\u76ee\u5f55":78,"\u8fdb\u7a0b\u542f\u52a8\u7684\u5fc5\u8981\u53c2\u6570":97,"\u8fdb\u7a0b\u7684":93,"\u8fdb\u7a0b\u7684\u542f\u52a8\u53c2\u6570":97,"\u8fdb\u7a0b\u7684\u8fd0\u884c\u73af\u5883":97,"\u8fdb\u7a0b\u9700\u8981\u7684":97,"\u8fdb\u800c\u591a\u673a":107,"\u8fdb\u800c\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528\u5982\u4e0b\u547d\u4ee4\u5f00\u542f\u4e00\u4e2ahttp\u670d\u52a1":107,"\u8fdb\u800c\u6307\u5b9a\u4e86python\u53ef\u6267\u884c\u6587\u4ef6\u7684\u8def\u5f84":107,"\u8fdb\u800c\u8fdb\u884c\u4ee3\u7801\u8bc4\u5ba1":63,"\u8fdb\u884c\u4e86":111,"\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u7684\u65b9\u6848":97,"\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u7684\u65b9\u6cd5":97,"\u8fdb\u884c\u524d\u5411\u8ba1\u7b97":90,"\u8fdb\u884c\u56de\u590d":72,"\u8fdb\u884c\u5e8f\u5217\u5316":90,"\u8fdb\u884c\u5f00\u53d1":72,"\u8fdb\u884c\u62c6\u89e3":111,"\u8fdb\u884c\u6fc0\u6d3b\u64cd\u4f5c":74,"\u8fdb\u884c\u7f16\u8bd1\u548c\u5b89\u88c5":116,"\u8fdb\u884c\u8bad\u7ec3":90,"\u8fdb\u884c\u8bbe\u7f6e":75,"\u8fdb\u884c\u90e8\u7f72":94,"\u8fdb\u884c\u94fe\u63a5":87,"\u8fdb\u884c\u9884\u6d4b\u4f9d\u8d56\u4e8e\u5c06":87,"\u8fdb\u884c\u9884\u6d4b\u65f6":90,"\u8fdb\u884cpython\u4e0ec":107,"\u8fdb\u9636\u4f7f\u7528":115,"\u8fdb\u9636\u6307\u5357":84,"\u8fde\u63a5":113,"\u8fde\u63a5\u5230pserver\u7684\u7aef\u53e3":91,"\u8fde\u63a5\u5230pserver\u7684\u7aef\u53e3\u4e2a\u6570":91,"\u9000\u51fa\u5bb9\u5668":96,"\u9009\u62e9\u4e0b\u8f7d\u4f7f\u7528\u4e0d\u540c\u7684blas\u5e93\u7684docker\u955c\u50cf":1,"\u9009\u62e9\u662f\u5426\u7f16\u8bd1mkl":42,"\u9009\u62e9\u76ee\u6807\u5206\u652f":72,"\u9009\u62e9\u8def\u5f84\u6765\u52a8\u6001\u52a0\u8f7dnvidia":103,"\u9009\u62e9\u9700\u8981\u53d1\u5e03\u7684\u7248\u672c":63,"\u9009\u9879":0,"\u900f\u4f20\u7528\u6237\u8eab\u4efd\u7684\u529e\u6cd5":27,"\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":102,"\u901a\u5e38":[46,75,107
],"\u901a\u5e38\u4f1a\u4f7f\u7528\u73af\u5883\u53d8\u91cf\u914d\u7f6ejob\u7684\u914d\u7f6e\u4fe1\u606f":97,"\u901a\u5e38\u4f1a\u4f7f\u7528mapreduce\u4efb\u52a1\u7684\u8f93\u51fa\u7ed3\u679c\u4f5c\u4e3a\u8bad\u7ec3\u7ed3\u679c":91,"\u901a\u5e38\u4f7f\u7528\u7a00\u758f\u8bad\u7ec3\u6765\u52a0\u901f\u8ba1\u7b97\u8fc7\u7a0b":105,"\u901a\u5e38\u4f7f\u7528cento":3,"\u901a\u5e38\u505a\u6cd5\u662f\u4ece\u4e00\u4e2a\u6bd4\u8f83\u5927\u7684learning_rate\u5f00\u59cb\u8bd5":83,"\u901a\u5e38\u5305\u542b\u4e00\u4e2acpu\u7248\u672c\u548c\u4e00\u4e2agpu\u7248\u672c":63,"\u901a\u5e38\u540d\u5b57\u662f":72,"\u901a\u5e38\u60c5\u51b5\u4e0b":108,"\u901a\u5e38\u6211\u4eec\u4f1a\u5b89\u88c5ceph\u7b49\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u6765\u5b58\u50a8\u8bad\u7ec3\u6570\u636e":96,"\u901a\u5e38\u6307\u5c06\u4e00\u4e2a\u6574\u4f53\u62c6\u5206\u6210\u591a\u4efd\u7684\u5176\u4e2d\u7684\u4e00\u4efd":10,"\u901a\u5e38\u6709\u4e24\u4e2a\u65b9\u6cd5\u6765\u6784\u5efa\u57fa\u4e8e":118,"\u901a\u5e38\u7528\u4e8e\u8868\u793a\u79bb\u6563\u7684\u7c7b\u522b\u6807\u7b7e":89,"\u901a\u5e38\u7684\u505a\u6cd5\u662f\u4f7f\u7528":114,"\u901a\u5e38\u7684\u505a\u6cd5\u662f\u5c06\u914d\u7f6e\u5b58\u4e8e":74,"\u901a\u5e38\u8981\u6c42\u65f6\u95f4\u6b65\u4e4b\u95f4\u5177\u6709\u4e00\u4e9b\u4f9d\u8d56\u6027":111,"\u901a\u5e38\u89c2\u5bdf\u70ed\u70b9\u51fd\u6570\u95f4\u7684\u8c03\u7528\u5173\u7cfb":107,"\u901a\u5e38\u90fd\u4f1a\u4f7f\u7528\u4e0b\u9762\u8fd9\u4e9b\u547d\u4ee4\u884c\u53c2\u6570":105,"\u901a\u5e38\u9700\u8981\u53bb\u6389\u7f51\u7edc\u4e2d\u7684":90,"\u901a\u7528":102,"\u901a\u8fc7":[72,74,75,81,89,111],"\u901a\u8fc7\u4e24\u4e2a\u5d4c\u5957\u7684":113,"\u901a\u8fc7\u4f7f\u7528":0,"\u901a\u8fc7\u4f7f\u7528\u8fd9\u4e9bapi":41,"\u901a\u8fc7\u51fd\u6570":97,"\u901a\u8fc7\u547d\u4ee4\u884c\u53c2\u6570":81,"\u901a\u8fc7\u591a\u4e2a\u7ebf\u7a0b\u5171\u4eab\u540c\u4e00\u4e2a\u6a21\u578b\u6765\u51cf\u5c11\u5185\u5b58\u5f00\u9500":90,"\u901a\u8fc7\u5f15\u7528memory\u5f97\u5230\u8fd9\u4e2alayer\u4e0a\u4e00\u4e
2a\u65f6\u523b\u7684\u8f93\u51fa":113,"\u901a\u8fc7\u5f15\u7528memory\u5f97\u5230\u8fd9\u4e2alayer\u4e0a\u4e00\u4e2a\u65f6\u523b\u8f93\u51fa":113,"\u901a\u8fc7\u6240\u6709\u5355\u5143\u6d4b\u8bd5":72,"\u901a\u8fc7\u6a21\u578b\u63a8\u65adapi\u7684\u5b9e\u73b0\u4f5c\u4e3a\u4e00\u4e2a\u6837\u4f8b":46,"\u901a\u8fc7\u7075\u6d3b\u4f7f\u7528\u4ee5\u4e0a\u4e24\u4e2a\u63a5\u53e3":90,"\u901a\u8fc7\u7ec4\u5408\u4e0d\u540c\u7684layer":84,"\u901a\u8fc7\u7f51\u7edc\u5c42\u7684\u6807\u8bc6\u7b26\u6765\u6307\u5b9a":74,"\u901a\u8fc7\u8ba1\u7b97\u8282\u70b9\u548c\u53c2\u6570\u670d\u52a1\u5668\u7684\u5206\u5e03\u5f0f\u534f\u4f5c":92,"\u901a\u8fc7\u8be5\u53c2\u6570\u53ef\u83b7\u53d6\u5230\u8f93\u5165\u8f93\u51fa\u4ee5\u53ca\u5c5e\u6027":75,"\u901a\u8fc7\u8c03\u7528":[89,90],"\u901a\u8fc7\u8c03\u7528\u4ee5\u4e0b\u63a5\u53e3\u521b\u5efa\u7a00\u758f\u77e9\u9635":89,"\u901a\u8fc7data":113,"\u901a\u8fc7ssh\u7b49\u65b9\u5f0f\u767b\u5f55\u5230raspberri":118,"\u903b\u8f91\u4e0a\u9ad8\u4e8e\u4e8c\u7ef4\u7684\u6570\u636e":89,"\u903b\u8f91\u5212\u4e0a\u6587\u4ef6\u5206\u5757\u7684\u5355\u4f4d":27,"\u9047\u5230\u8be5\u9519\u8bef\u65f6":82,"\u9075\u5b88\u4ee5\u4e0b\u7ea6\u5b9a":72,"\u9075\u5faa\u4ee5\u4e0b\u6d41\u7a0b":63,"\u90a3\u4e48":[46,74,113],"\u90a3\u4e480\u5c42\u5e8f\u5217\u5373\u4e3a\u4e00\u4e2a\u8bcd\u8bed":113,"\u90a3\u4e48\u4f1a\u643a\u5e26\u6709":89,"\u90a3\u4e48\u53ef\u4ee5\u8ba4\u4e3a\u8bad\u7ec3\u4e0d\u6536\u655b":83,"\u90a3\u4e48\u5728":75,"\u90a3\u4e48\u5982\u4f55\u5224\u65ad\u8bad\u7ec3\u4e0d\u6536\u655b\u5462":83,"\u90a3\u4e48\u5bf9\u5e94\u7684\u5185\u90e8\u5b58\u50a8\u4e5f\u4f1a\u4e0e\u5b83\u4eec\u5171\u4eab\u5185\u5b58":42,"\u90a3\u4e48\u5c31\u4f1a\u4f7f":42,"\u90a3\u4e48\u5e38\u6570\u8f93\u51fa\u6240\u80fd\u8fbe\u5230\u7684\u6700\u5c0fcost\u662f":83,"\u90a3\u4e48\u6211\u4eec\u53ef\u4ee5\u5224\u65ad\u4e3a\u8bad\u7ec3\u4e0d\u6536\u655b":83,"\u90a3\u4e48\u63a8\u8350\u4f7f\u7528":114,"\u90a3\u4e48\u63a8\u8350\u4f7f\u7528\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u65b9\u6cd5":114,
"\u90a3\u4e48\u6536\u655b\u53ef\u80fd\u5f88\u6162":83,"\u90a3\u4e48\u6700\u597d\u5c06\u6570\u636e\u6587\u4ef6\u5728\u6bcf\u6b21\u8bfb\u53d6\u4e4b\u524d\u505a\u4e00\u6b21shuffl":81,"\u90a3\u4e48\u7528\u6237\u9700\u8981\u62c9\u53d6\u6240\u6709\u7684\u8fdc\u7a0b\u5206\u652f\u5230\u672c\u673a":78,"\u90a3\u4e48\u8bad\u7ec3\u6709\u53ef\u80fd\u4e0d\u6536\u655b":83,"\u90a3\u4e48\u8be5\u4f18\u5316\u7b97\u6cd5\u81f3\u5c11\u9700\u8981":81,"\u90a3\u4e48fc1\u548cfc2\u5c42\u5c06\u4f1a\u4f7f\u7528\u7b2c1\u4e2agpu\u6765\u8ba1\u7b97":105,"\u90a3\u4e5f\u5c31\u4e0d\u9700\u8981\u6025\u7740\u4f18\u5316\u6027\u80fd\u5566":108,"\u90a3\u4f30\u8ba1\u8fd9\u91cc\u7684\u6f5c\u529b\u5c31\u6ca1\u5565\u597d\u6316\u7684\u4e86":108,"\u90a3\u51cf\u5c11\u5b66\u4e60\u738710\u500d\u7ee7\u7eed\u8bd5\u9a8c":83,"\u90a3\u6211\u4f1a\u671f\u671b\u5206\u6790\u5de5\u5177\u7edf\u8ba1\u5230\u901f\u5ea6\u662f100gb":108,"\u90a3\u7a0b\u5e8f\u5206\u6790\u5de5\u5177\u662f\u5fc5\u4e0d\u53ef\u5c11\u7684\u5229\u5668":108,"\u90fd\u4e0d\u4f1a\u60f3\u8981\u77e5\u9053next":42,"\u90fd\u4e0d\u9700\u8981":0,"\u90fd\u4f1a\u4ea7\u751f\u5f53\u524d\u5c42\u72b6\u6001\u7684\u6240\u6709\u7ee7\u627f\u7ed3\u679c":103,"\u90fd\u4f1a\u7ba1\u7406\u7ef4\u62a4\u4e00\u4efd\u8bad\u7ec3\u597d\u7684\u6a21\u578b":90,"\u90fd\u4f1a\u9020\u6210\u8bad\u7ec3\u4e2d\u7684\u6570\u636e\u4ecec":81,"\u90fd\u4f7f\u7528":89,"\u90fd\u53ea\u662f\u4ecb\u7ecd\u53cc\u5c42rnn\u7684api\u63a5\u53e3":111,"\u90fd\u53ef\u4ee5\u8fd0\u884c":0,"\u90fd\u53ef\u4ee5\u901a\u8fc7\u8c03\u7528\u4e0b\u9762\u7684\u63a5\u53e3\u4e3a\u539f\u6709\u7684\u6570\u636e\u8f93\u5165\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f":89,"\u90fd\u5e94\u4f7f\u7528c":89,"\u90fd\u662f\u4e94\u4f4d\u7684\u6570\u5b57":11,"\u90fd\u662f\u4ee5ext\u5f00\u5934":42,"\u90fd\u662f\u5bf9layer1\u5143\u7d20\u7684\u62f7\u8d1d":110,"\u90fd\u662f\u5c06\u6bcf\u4e00\u53e5\u5206\u597d\u8bcd\u540e\u7684\u53e5\u5b50":111,"\u90fd\u662fabi\u8c03\u7528\u6807\u51c6\u7684":45,"\u90fd\u7528":72,"\u90fd\u7ee7\u627f\u4e8epaddlepaddle\
u7684\u57fa\u7c7b":41,"\u90fd\u9700\u8981\u5199\u63d0\u4ea4\u8bf4\u660e":72,"\u90fd\u9700\u8981\u8c03\u7528\u4e00\u6b21":74,"\u914d\u5236\u7f16\u8bd1\u9009\u9879":87,"\u914d\u7f6e\u6253\u5f00":108,"\u914d\u7f6e\u6587\u4ef6\u63a5\u53e3\u662ffc_layer":74,"\u914d\u7f6e\u6587\u4ef6\u91cc\u52a0\u4e24\u884c":0,"\u914d\u7f6e\u7684\u65b9\u6cd5\u53c2\u8003":27,"\u914d\u7f6e\u7b80\u5355\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u4f8b\u5b50":114,"\u914d\u7f6e\u7f51\u7edc\u5c42\u7684\u8f93\u5165":74,"\u914d\u7f6eapi":110,"\u91c7\u7528\u5747\u5300\u5206\u5e03\u6216\u8005\u9ad8\u65af\u5206\u5e03\u521d\u59cb\u5316":103,"\u91c7\u7528multi":83,"\u91ca\u653e\u5bf9paramters\u5185\u5b58\u7684\u9501\u5b9a":10,"\u91cc\u53ef\u4ee5\u6807\u51c6\u5316\u7f16\u8bd1\u73af\u5883":0,"\u91cc\u5b8c\u6210":75,"\u91cc\u6240\u6709\u7684\u7b26\u53f7\u90fd\u5199\u5165\u81ea\u5df1\u7684\u7a0b\u5e8f\u7684\u4e8c\u8fdb\u5236\u6587\u4ef6\u91cc":45,"\u91cc\u7684":0,"\u91cc\u7684\u65e5\u5fd7":93,"\u91cc\u8fd0\u884c\u7684\u7f16\u8bd1\u5de5\u5177\u5b9e\u9645\u4e0a\u90fd\u662f\u5728\u672c\u673a\u7684":0,"\u91cc\u9009\u62e9\u9700\u8981\u53d1\u5e03\u7684\u5206\u652f":63,"\u91cc\u9762":75,"\u91cc\u9762\u6db5\u76d6\u4e86\u4ea4\u53c9\u7f16\u8bd1android\u7248paddlepaddle\u5e93\u9700\u8981\u7684\u6240\u6709\u7f16\u8bd1\u5de5\u5177":116,"\u91cc\u9762\u6dfb\u52a0":42,"\u91ccstep\u7684\u5185\u5bb9":81,"\u91cd\u5199\u7236\u7c7blayer\u7684":42,"\u91cd\u547d\u540d\u6210":45,"\u91cd\u65b0\u7f16\u8bd1paddlepaddl":108,"\u9488\u5bf9\u4e0d\u540c\u7684":117,"\u9488\u5bf9\u4efb\u52a1\u8fd0\u884c\u5b8c\u6210\u540e\u5bb9\u5668\u81ea\u52a8\u9000\u51fa\u7684\u573a\u666f":96,"\u9488\u5bf9\u5185\u5b58\u548c\u663e\u5b58":81,"\u94fe\u63a5":87,"\u94fe\u63a5\u4e2d\u627e\u5230":3,"\u94fe\u63a5\u4f55\u79cdblas\u5e93\u7b49":0,"\u94fe\u63a5\u5230\u81ea\u5df1\u7684\u7a0b\u5e8f\u91cc":45,"\u94fe\u63a5\u76f8\u5bf9\u5bb9\u6613":87,"\u94fe\u63a5\u9009\u9879":87,"\u94fe\u63a5\u9759\u6001\u5e93":87,"\u9519\u8bef":78,"\u9519\u8bef\u5904\u7406":45,"\u9519
\u8bef\u5904\u7406\u65b9\u5f0f\u662f\u8fd4\u56de\u503c":45,"\u9519\u8bef\u5904\u7406\u7684\u65b9\u5f0f\u4e5f\u4e0d\u5c3d\u76f8\u540c":45,"\u9519\u8bef\u7684define_py_data_sources2\u7c7b\u4f3c":83,"\u952e\u6765\u542f\u52a8\u7f16\u8bd1\u4e86":0,"\u955c\u50cf\u91cc":0,"\u955c\u50cf\u91cc\u6709paddlepaddle\u7684\u6e90\u7801\u4e0edemo":96,"\u957f\u5ea6":81,"\u95e8\u63a7\u5faa\u73af\u5355\u5143\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570":114,"\u95e8\u63a7\u5faa\u73af\u5355\u5143\u7684\u8f93\u51fa\u88ab\u7528\u4f5c\u8f93\u51famemori":114,"\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f":89,"\u9650\u5236\u5957\u63a5\u5b57\u53d1\u9001\u7f13\u51b2\u533a\u7684\u5927\u5c0f":103,"\u9650\u5236\u5957\u63a5\u5b57\u63a5\u6536\u7f13\u51b2\u533a\u7684\u5927\u5c0f":103,"\u9664\u4e86\u53ef\u4ee5\u81ea\u52a8\u7f16\u8bd1\u6587\u6863":77,"\u9664\u4e86boot_lay":111,"\u9664\u6784\u9020\u67d0\u79cd\u7c7b\u578b\u7684\u51fd\u6570":46,"\u9664\u6b64\u4e4b\u5916":81,"\u9664\u96f6\u7b49\u95ee\u9898":81,"\u968f\u540e\u53ef\u4ee5\u7528\u8fd9\u4e2a\u5f00\u53d1\u955c\u50cf\u5f00\u59cbbuild":72,"\u968f\u673a\u6570\u7684\u79cd\u5b50":103,"\u968f\u673a\u6570seed":102,"\u9694\u5f00":91,"\u96c6\u6210\u5230":41,"\u96c6\u6210\u5230paddlepaddl":42,"\u96c6\u675f\u641c\u7d22\u4f7f\u7528\u5e7f\u5ea6\u4f18\u5148\u641c\u7d22\u7684\u65b9\u5f0f\u6784\u5efa\u67e5\u627e\u6811":103,"\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2a\u5355\u673a\u4f7f\u7528cpu\u7684paddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":96,"\u96c6\u7fa4\u4e2d\u7684\u6bcf\u53f0\u8ba1\u7b97\u673a\u901a\u5e38\u88ab\u6210\u4e3a\u4e00\u4e2a":101,"\u96c6\u7fa4\u4efb\u52a1":93,"\u96c6\u7fa4\u4f5c\u4e1a\u5c06\u4f1a\u5728\u51e0\u79d2\u540e\u542f\u52a8":93,"\u96c6\u7fa4\u6d4b\u8bd5":102,"\u96c6\u7fa4\u8bad\u7ec3":102,"\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b":80,"\u96c6\u7fa4\u8fdb\u7a0b":93,"\u9700\u52a0\u8be5\u6a21\u677f\u53c2\u6570":75,"\u9700\u5728nvvp\u754c\u9762\u4e2d\u9009\u4e0a\u624d\u80fd\u5f00\u542f":108,"\u9700\u6307\u5b9a":87,"\u9700\u63d0\u4f9b\u975e\u
96f6\u5143\u7684\u503c":89,"\u9700\u6ce8\u610f":87,"\u9700\u8981":[0,11,75,90],"\u9700\u8981\u4e3a":75,"\u9700\u8981\u4f7f\u7528":81,"\u9700\u8981\u4f7f\u7528\u5176\u5236\u5b9a\u7684\u65b9\u5f0f\u6302\u8f7d\u540e\u5e76\u5bfc\u5165\u6570\u636e":97,"\u9700\u8981\u4f7f\u7528\u6700\u65b0\u7684pip":3,"\u9700\u8981\u4f7f\u7528\u8005\u81ea\u5df1\u4e86\u89e3\u5e76\u5b8c\u6210\u8f6c\u5316":89,"\u9700\u8981\u4fdd\u6301\u5f53\u524d\u5206\u652f\u76ee\u5f55":72,"\u9700\u8981\u4fee\u6539build":63,"\u9700\u8981\u521b\u5efa\u5e76\u586b\u5199":89,"\u9700\u8981\u5347\u7ea7pip\u7248\u672c\u5230\u6700\u65b0":[3,78],"\u9700\u8981\u5355\u72ec":1,"\u9700\u8981\u53ef\u4ee5\u8de8\u5e73\u53f0\u6267\u884c":27,"\u9700\u8981\u540c\u65f6\u63d0\u4f9b\u6bcf\u4e00\u4e2a\u5185\u5c42\u5e8f\u5217\u5728\u6574\u4e2a":89,"\u9700\u8981\u540c\u6b65\u539f\u4ed3\u5e93":72,"\u9700\u8981\u542f\u52a8\u7684\u8282\u70b9\u4e2a\u6570\u4ee5\u53ca":97,"\u9700\u8981\u548c\u8be5op\u7684\u540d\u5b57\u4e00\u6837":75,"\u9700\u8981\u54ea\u4e9b\u5c42\u7684\u8ba1\u7b97\u7ed3\u679c\u4f5c\u4e3a\u8f93\u51fa":90,"\u9700\u8981\u5728":75,"\u9700\u8981\u5728\u521b\u5efa\u5bb9\u5668\u524d\u6302\u8f7d\u5377\u4ee5\u4fbf\u6211\u4eec\u4fdd\u5b58\u8bad\u7ec3\u7ed3\u679c":96,"\u9700\u8981\u5728\u7cfb\u7edf\u91cc\u5148\u5b89\u88c5\u597ddocker\u5de5\u5177\u5305":77,"\u9700\u8981\u5728cmake\u7684\u65f6\u5019":46,"\u9700\u8981\u5728macos\u7cfb\u7edf\u4e0a\u8fdb\u884c":117,"\u9700\u8981\u5c06\u5176parameter\u8bbe\u7f6e\u6210":81,"\u9700\u8981\u5c06\u7f51\u7edc\u7ed3\u6784\u4f7f\u7528":90,"\u9700\u8981\u5c06bugfix\u7684\u5206\u652f\u540c\u65f6merge\u5230":63,"\u9700\u8981\u5c06cuda\u76f8\u5173\u7684\u5e93\u8bbe\u7f6e\u5230":87,"\u9700\u8981\u5c06paddl":87,"\u9700\u8981\u5f15\u7528":46,"\u9700\u8981\u5f3a\u8c03\u7684\u662f":0,"\u9700\u8981\u601d\u8003\u5b8c\u6210\u4ee5\u4e0b\u5de5\u4f5c":[89,90],"\u9700\u8981\u624b\u52a8\u8fdb\u884c\u89e3\u538b":90,"\u9700\u8981\u6267\u884c":[0,3,86],"\u9700\u8981\u6307\u5b9a":87,"\u9700\u8981\u6307\u5b9a\u4e0e\
u67d0\u4e00\u4e2a\u8f93\u5165\u7684\u5e8f\u5217\u4fe1\u606f\u662f\u4e00\u81f4\u7684":111,"\u9700\u8981\u6307\u5b9alayer\u7684\u8f93\u5165\u6765\u6e90":84,"\u9700\u8981\u63d0\u9192\u7684\u662f":2,"\u9700\u8981\u660e\u786e\u6307\u5b9a":103,"\u9700\u8981\u663e\u5f0f\u5730\u94fe\u63a5":87,"\u9700\u8981\u663e\u793a\u5730\u94fe\u63a5":87,"\u9700\u8981\u663e\u793a\u5730\u94fe\u63a5mkl\u7684\u52a8\u6001\u5e93":87,"\u9700\u8981\u6709\u7a33\u5b9a\u7684\u5bfc\u51fa\u7b26\u53f7":45,"\u9700\u8981\u6839\u636e\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u5b58\u50a8\u6765\u7ed1\u5b9a\u4e00\u4e2a":97,"\u9700\u8981\u6ce8\u610f":75,"\u9700\u8981\u6ce8\u610f\u7684\u662f":[42,63,81,103],"\u9700\u8981\u6ce8\u610f\u7684\u662f\u68af\u5ea6\u68c0\u67e5\u4ec5\u4ec5\u9a8c\u8bc1\u4e86\u68af\u5ea6\u7684\u8ba1\u7b97":74,"\u9700\u8981\u6ce8\u610f\u7684\u662fpaddlepaddle\u76ee\u524d\u53ea\u652f\u6301\u5b50\u5e8f\u5217\u6570\u76ee\u4e00\u6837\u7684\u591a\u8f93\u5165\u53cc\u5c42rnn":111,"\u9700\u8981\u7528\u5230\u7684\u7f16\u8bd1\u5de5\u5177\u548c\u7cfb\u7edf\u5e93":116,"\u9700\u8981\u7528\u6237\u663e\u5f0f\u8bbe\u5b9a":82,"\u9700\u8981\u7d2f\u52a0\u4e0d\u540clayer\u4f20\u8fc7\u6765\u7684\u68af\u5ea6":42,"\u9700\u8981\u81ea\u5df1\u94fe\u63a5mkl\u94fe\u63a5\u5e93":87,"\u9700\u8981\u88ab\u66b4\u9732\u5230\u5176\u4ed6\u8bed\u8a00":46,"\u9700\u8981\u8bf7\u7ba1\u7406\u5458\u5b89\u88c5\u548c\u914d\u7f6e\u597d":0,"\u9700\u8981\u9075\u5faa\u4ee5\u4e0b\u7ea6\u5b9a":113,"\u9700\u8981\u91cd\u547d\u540dwheel\u5305\u4e2dplatform\u76f8\u5173\u7684\u540e\u7f00":63,"\u9700\u8981\u989d\u5916\u6ce8\u610f\u7684\u662f":76,"\u9700\u9644\u52a0\u53cc\u5c42\u5e8f\u5217\u4fe1\u606f":89,"\u9700\u9644\u52a0\u5e8f\u5217\u4fe1\u606f":89,"\u975e\u5e38\u6570":74,"\u975e\u5e8f\u5217\u8f93\u5165\u4e0d\u643a\u5e26":89,"\u975e\u5e8f\u5217\u8f93\u5165\u65e0\u9700\u6784\u9020":89,"\u975e\u96f6\u5143\u4e2a\u6570":89,"\u975e\u96f6\u5143\u7d20\u7684\u503c":89,"\u975e\u96f6\u5143\u7d20\u7684\u5217\u53f7":89,"\u975e\u96f6\u6570\u5b57\u7684\u4e2a\u65
70":74,"\u9762\u5411\u67b6\u6784\u4e3a32\u4f4darm\u67b6\u6784":116,"\u9762\u5411\u67b6\u6784\u4e3a64\u4f4darm64\u67b6\u6784":116,"\u9879\u76ee\u5728\u52aa\u529b\u5f00\u59cb\u652f\u6301\u5176\u4ed6\u4e0d\u9700\u8981":0,"\u987a\u5e8f":111,"\u9884\u63d0\u4ea4\u94a9\u5b50":72,"\u9884\u6d4b\u4e0d\u9700\u8981\u6807\u7b7e":88,"\u9884\u6d4b\u4e0d\u9700\u8981\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u53cd\u5411\u4f20\u64ad\u548c\u53c2\u6570\u66f4\u65b0\u7684\u90e8\u5206":88,"\u9884\u6d4b\u4ee3\u7801\u66f4\u591a\u8be6\u7ec6\u793a\u4f8b\u4ee3\u7801\u8bf7\u53c2\u8003":90,"\u9884\u6d4b\u4f7f\u7528\u7684\u7f51\u7edc\u7ed3\u6784\u5f80\u5f80\u4e0d\u540c\u4e8e\u8bad\u7ec3":90,"\u9884\u6d4b\u5c31\u662f\u51c6\u5907\u8f93\u5165\u6570\u636e":88,"\u9884\u6d4b\u5f88\u591a\u65f6\u5019\u9700\u8981\u548c\u7528\u6237\u7cfb\u7edf\u6574\u5408\u5728\u4e00\u8d77":88,"\u9884\u6d4b\u65f6":90,"\u9884\u6d4b\u65f6\u53ea\u9700\u52a0\u8f7d\u4e00\u4e2a\u6587\u4ef6\u4fbf\u4e8e\u53d1\u5e03":90,"\u9884\u6d4b\u6709\u5982\u4e0b\u7279\u70b9":88,"\u9884\u6d4b\u7a0b\u5e8f\u5f00\u53d1\u4e24\u5927\u90e8\u5206":90,"\u9884\u6d4bsdk\u4e0d\u5305\u542b\u53cd\u5411\u4f20\u64ad\u548c\u53c2\u6570\u66f4\u65b0\u90e8\u5206":88,"\u9884\u6d4bsdk\u9700\u8981\u63d0\u4f9b\u4e00\u4e2a\u7b80\u6d01\u7684\u7528\u6237\u63a5\u53e3":88,"\u9996\u5148":[74,111,114],"\u9996\u5148\u4ee5\u51e0\u4e2a\u5b9e\u9645\u573a\u666f\u4e3a\u4f8b":104,"\u9996\u5148\u5728\u7cfb\u7edf\u8def\u5f84":0,"\u9996\u5148\u5b89\u88c5\u5e76\u5728\u5f53\u524d\u76ee\u5f55\u8fd0\u884c\u5b83":72,"\u9996\u5148\u5b9a\u4e49":75,"\u9996\u5148\u5bf9\u8f93\u5165\u505a\u4e00\u4e2a\u5c0f\u7684\u6270\u52a8":74,"\u9996\u5148\u6211\u4eec\u9700\u8981\u63a8\u5bfc\u8be5\u7f51\u7edc\u5c42\u7684":74,"\u9996\u5148\u6784\u9020\u5934\u4fe1\u606f":83,"\u9996\u5148\u901a\u8fc7":72,"\u9996\u5148\u9700\u8981\u52a0\u8f7d\u76f8\u5e94\u7684python\u5e93":84,"\u9a71\u52a8":77,"\u9ad8\u4eae\u90e8\u5206":111,"\u9ad8\u5ea6":89,"\u9ad8\u5ea6\u652f\u6301\u7075\u6d3b\u548c\u9ad8\u6548\u7684\u5faa\u73af\u795e\u7ec
f\u7f51\u7edc\u914d\u7f6e":114,"\u9ad8\u65af\u5206\u5e03":83,"\u9ed8\u8ba4":103,"\u9ed8\u8ba40":91,"\u9ed8\u8ba41":91,"\u9ed8\u8ba4127":91,"\u9ed8\u8ba4256k":27,"\u9ed8\u8ba47164":91,"\u9ed8\u8ba4\u4e0d\u663e\u793a":103,"\u9ed8\u8ba4\u4e0d\u8bbe\u7f6e":113,"\u9ed8\u8ba4\u4e3a0":[103,105],"\u9ed8\u8ba4\u4e3a1":[89,105],"\u9ed8\u8ba4\u4e3a100":105,"\u9ed8\u8ba4\u4e3a4096mb":103,"\u9ed8\u8ba4\u4e3a\u7b2c\u4e00\u4e2a\u8f93\u5165":113,"\u9ed8\u8ba4\u4e3anull":103,"\u9ed8\u8ba4\u4f1a\u5c06a\u548cb":81,"\u9ed8\u8ba4\u4f7f\u7528concurrentremoteparameterupdat":103,"\u9ed8\u8ba4\u4f7f\u7528mkl":0,"\u9ed8\u8ba4\u503c":[0,105,110,117],"\u9ed8\u8ba4\u503c\u4e3a":[116,117,118],"\u9ed8\u8ba4\u503c\u4e3a\u73af\u5883\u53d8\u91cf":117,"\u9ed8\u8ba4\u521d\u59cb\u72b6\u4e3a0":113,"\u9ed8\u8ba4\u60c5\u51b5\u4e0b":[83,93],"\u9ed8\u8ba4\u60c5\u51b5\u4e0b\u6309\u7167float\u7cbe\u5ea6\u8ba1\u7b97":83,"\u9ed8\u8ba4\u6307\u5b9a\u7b2c\u4e00\u4e2a\u8f93\u5165":111,"\u9ed8\u8ba4\u662f\u4f7f\u7528mkl\u7684\u955c\u50cf":1,"\u9ed8\u8ba4\u6ca1\u6709\u5b89\u88c5vim":1,"\u9ed8\u8ba4\u7684paddlepaddle\u751f\u4ea7\u73af\u5883\u955c\u50cf":96,"\u9ed8\u8ba4\u7f16\u8bd1\u6240\u6709\u67b6\u6784":117,"\u9ed8\u8ba4\u8bbe\u7f6e\u4e3a":41,"\u9ed8\u8ba4\u8bbe\u7f6e\u4e3a\u771f":105,"\u9ed8\u8ba4\u8bbe\u7f6e\u6210\u73af\u5883\u53d8\u91cf":[116,118],"\u9ed8\u8ba4\u8c03\u7528":0,"\u9ed8\u8ba4fals":91,"abi\u7684paddlepaddle\u5e93":116,"abstract":[18,19,26,30,53,62,64],"android\u5b98\u65b9\u63d0\u4f9b\u7684":116,"android\u5e73\u53f0\u4e0a\u4f7f\u7528\u7684c":116,"android\u5e73\u53f0\u53ef\u9009\u914d\u7f6e\u53c2\u6570":116,"android\u7684docker\u5f00\u53d1\u955c\u50cf\u5411\u7528\u6237\u63d0\u4f9b\u4e24\u4e2a\u53ef\u914d\u7f6e\u7684\u53c2\u6570":116,"api\u4e0d\u4f1a\u76f4\u63a5\u52a0\u8f7d":90,"api\u4e0d\u5c0f\u4e8e21":116,"api\u4e2d":89,"api\u4e2d\u4f7f\u7528":45,"api\u4e2d\u7684\u4e00\u7ef4\u6570\u7ec4":89,"api\u4e2d\u7684\u77e9\u9635\u6765\u8868\u793a":89,"api\u4e2d\u795e\u7ecf\u7f51\u7edc\u7684\u4e00\u4e2a\u8f93\u
5165":90,"api\u4f7f\u7528\u4e2d\u7684\u4e00\u4e2a\u91cd\u8981\u6982\u5ff5":90,"api\u4f7f\u7528\u6d41\u7a0b":88,"api\u4f7f\u7528\u6d41\u7a0b\u793a\u610f\u56fe":90,"api\u4f7f\u7528\u793a\u4f8b":90,"api\u521b\u5efa\u7684gradientmachine\u7c7b\u7684\u5bf9\u8c61":90,"api\u53ef\u4ee5\u901a\u8fc7\u5206\u522b\u6307\u5b9a\u5e8f\u5217\u5316\u540e\u7684\u7f51\u7edc\u7ed3\u6784\u6587\u4ef6\u548c\u53c2\u6570\u76ee\u5f55\u6765\u52a0\u8f7d\u8bad\u7ec3\u597d\u7684\u6a21\u578b":90,"api\u53ef\u4ee5\u901a\u8fc7\u6307\u5b9a":90,"api\u5b8c\u6210\u5206\u5e03\u5f0f\u8bad\u7ec3":91,"api\u5bf9\u6bd4\u4ecb\u7ecd":112,"api\u5bfc\u51fa\u7684\u52a8\u6001\u5e93":46,"api\u5bfc\u51fa\u7684\u9759\u6001\u5e93":46,"api\u5e93\u5c06\u88ab\u5b89\u88c5\u5230":116,"api\u5f00\u53d1\u5305\u5e76\u5b89\u88c5":3,"api\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u65f6":87,"api\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u9700\u8981\u4e00\u4e2a\u8bad\u7ec3\u597d\u7684\u6a21\u578b":90,"api\u6240\u9700\u7684\u4f9d\u8d56":87,"api\u63a5\u53d7\u7684\u7c7b\u578b\u5168\u662f":46,"api\u63a5\u53e3":27,"api\u63a5\u53e3\u751f\u6210":75,"api\u63a5\u53e3\u7684\u53c2\u6570\u8f6c\u53d1\u7ed9":46,"api\u63a5\u53e3\u7684\u751f\u6210":75,"api\u63d0\u4f9b\u7684":90,"api\u652f\u6301\u7684\u6240\u6709\u8f93\u5165\u6570\u636e\u7c7b\u578b\u548c\u4ed6\u4eec\u7684\u7ec4\u7ec7\u65b9\u5f0f":90,"api\u6587\u6863":[116,117],"api\u65f6":46,"api\u65f6\u4e3a\u8f93\u51fa":90,"api\u65f6\u6240\u552f\u4e00\u9700\u8981\u5f15\u5165\u7684\u5934\u6587\u4ef6":46,"api\u662f\u591a\u8bed\u8a00api\u7684\u57fa\u7840\u90e8\u5206":46,"api\u66b4\u9732\u7684\u7c7b\u578b":46,"api\u6765\u9884\u6d4b":[116,117],"api\u751f\u6210\u7684\u4e8c\u8fdb\u5236\u6587\u4ef6\u4f1a\u88ab\u5b89\u88c5\u5230":46,"api\u7684\u4f7f\u7528":88,"api\u7684\u5934\u6587\u4ef6":[116,117,118],"api\u7684\u5b9e\u4f8b":46,"api\u7684\u5b9e\u73b0\u7ec6\u8282":46,"api\u7684\u63a5\u53e3":46,"api\u7684\u65f6\u5019\u63a8\u8350paddle\u4e0d\u5d4c\u5165python\u89e3\u91ca\u5668":46,"api\u7684\u7f16\u8bd1\u9009\u9879\u9ed8\u
8ba4\u5173\u95ed":46,"api\u76ee\u5f55\u7ed3\u6784\u5982\u4e0a\u56fe\u8868\u6240\u793a":46,"api\u76f8\u5173\u63a5\u53e3":89,"api\u7ea7\u522b":116,"api\u7ea7\u522b\u4e3a21":116,"api\u83b7\u5f97\u4e86\u795e\u7ecf\u7f51\u7edc\u7684\u53c2\u6570\u5b9e\u4f8b":46,"api\u8bad\u7ec3":90,"api\u9700\u8981\u521b\u5efa\u7684\u6570\u636e\u7c7b\u578b":89,"api\u9759\u6001\u5e93":117,"api\u9884\u6d4b\u5e93":[106,117],"api\u9884\u6d4b\u65f6":90,"apis\u505a\u4e86\u5c01\u88c5":41,"app\u4e2d":[116,117],"apple\u5b98\u65b9\u4e3aios\u5f00\u53d1\u63d0\u4f9b\u4e86\u5b8c\u6574\u7684\u4ea4\u53c9\u7f16\u8bd1\u5de5\u5177\u548c\u96c6\u6210\u5f00\u53d1\u73af\u5883":117,"async_sgd\u8fdb\u884c\u8bad\u7ec3\u65f6":83,"avx\u662f\u4e00\u79cdcpu\u6307\u4ee4\u96c6":1,"avx\u7248\u672c":1,"avx\u7684\u955c\u50cf":1,"batch\u4e2d\u5305\u542b":81,"batch\u7684\u6743\u91cd":81,"batches\u4e2a\u6279\u6b21\u4fdd\u5b58\u4e00\u6b21\u53c2\u6570":103,"batches\u6b21":103,"block\u6784\u6210\u4e00\u4e2amodel":10,"book\u4e00\u5b9a\u662f\u60a8\u6700\u597d\u7684\u9009\u62e9":1,"book\u4e2d\u6240\u6709\u7ae0\u8282\u529f\u80fd\u7684\u6b63\u786e\u6027":63,"book\u662f\u4e3a\u7528\u6237\u548c\u5f00\u53d1\u8005\u5236\u4f5c\u7684\u4e00\u4e2a\u4ea4\u4e92\u5f0f\u7684jupyt":1,"book\u7684":84,"book\u7684docker\u955c\u50cf":1,"boolean":[26,28,36,45,69],"break":[8,67,69,70,71],"bugfix\u5206\u652f\u4e5f\u662f\u5728\u5f00\u53d1\u8005\u81ea\u5df1\u7684fork\u7248\u672c\u5e93\u7ef4\u62a4":63,"bugfix\u5206\u652f\u9700\u8981\u5206\u522b\u7ed9\u4e3b\u7248\u672c\u5e93\u7684":63,"byte":[27,44,83],"c99\u662f\u76ee\u524dc\u6700\u5e7f\u6cdb\u7684\u4f7f\u7528\u6807\u51c6":45,"c\u6709\u6807\u51c6\u7684abi":45,"c\u8bed\u8a00\u662f\u6709\u5bfc\u51fa\u7b26\u53f7\u7684\u6807\u51c6\u7684":45,"case":[12,18,20,21,26,30,40,46,53,57,59,60,69,95,108,119],"cc\u4e2d\u7684":76,"cells\u7b49":82,"char":14,"ci\u73af\u5883\u4f7f\u7528":63,"ci\u7f16\u8bd1wheel\u5b8c\u6210\u540e\u4f1a\u81ea\u52a8\u5c06docker\u955c\u50cfpush\u5230dockerhub":63,"class":[4,7,18,19,20,21,24,25,2
9,30,31,32,34,35,37,40,45,49,50,55,56,60,61,62,64,65,66,68,70,74,75,76,83,109],"cmake\u4e2d\u5c06":108,"cmake\u5b98\u65b9\u5bf9android\u5e73\u53f0\u7684\u4ea4\u53c9\u7f16\u8bd1\u63d0\u4f9b\u4e86\u901a\u7528\u7684\u652f\u6301":116,"cmake\u627e\u5230\u7684python\u5e93\u548cpython\u89e3\u91ca\u5668\u7248\u672c\u53ef\u80fd\u6709\u4e0d\u4e00\u81f4\u73b0\u8c61":78,"cmake\u7cfb\u7edf\u5bf9\u4ea4\u53c9\u7f16\u8bd1\u63d0\u4f9b\u4e86\u652f\u6301":116,"cmake\u7f16\u8bd1\u65f6":0,"cmake\u7f16\u8bd1\u7684\u76ee\u6807\u5e73\u53f0":[116,117,118],"cmake\u914d\u7f6e\u4e2d\u5c06":108,"cmake\u914d\u7f6e\u5b8c\u6210\u540e":[116,117,118],"compute\u51fd\u6570":41,"const":[7,12,14,19,29,31,37,38,39,54,55,57,61,64,66,68,70,74,75,76],"container\u4e2d":96,"core\u4e2d\u7684\u6a21\u578b\u8fd8\u5728\u4f7f\u7528\u8fd9\u4e2a\u53c2\u6570":46,"core\u4e2d\u8fd9\u4e00\u7c7b\u578b\u63a5\u53e3\u7684\u667a\u80fd\u6307\u9488":46,"core\u662f\u5426\u8fd8\u5728\u4f7f\u7528\u8fd9\u4e2a\u5b9e\u4f8b":46,"core\u6982\u5ff5":46,"cost\u63a5\u6536y_predict\u4e0ey\u4f5c\u4e3a\u8f93\u5165":84,"cost\u8fd8\u5927\u4e8e\u8fd9\u4e2a\u6570":83,"count\u4e2agpu\u4e0a\u4f7f\u7528\u6570\u636e\u5e76\u884c\u6765\u8ba1\u7b97\u67d0\u4e00\u5c42":105,"count\u548cgpu":105,"csr\u5b58\u50a8\u683c\u5f0f\u901a\u8fc7":89,"cuda\u5171\u4eabkernel\u5b9e\u73b0\u5728":75,"cuda\u5b9e\u73b0\u5171\u4eab\u540c\u4e00\u4e2a":75,"cuda\u5b9e\u73b0\u5728":75,"cuda\u5e93":103,"cuda\u7684\u4ee3\u7801\u53ef\u4ee5\u590d\u7528":75,"cuda\u76f8\u5173\u5e93\u4f1a\u5728\u9884\u6d4b\u7a0b\u5e8f\u8fd0\u884c\u65f6\u52a8\u6001\u88c5\u8f7d":87,"cudnn\u5e93":[0,103],"cumtime\u7684\u6bcf\u6b21\u8c03\u7528\u5e73\u5747\u65f6\u95f4":107,"data\u5230\u5206\u5e03\u5f0f\u5b58\u50a8\u8865\u5145\u8bad\u7ec3\u6570\u636e":11,"data\u63a5\u53e3\u5206\u914d\u5b9e\u9645\u7684\u5185\u5b58":76,"data\u76ee\u5f55\u4e2d\u5b58\u653e\u5207\u5206\u597d\u7684\u6570\u636e":97,"dataprovider\u5171\u8fd4\u56de\u4e24\u4e2a\u6570\u636e":111,"dataprovider\u5171\u8fd4\u56de\u4e24\u7ec4\u6570\u636e":
111,"dataprovider\u7f13\u51b2\u6c60\u5185\u5b58":81,"decoder\u5faa\u73af\u5c55\u5f00\u7684\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u4f1a\u5f15\u7528\u5168\u90e8\u7ed3\u679c":113,"decoder\u63a5\u53d7\u4e24\u4e2a\u8f93\u5165":113,"decoder\u6bcf\u6b21\u9884\u6d4b\u4ea7\u751f\u4e0b\u4e00\u4e2a\u6700\u53ef\u80fd\u7684\u8bcd\u8bed":113,"decoer\u67b6\u6784":113,"default":[4,7,8,18,20,24,33,37,44,47,48,57,58,64,65,66,69,71,95,96,97,105,116,119],"device\u5c31\u80fd\u62ff\u5230\u6b63\u786e\u7684\u6570\u636e":42,"dist\u76ee\u5f55\u4e0b\u751f\u6210\u8f93\u51fa\u7684whl\u5305":0,"dnn\u4e09\u8005\u5173\u7cfb\u5982\u4e0b\u8868":42,"dnn\u4e2d\u7684":42,"dnn\u4e2d\u7684\u6392\u5217\u65b9\u5f0f\u4e0d\u6b62\u8fd9\u4e00\u79cd":42,"dnn\u4f1a\u4f5c\u4e3a\u7b2c\u4e09\u65b9\u5e93\u96c6\u6210\u8fdbpaddlepaddl":42,"dnn\u4f1a\u7528\u5230":42,"dnn\u5171\u540c\u4f7f\u7528":42,"dnn\u524d\u540e\u7684cnn\u7f51\u7edc\u6027\u80fd":42,"dnn\u5728\u53d1\u5e03":42,"dnn\u5b9e\u73b0":42,"dnn\u5e0c\u671b\u7684\u683c\u5f0f":42,"dnn\u6570\u5b66\u5e93":0,"dnn\u6570\u636e\u7684\u4e0d\u540c\u683c\u5f0f\u4ee5\u53ca\u76f8\u4e92\u4e4b\u95f4\u7684\u8f6c\u6362":42,"dnn\u7684":42,"dnn\u7684\u5e93\u76ee\u524d\u53ea\u6709\u52a8\u6001\u5e93":42,"dnn\u7684\u6027\u80fd":42,"dnn\u7684\u60c5\u51b5\u4e0b":42,"dnn\u7684\u64cd\u4f5c\u90fd\u662f\u76f4\u63a5\u8986\u76d6\u7684\u5f62\u5f0f":42,"dnn\u7684\u6d4b\u8bd5":42,"dnn\u7684\u73af\u5883\u4e0b":42,"dnn\u7684\u76f8\u5173\u529f\u80fd":42,"dnn\u7684\u7ed3\u679c":42,"dnn\u7684\u9ad8\u6027\u80fd\u683c\u5f0f\u4e0epaddlepaddle\u539f\u6709\u7684":42,"dnn\u7684layer":42,"dnn\u7684layers\u90fd\u4f1a\u7ee7\u627f\u4e8e":42,"docker\u5b89\u88c5\u65b9\u5f0f\u53ef\u4ee5\u8fdb\u5165docker\u5bb9\u5668\u6267\u884c":101,"docker\u5b89\u88c5\u8bf7\u53c2\u8003":77,"docker\u5b89\u88c5\u8bf7\u53c2\u8003docker\u7684\u5b98\u7f51":77,"docker\u5b98\u7f51":1,"docker\u5bb9\u5668\u4e2d\u5c06\u9ed8\u8ba4\u4f7f\u7528":116,"docker\u7684\u5b98\u7f51":77,"docker\u80fd\u5728\u6240\u6709\u4e3b\u8981\u64cd\u4f5c\u7cfb\
u7edf":116,"docker\u955c\u50cf":1,"docker\u955c\u50cf\u4e3a\u4e86\u51cf\u5c0f\u4f53\u79ef":1,"docker\u955c\u50cf\u9ed8\u8ba4":1,"dockerhub\u7f51\u7ad9":1,"double\u7c7b\u578b\u65f6\u4e3a8":83,"dropout\u7684\u6bd4\u4f8b":74,"eigenscalar\u7684\u8f6c\u6362":76,"encode\u6210\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":111,"encoder\u548cdecoder\u53ef\u4ee5\u662f\u80fd\u591f\u5904\u7406\u5e8f\u5217\u7684\u4efb\u610f\u795e\u7ecf\u7f51\u7edc\u5355\u5143":113,"encoder\u8f93\u51fa":113,"entropy\u4f5c\u4e3acost":83,"enum":[12,14,20,47,55,56,65,66,71],"export":[1,30,35,77,78,91],"f\u4ee3\u8868\u4e00\u4e2a\u6d6e\u70b9\u6570":84,"fc1\u548cfc2\u5c42\u5728gpu\u4e0a\u8ba1\u7b97":105,"fc3\u5c42\u4f7f\u7528cpu\u8ba1\u7b97":105,"final":[5,6,21,35,48,49,67,70],"flatten\u65b9\u6cd5\u662f\u628apaddle\u4e2d\u7684\u4e00\u4e2atensor\u8fdb\u884creshape\u64cd\u4f5c":76,"float":[24,29,37,39,66,68,75,76,89,108],"float\u7b49":105,"forward\u7684output\u7684\u503c":81,"from\u65b9\u6cd5\u662f\u628apaddle\u4e2d\u7684\u4e00\u7ef4tensor\u8f6c\u4e3aeigen\u7684\u4e00\u7ef4tensor":76,"from\u662feigentensor\u6a21\u677f\u63d0\u4f9b\u7684\u4e00\u4e2a\u63a5\u53e3":76,"full\u53c2\u6570\u63d0\u4ea4":79,"function":[4,6,7,9,13,14,15,17,18,19,20,21,24,25,29,31,34,37,43,48,49,53,54,55,56,57,59,60,61,62,64,66,70,107,114,119],"function\u4f7f\u7528":82,"git\u6d41\u5206\u652f\u6a21\u578b":72,"github\u9996\u9875":72,"glibc\u81f3\u5c11\u5305\u542bglibc_2":3,"golang\u53ef\u4ee5\u4f7f\u7528":45,"golang\u7684":45,"gpu\u4e8c\u8fdb\u5236\u6587\u4ef6":0,"gpu\u5219\u8fd8\u9700\u8981\u9ad8\u5e76\u884c\u6027":108,"gpu\u6027\u80fd\u8c03\u4f18":106,"gpu\u6267\u884c":76,"gpu\u6838\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u6307\u5b9a":103,"gpu\u7684docker\u955c\u50cf\u7684\u65f6\u5019":78,"gpu\u7b49":63,"group\u6559\u7a0b":112,"group\u7684\u5b9e\u73b0\u65b9\u5f0f":82,"gru\u6216lstm":114,"h\u5e76\u4e0d\u56f0\u96be":45,"html\u5373\u53ef\u8bbf\u95ee\u672c\u5730\u6587\u6863":77,"i\u4ee3\u8868\u4e00\u4e2a\u6574\u6570":84,"id\u6307\u5b9a\u4f7f\u7
528\u54ea\u4e2agpu\u6838":103,"id\u6307\u5b9a\u7684gpu":105,"id\u65e0\u6548":103,"image\u91cc":96,"images\u6570\u636e\u96c6\u4e0a\u4f20\u5230\u4e91\u7aef\u7684":11,"imikolov\u6570\u636e\u96c6":91,"import":[3,4,7,8,18,20,23,31,33,35,36,43,48,49,56,64,67,75,84,86,90,91,95],"infer\u63a5\u53e3\u7684\u8fd4\u56de\u503c\u662f\u4e00\u4e2apython":81,"ingress\u9700\u8981\u628apfsclient\u7684\u8eab\u4efd\u4fe1\u606f\u4f20\u7ed9pfsserv":27,"instance\u4e0e\u751f\u6210\u6570\u636e\u96c6\u65f6":11,"instance\u5305\u6db5\u4e24\u4e2a\u503c":11,"instance\u662f\u4e00\u6a21\u4e00\u6837\u7684":11,"int":[6,7,12,13,14,17,18,20,22,36,37,41,42,43,45,46,55,56,58,59,65,66,68,70,74,76,89,91,105],"interface\u6587\u4ef6\u7684\u5199\u6cd5\u975e\u5e38":45,"ios\u5e73\u53f0\u53ef\u9009\u914d\u7f6e\u53c2\u6570":117,"issue\u7f16\u53f7":72,"job\u662f\u672c\u6b21\u8bad\u7ec3\u5bf9\u5e94\u7684job":97,"job\u7684\u540d\u5b57":97,"kernel\u5b9e\u73b0":75,"kernel\u6ce8\u518ccpu\u5b9e\u73b0\u5728":75,"kernel\u7684\u5b9e\u73b0\u57fa\u4e8eeigen":75,"kubernetes\u4e3a\u8fd9\u6b21\u8bad\u7ec3\u521b\u5efa\u4e863\u4e2apod\u5e76\u4e14\u8c03\u5ea6\u5230\u4e863\u4e2anode\u4e0a\u8fd0\u884c":97,"kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3":94,"kubernetes\u5355\u673a\u8bad\u7ec3":94,"kubernetes\u53ef\u4ee5\u901a\u8fc7yaml\u6587\u4ef6\u6765\u521b\u5efa\u76f8\u5173\u5bf9\u8c61":97,"kubernetes\u5c31\u4f1a\u521b\u5efa3\u4e2apod\u4f5c\u4e3apaddlepaddle\u8282\u70b9\u7136\u540e\u62c9\u53d6\u955c\u50cf":97,"kubernetes\u6709job\u7c7b\u578b\u7684\u8d44\u6e90\u6765\u652f\u6301":96,"label\u662f\u539f\u59cb\u6570\u636e\u4e2d\u5bf9\u4e8e\u6bcf\u4e00\u53e5\u8bdd\u7684\u5206\u7c7b\u6807\u7b7e":111,"labels\u662f\u6bcf\u7ec4\u5185\u6bcf\u4e2a\u53e5\u5b50\u7684\u6807\u7b7e":111,"layer1\u5fc5\u987b\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"layer1\u5fc5\u987b\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":110,"layer\u4f5c\u4e3a\u4e00\u4e2a\u6574\u4f53\u6765\u5b9e\u73b0":82,"layer\u62ff\u5230\u7684\u7528\u6237\u8f93\u5165":113,"layer\u65f6":[42,82],"l
ayer\u662f\u6211\u4eec\u7684\u79ef\u6728":84,"layer\u7684\u540e\u9762\u63a5\u6709cpu":42,"layer\u7c7b\u53ef\u4ee5\u81ea\u52a8\u8ba1\u7b97\u4e0a\u9762\u7684\u5bfc\u6570":74,"layer\u8ba1\u7b97\u7684\u8f93\u51fa":82,"linux\u4e2d":1,"list\u4e2d":90,"list\u4f5c\u4e3a\u68c0\u67e5\u5217\u8868":63,"list\u5982\u4e0b\u6240\u793a":105,"list\u6307\u5b9a\u6d4b\u8bd5\u7684\u6a21\u578b\u5217\u8868":105,"long":20,"memory\u4e0d\u80fd\u72ec\u7acb\u5b58\u5728":113,"memory\u4e5f\u53ef\u4ee5\u5177\u6709":114,"memory\u4e5f\u53ef\u4ee5\u662f\u5e8f\u5217":114,"memory\u53ea\u80fd\u5728":113,"memory\u53ef\u4ee5\u7f13\u5b58\u4e0a\u4e00\u4e2a\u65f6\u523b\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u8f93\u51fa":111,"memory\u6307\u5411\u4e00\u4e2alay":113,"memory\u662f\u5728\u5355\u6b65\u51fd\u6570\u4e2d\u5faa\u73af\u4f7f\u7528\u7684\u72b6\u6001":114,"memory\u662fpaddlepaddle\u5b9e\u73b0rnn\u65f6\u5019\u4f7f\u7528\u7684\u4e00\u4e2a\u6982\u5ff5":111,"memory\u7684":114,"memory\u7684\u521d\u59cb\u72b6\u6001":113,"memory\u7684\u65f6\u95f4\u5e8f\u5217\u957f\u5ea6\u4e00\u81f4\u7684\u60c5\u51b5":111,"memory\u7684\u66f4\u591a\u8ba8\u8bba\u8bf7\u53c2\u8003\u8bba\u6587":113,"memory\u7684\u8f93\u51fa\u5b9a\u4e49\u5728":114,"memory\u7684i":113,"memory\u9ed8\u8ba4\u521d\u59cb\u5316\u4e3a0":113,"mkl\u5e93\u7684":41,"mklml\u4ee5\u53camkl":42,"mklml\u53ef\u4ee5\u4e0emkl":42,"mklml\u7684\u5e93\u76ee\u524d\u90fd\u662f\u52a8\u6001\u5e93":42,"mnist\u624b\u5199\u6570\u5b57\u8bc6\u522b\u76ee\u5f55":90,"mode\u4e0b\u7684\u7ed3\u679c":41,"model\u505a\u5206\u652f\u7ba1\u7406":63,"name\u7ec4\u5408\u53ef\u4ee5\u627e\u5230\u672c\u6b21\u8bad\u7ec3\u9700\u8981\u7684\u6587\u4ef6\u8def\u5f84":97,"ndarray\u7c7b\u578b\u7684\u503c\u548c\u6574\u578b\u7684\u503c":11,"ndk\u4e2d\u5305\u542b\u4e86\u6240\u6709android":116,"new":[5,6,7,8,9,12,13,14,15,16,19,20,21,24,29,30,40,41,43,47,49,53,58,59,60,62,66,67,70,72,74,95,119],"note\u7684\u4e66\u5199":63,"null":[35,74,89,103],"num\u51b3\u5b9a\u603b\u7aef\u53e3\u4e2a\u6570":91,"num_gradient_se
rvers\u53c2\u6570":97,"num_samples_processed\u4e3a\u5df2\u8bad\u7ec3\u6837\u672c\u6570":83,"only\u7684\u4e8c\u8fdb\u5236":0,"op\u4e0d\u9700\u8981\u5b9a\u4e49opprotomak":75,"op\u5355\u5143\u6d4b\u8bd5\u7ee7\u627f\u81ea":75,"op\u5b9a\u4e49":75,"op\u6709\u8ba1\u7b97\u51fd\u6570":75,"op\u6ce8\u518c\u5b9e\u73b0\u5728":75,"op\u7684\u4fe1\u606f":42,"op\u8ba1\u7b97\u51fd\u6570\u7684\u57fa\u7c7b":75,"openmp\u7528\u4e8e\u63d0\u9ad8mklml\u7684\u6027\u80fd":42,"opprotomake\u5b9a\u4e49":75,"org\u5de5\u5177\u7684\u8be6\u7ec6\u4fe1\u606f":77,"org\u76ee\u524d\u9075\u5faa":63,"outer_mem\u662f\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":111,"output\u53ef\u4ee5\u662f\u4efb\u610f\u7ef4\u5ea6\u7684tensor":76,"output\u6587\u4ef6\u5939\u5b58\u653e\u8bad\u7ec3\u7ed3\u679c\u4e0e\u65e5\u5fd7":97,"output\u7684\u539f\u6709shape\u4fe1\u606f\u4e0d\u53d8":76,"packages\u91cc\u9762":78,"packages\u91cc\u9762\u7684python\u5305":78,"packed\u4f18\u5316\u540elayer\u7684\u6d4b\u8bd5":41,"packed\u76f8\u5173\u529f\u80fd":41,"paddepaddle\u901a\u8fc7\u7f16\u8bd1\u65f6\u6307\u5b9a\u8def\u5f84\u6765\u5b9e\u73b0\u5f15\u7528\u5404\u79cdbla":0,"paddle\u4e00\u4e2a\u52a8\u6001\u5e93\u53ef\u4ee5\u5728\u4efb\u4f55linux\u7cfb\u7edf\u4e0a\u8fd0\u884c":45,"paddle\u4e2d\u7ecf\u5e38\u4f1a\u5c06\u65f6\u95f4\u5e8f\u5217\u6210\u4e3a":111,"paddle\u4e8c\u8fdb\u5236\u5728\u8fd0\u884c\u65f6\u6355\u83b7\u4e86\u6d6e\u70b9\u6570\u5f02\u5e38":81,"paddle\u5185\u5d4c\u7684python\u89e3\u91ca\u5668\u548c\u5916\u90e8\u4f7f\u7528\u7684python\u5982\u679c\u7248\u672c\u4e0d\u540c":45,"paddle\u5185\u90e8\u7684\u7c7b\u4e3ac":45,"paddle\u7684\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0\u5305\u62ec\u4e00\u4e0b\u51e0\u4e2a\u65b9\u9762":45,"paddle\u7684\u7c7b\u578b\u5168\u90e8\u9000\u5316\u6210":46,"paddle\u7684\u94fe\u63a5\u65b9\u5f0f\u6bd4\u8f83\u590d\u6742":45,"paddle\u7684c":46,"paddle\u8bad\u7ec3\u4efb\u52a1":11,"paddle\u8def\u5f84\u4e0b":46,"paddle\u9700\u8981\u4e00\u4e2a\u591a\u8bed\u8a00\u63a5\u53e3":45,"paddle\u9700\u
8981\u66b4\u9732\u7684api\u5f88\u591a":46,"paddle\u9759\u6001\u5e93\u94fe\u63a5\u590d\u6742":45,"paddle_\u7c7b\u578b\u540d":46,"paddle_\u7c7b\u578b\u540d_\u51fd\u6570\u540d":46,"paddlepaddle\u4e2d":[110,113],"paddlepaddle\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f\u548c\u8f93\u5165\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f\u5b8c\u5168\u76f8\u540c":89,"paddlepaddle\u4e2d\u7684\u8bb8\u591alayer\u5e76\u4e0d\u5728\u610f\u8f93\u5165\u662f\u5426\u662f\u65f6\u95f4\u5e8f\u5217":111,"paddlepaddle\u4e2d\u7684cudnn\u90e8\u5206\u4f7f\u7528\u7684\u4e5f\u662f":42,"paddlepaddle\u4e2d\u795e\u7ecf\u7f51\u7edc\u8ba1\u7b97\u5c42\u8f93\u5165":89,"paddlepaddle\u4e2d\u8fd8\u5305\u542b":82,"paddlepaddle\u4e2d\u901a\u8fc7reader\u6765\u52a0\u8f7d\u6570\u636e":84,"paddlepaddle\u4e3a\u4ea4\u53c9\u7f16\u8bd1\u63d0\u4f9b\u4e86\u5de5\u5177\u94fe\u914d\u7f6e\u6587\u6863":[116,117],"paddlepaddle\u4e3a\u6df1\u5ea6\u5b66\u4e60\u7814\u7a76\u4eba\u5458\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684api":84,"paddlepaddle\u4e3ano":1,"paddlepaddle\u4f1a\u81ea\u52a8\u8bbe\u5b9a":82,"paddlepaddle\u4f7f\u7528\u540c\u6b65\u5c4f\u969c":92,"paddlepaddle\u4f7f\u7528\u5747\u503c0":83,"paddlepaddle\u4f7f\u7528avx":78,"paddlepaddle\u4f7f\u7528git":63,"paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u5185\u5bb9\u753116\u5b57\u8282\u5934\u4fe1\u606f\u548c\u7f51\u7edc\u53c2\u6570\u4e24\u90e8\u5206\u7ec4\u6210":83,"paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u524d16\u5b57\u8282\u4e3a\u5934\u4fe1\u606f":83,"paddlepaddle\u53d1\u5e03\u7684\u5b89\u88c5\u5305\u4f1a\u5c3d\u91cf\u5bf9\u9f50":3,"paddlepaddle\u53ef\u4ee5\u4f7f\u7528\u5e38\u7528\u7684python\u5305\u7ba1\u7406\u5de5\u5177":3,"paddlepaddle\u53ef\u4ee5\u4f7f\u7528cudnn":0,"paddlepaddle\u53ef\u4ee5\u540c\u65f6\u652f\u6301\u540c\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":92,"paddlepaddle\u53ef\u4ee5\u6bd4\u8f83\u7b80\u5355\u7684\u5224\u65ad\u54ea\u4e9b\u8f93\u51fa\u662f\u5e94\u8be5\u8de8\u8d8a\u65
f6\u95f4\u6b65\u7684":111,"paddlepaddle\u53ef\u4ee5\u901a\u8fc7\u8be5\u673a\u5236\u5224\u65ad\u662f\u5426\u5df2\u7ecf\u6536\u96c6\u9f50\u6240\u6709\u7684\u68af\u5ea6":74,"paddlepaddle\u5728\u5b9e\u73b0rnn\u7684\u65f6\u5019":111,"paddlepaddle\u5728\u6fc0\u6d3b\u51fd\u6570\u91cc\u5b9e\u73b0dropout":82,"paddlepaddle\u5728\u7f16\u8bd1\u65f6":0,"paddlepaddle\u5b58\u7684\u662f\u6709\u503c\u4f4d\u7f6e\u7684\u7d22\u5f15":84,"paddlepaddle\u5b89\u88c5\u5305\u7531\u4e8e\u4e0d\u4ec5\u4ec5\u5305\u542b":3,"paddlepaddle\u5c06\u4f1a\u6839\u636e":117,"paddlepaddle\u5c06\u4f1a\u81ea\u52a8\u9009\u62e9":117,"paddlepaddle\u5c06\u6839\u636e":116,"paddlepaddle\u5c06\u81ea\u52a8\u4e0b\u8f7d\u548c\u7f16\u8bd1\u6240\u6709\u7b2c\u4e09\u65b9\u4f9d\u8d56\u5e93":[116,117,118],"paddlepaddle\u5e93\u5df2\u7ecf\u5b89\u88c5\u5b8c\u6210":117,"paddlepaddle\u5f00\u53d1\u8fc7\u7a0b\u4f7f\u7528":63,"paddlepaddle\u63d0\u4f9b\u4e13\u7528\u7684":11,"paddlepaddle\u63d0\u4f9b\u4e86\u57fa\u4e8edocker\u7684\u5b89\u88c5\u65b9\u5f0f":2,"paddlepaddle\u63d0\u4f9b\u4e86\u591a\u79cdpython":2,"paddlepaddle\u63d0\u4f9b\u4e86c":88,"paddlepaddle\u63d0\u4f9b\u7684":82,"paddlepaddle\u652f\u6301":0,"paddlepaddle\u652f\u6301\u4e0d\u540c\u7c7b\u578b\u7684\u8f93\u5165\u6570\u636e":84,"paddlepaddle\u652f\u6301\u4f7f\u7528pip\u5feb\u901f\u5b89\u88c5":86,"paddlepaddle\u652f\u6301\u7528\u6237\u7075\u6d3b\u5730\u8bbe\u7f6e\u5404\u79cd\u547d\u4ee4\u884c\u53c2\u6570":104,"paddlepaddle\u652f\u6301\u975e\u5e38\u591a\u7684\u4f18\u5316\u7b97\u6cd5":81,"paddlepaddle\u652f\u6301sparse\u7684\u8bad\u7ec3":81,"paddlepaddle\u6587\u6863\u4f7f\u7528":77,"paddlepaddle\u662f\u6e90\u4e8e\u767e\u5ea6\u7684\u4e00\u4e2a\u6df1\u5ea6\u5b66\u4e60\u5e73\u53f0":84,"paddlepaddle\u6bcf\u6b21\u53d1\u65b0\u7684\u7248\u672c":63,"paddlepaddle\u6bcf\u6b21\u53d1\u7248\u672c\u9996\u5148\u8981\u4fdd\u8bc1paddlepaddl":63,"paddlepaddle\u7684":96,"paddlepaddle\u7684\u4e3b\u7248\u672c\u5e93\u9075\u5faa":63,"paddlepaddle\u7684\u5185\u5b58\u5360\u7528\u4e3b\u8981\u5206\u4e
3a\u5982\u4e0b\u51e0\u4e2a\u65b9\u9762":81,"paddlepaddle\u7684\u53c2\u6570\u4f7f\u7528\u540d\u5b57":83,"paddlepaddle\u7684\u5404\u7248\u672c\u955c\u50cf\u53ef\u4ee5\u53c2\u8003":96,"paddlepaddle\u7684\u5b89\u88c5\u53ef\u4ee5\u53c2\u8003":101,"paddlepaddle\u7684\u5df2\u7ecf\u5b89\u88c5\u5b8c\u6210":116,"paddlepaddle\u7684\u6240\u6709layer\u90fd\u6709\u552f\u4e00\u7684nam":82,"paddlepaddle\u7684\u6587\u6863\u5305\u62ec\u82f1\u6587\u6587\u6863":77,"paddlepaddle\u7684\u6587\u6863\u6784\u5efa\u6709\u4e09\u79cd\u65b9\u5f0f":77,"paddlepaddle\u7684\u6e90\u7801":72,"paddlepaddle\u7684\u7f16\u8bd1\u9009\u9879":0,"paddlepaddle\u7684activation\u4f1a\u76f4\u63a5\u4f7f\u7528":42,"paddlepaddle\u7684bas":74,"paddlepaddle\u7684c":116,"paddlepaddle\u7684cmake\u7cfb\u7edf\u4f1a\u81ea\u52a8\u7f16\u8bd1\u6240\u6709\u7684\u7b2c\u4e09\u65b9\u4f9d\u8d56\u5e93":117,"paddlepaddle\u7684cmake\u7cfb\u7edf\u5c06\u6839\u636e\u8be5\u503c\u81ea\u52a8\u63a8\u5bfc\u548c\u8bbe\u7f6e\u9700\u8981\u4f7f\u7528\u7684\u4ea4\u53c9\u7f16\u8bd1\u5668":116,"paddlepaddle\u7684cmake\u7cfb\u7edf\u5c06\u6839\u636e\u8be5\u503c\u81ea\u52a8\u8bbe\u7f6e\u9700\u8981\u4f7f\u7528\u7684\u4ea4\u53c9\u7f16\u8bd1\u5668":118,"paddlepaddle\u7684cmake\u7cfb\u7edf\u624d\u8ba4\u4e3a\u5728\u662f\u5728\u4ea4\u53c9\u7f16\u8bd1raspberri":118,"paddlepaddle\u7684cmake\u7cfb\u7edf\u624d\u8ba4\u4e3a\u662f\u5728\u4ea4\u53c9\u7f16\u8bd1android\u7cfb\u7edf\u7684\u7248\u672c":116,"paddlepaddle\u7684dock":96,"paddlepaddle\u7684softmax\u4e0d\u80fd\u6307\u5b9a\u8ba1\u7b97\u7ef4\u5ea6":82,"paddlepaddle\u76ee\u524d\u53ea\u652f\u6301\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e2d":111,"paddlepaddle\u76ee\u524d\u63d0\u4f9b\u4e24\u79cd\u53c2\u6570\u521d\u59cb\u5316\u7684\u65b9\u5f0f":83,"paddlepaddle\u76ee\u524d\u652f\u63018\u79cdlearning_rate_schedul":83,"paddlepaddle\u793e\u533a":80,"paddlepaddle\u7f16\u8bd1\u9700\u8981\u4f7f\u7528\u5230\u4e0b\u9762\u7684\u4f9d\u8d56":0,"paddlepaddle\u82e5\u68c0\u6d4b\u5230\u7528\u6237\u4f7f\u7528\u7684cmake\u7248\u672c
\u4e0d\u4f4e\u4e8e3":116,"paddlepaddle\u8d1f\u8d23\u5b8c\u6210\u4fe1\u606f\u548c\u68af\u5ea6\u5728\u65f6\u95f4\u5e8f\u5217\u4e0a\u7684\u4f20\u64ad":113,"paddlepaddle\u8d1f\u8d23\u5b8c\u6210\u4fe1\u606f\u548c\u8bef\u5dee\u5728\u65f6\u95f4\u5e8f\u5217\u4e0a\u7684\u4f20\u64ad":113,"paddlepaddle\u9488\u5bf9\u4e0d\u540c\u7684\u7528\u6237\u7fa4\u4f53\u63d0\u4f9b\u4e86\u591a\u79cd\u5b89\u88c5\u65b9\u5f0f":2,"paddlepaddle\u955c\u50cf\u9700\u8981\u63d0\u4f9b":97,"paddlepaddle\u9700\u8981\u4f7f\u7528docker\u73af\u5883\u5b8c\u6210\u7f16\u8bd1":0,"pass\u4e2a\u6a21\u578b\u5230\u7b2c":103,"pass\u5c06\u4e0d\u8d77\u4f5c\u7528":103,"pass\u8f6e\u5f00\u59cb\u8bad\u7ec3":103,"pass\u8f6e\u7684\u6a21\u578b\u7528\u4e8e\u6d4b\u8bd5":103,"passes\u8f6e":103,"patch\u53f7":63,"patch\u53f7\u52a0\u4e00":63,"path\u6307\u5b9a\u6d4b\u8bd5\u7684\u6a21\u578b":105,"perftools\u6765\u8fdb\u884c\u6027\u80fd\u5206\u6790":107,"period\u4e2a\u6279\u6b21\u5bf9\u6240\u6709\u6d4b\u8bd5\u6570\u636e\u8fdb\u884c\u6d4b\u8bd5":103,"period\u4e2a\u6279\u6b21\u6253\u5370\u65e5\u5fd7\u8fdb\u5ea6":103,"period\u4e2a\u6279\u6b21\u8f93\u51fa\u53c2\u6570\u7edf\u8ba1":103,"period\u4e2a\u6279\u6b21\u8f93\u51fa\u7b26\u53f7":103,"period\u6574\u9664":103,"period\u8f6e\u4fdd\u5b58\u8bad\u7ec3\u53c2\u6570":103,"pfsclient\u9700\u8981\u548cingress\u4e4b\u95f4\u505a\u53cc\u5411\u9a8c\u8bc1":27,"pfsclient\u9700\u8981\u5728\u4f20\u8f93\u5b8c\u6bd5\u6700\u540e\u4e00\u4e2achunk\u7684\u65f6\u5019\u68c0\u67e5destination\u6587\u4ef6\u7684md5\u503c\u662f\u5426\u548csource\u6587\u4ef6\u4e00\u81f4":27,"pfsserver\u63d0\u4f9brest":27,"pi\u5e73\u53f0\u4e0a\u9002\u7528\u7684paddlepaddle\u7684\u65b9\u6cd5\u548c\u6b65\u9aa4":118,"pi\u7248\u672c\u7684\u5e93":118,"pi\u7248\u672cpaddlepaddle\u5e93\u65f6":118,"pi\u7684\u914d\u7f6e\u4fe1\u606f\u5728":118,"pi\u7cfb\u7edf\u4e0a\u6765\u6784\u5efa":118,"pi\u7cfb\u7edf\u7684\u7248\u672c":118,"pserver\u5730\u5740\u7b49\u53c2\u6570\u4f7ftrainer\u53ef\u4ee5\u6b63\u786e\u8fde\u63a5\u5230pserv":91,"pserver\u76d1\u5
42c\u7684\u8d77\u59cb\u7aef\u53e3":91,"public":[7,19,29,32,37,55,61,64,66,67,68,70,74,75,76,95,96],"pwd\u53d8\u91cf\u4f1a\u5c55\u5f00\u4e3a\u5f53\u524d\u8def\u5f84\u7684\u7edd\u5bf9\u8def\u5f84":1,"py\u4e2d":63,"py\u7a0b\u5e8f":3,"pydataprovider\u4f7f\u7528\u7684\u662f\u5f02\u6b65\u52a0\u8f7d":81,"pypi\u4e0a\u7684package\u540d\u79f0\u4e3apaddlepaddle\u548cpaddlepaddl":63,"pypi\u4e0d\u652f\u6301\u8986\u76d6\u4e0a\u4f20":63,"pypi\u5b89\u88c5\u5305\u53ef\u4ee5\u5728":3,"python\u5b89\u88c5\u5305\u652f\u6301linux":78,"python\u5c01\u88c5\u7684\u5b9e\u73b0\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u5728\u914d\u7f6e\u6587\u4ef6\u4e2d\u4f7f\u7528\u65b0\u5b9e\u73b0\u7684\u7f51\u7edc\u5c42":74,"python\u5e93yep":107,"python\u6807\u51c6\u5e93\u4e2d\u63d0\u4f9b\u4e86\u6027\u80fd\u5206\u6790\u7684\u5de5\u5177\u5305":107,"reader\u7684\u4f7f\u7528\u65b9\u5f0f\u90fd\u662f\u4e00\u81f4\u7684":11,"reader\u8f93\u51fa\u7684data":11,"recommendation\u6587\u4ef6\u5939\u5185\u5b58\u653e\u8bad\u7ec3\u6587\u4ef6":97,"request\u524d":72,"request\u7684":72,"request\u88ab\u5408\u5e76\u540e":72,"resnet\u7684mkl":42,"return":[4,5,6,7,11,12,14,17,18,19,20,25,29,31,32,33,35,37,38,40,43,48,49,50,55,56,57,61,64,66,68,70,74,76,81,83,84,95,97,114],"rnn\u5373\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":111,"rnn\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u901a\u8fc7\u4e86\u4e00\u4e2alstm\u7f51\u7edc":111,"rnn\u603b\u662f\u5f15\u7528\u4e0a\u4e00\u65f6\u523b\u9884\u6d4b\u51fa\u7684\u8bcd\u7684\u8bcd\u5411\u91cf":113,"rnn\u6a21\u578b":106,"rnn\u90e8\u5206\u4e2d":41,"rnn\u914d\u7f6e":112,"root\u66ff\u6362\u4e3apaddlepaddle\u9884\u6d4b\u5e93\u7684\u5b89\u88c5\u8def\u5f84":87,"s3\u4e4b\u7c7b\u7684\u5206\u5e03\u5f0f\u5b58\u50a8\u4e4b\u4e0a":11,"sdk\u7684\u63a5\u53e3\u9700\u8981\u662f\u6ee1\u8db3c\u6807\u51c6\u7684\u63a5\u53e3":88,"search\u7684\u65b9\u6cd5":103,"sentences\u662f\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u7684\u6570\u636e":111,"seq\u53c2\u6570\u5fc5\u987b\u4e3afals":113,"server\u4e2a\u6279\u6b21\u6253\u
5370\u65e5\u5fd7\u8fdb\u5ea6":103,"server\u4e4b\u4e0a":10,"server\u4e4b\u95f4\u7684\u7f51\u7edc\u5e26\u5bbd":10,"server\u4f1a\u6682\u505c\u53c2\u6570\u66f4\u65b0\u5e76\u7b49\u5f85":10,"server\u4f1a\u83b7\u53d6parameters\u5185\u5b58\u7684":10,"server\u5185\u5b58\u4e2d\u7684\u6a21\u578b\u6570\u636e\u7684\u5b8c\u6574\u955c\u50cf":10,"server\u540c\u6b65\u7684\u4fdd\u5b58\u4e00\u4e2a\u7279\u5b9a\u65f6\u95f4\u70b9\u7684\u5168\u5c40\u68c0\u67e5\u70b9":10,"server\u5728\u96c6\u7fa4\u4e2d\u542f\u52a8\u540e":10,"server\u6545\u969c\u540e\u88abkubernetes\u91cd\u65b0\u542f\u52a8":10,"server\u6b64\u65f6\u8fd8\u9700\u8981\u901a\u8fc7\u7f51\u7edc\u8bbf\u95ee\u5206\u5e03\u5f0f\u5b58\u50a8\u4ee5\u4fdd\u5b58\u5feb\u7167":10,"server\u751f\u6210\u4e00\u4e2auuid":10,"server\u7684\u5355\u70b9\u6216\u591a\u70b9\u540c\u65f6\u6545\u969c":10,"server\u7684\u6570\u636e\u5feb\u7167":10,"server\u7684\u68c0\u67e5\u70b9\u5404\u81ea\u72ec\u7acb\u4fdd\u5b58":10,"server\u7b2c\u4e00\u6b21\u542f\u52a8\u6216\u4efb\u610f\u65f6\u95f4paramet":10,"short":[29,33,58,64,67,70],"simd\u6307\u4ee4\u63d0\u9ad8cpu\u6267\u884c\u6548\u7387":78,"size\u4e3a512":103,"size\u53ef\u80fd\u4f1a\u5bf9\u8bad\u7ec3\u7ed3\u679c\u4ea7\u751f\u5f71\u54cd":81,"size\u672c\u8eab\u662f\u795e\u7ecf\u7f51\u7edc\u7684\u8d85\u53c2\u6570":81,"softmax\u6fc0\u6d3b\u7684\u8f93\u51fa\u7684\u548c\u603b\u662f1":74,"sparse\u8bad\u7ec3\u9700\u8981\u8bad\u7ec3\u7279\u5f81\u662f":81,"static":[14,46,64,66,95,119],"step\u51fd\u6570\u4e2d\u7684memori":113,"step\u51fd\u6570\u5185\u90e8\u53ef\u4ee5\u81ea\u7531\u7ec4\u5408paddlepaddle\u652f\u6301\u7684\u5404\u79cdlay":113,"store\u4e0b\u8f7d\u5b89\u88c5xcode\u5373\u53ef":117,"subseq\u7684\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":110,"super":[58,74],"swig\u652f\u6301\u7684\u8bed\u8a00\u6216\u8005\u89e3\u91ca\u5668\u6709\u5c40\u9650":45,"swig\u66b4\u9732\u7684\u63a5\u53e3\u4fdd\u7559\u4e86c":45,"swig\u751f\u6210\u7684\u4ee3\u7801\u4e0d\u80fd\u4fdd\u8bc1\u591a\u8bed\u8a00\u4ee3\u7801\u98ce\u
683c\u7684\u4e00\u81f4\u6027":45,"swig\u76f4\u63a5\u8bfb\u53d6c":45,"swig\u9700\u8981\u5199\u4e00\u4e2ainterface\u6587\u4ef6":45,"switch":[7,46,95],"tag\u4e3a":63,"tag\u53ef\u4ee5\u662flatest\u6216latest":63,"tag\u7684\u66f4\u65b0\u65f6\u95f4\u662f\u5426\u5728\u4e0a\u8ff0\u7f16\u8bd1wheel\u5305\u5b8c\u6210\u540e\u662f\u5426\u6700\u65b0":63,"tensor\u5230\u5bf9eigentensor\u7684\u8f6c\u6362":76,"tensor\u5230eigentensor":76,"tensor\u5b9a\u4e49\u5728framework\u76ee\u5f55\u4e0b":76,"tensor\u662f\u4e00\u4e2a\u6b63\u5728\u5f00\u53d1\u4e2d\u7684\u6a21\u5757":76,"tensor\u6a21\u5757\u5bf9el":76,"tensor\u6a21\u5757\u6765\u5b9e\u73b0":75,"tensor\u6a21\u5757\u7684\u6587\u6863\u8f83\u5c11":76,"tensor\u6a21\u5757\u7684\u8be6\u7ec6\u4ecb\u7ecd\u8bf7\u53c2\u8003":76,"tests\u7684paddlepaddl":72,"tflops\u4e86":108,"throw":95,"tottime\u7684\u6bcf\u6b21\u8c03\u7528\u5e73\u5747\u65f6\u95f4":107,"trainer\u542f\u52a8\u9700\u8981\u4f20\u5165\u7aef\u53e3":91,"trainer\u63a5\u6536\u4e09\u4e2a\u53c2\u6570":84,"trainer\u8282\u70b9\u4e2a\u6570":91,"trainer\u9700\u8981\u548cpserver\u4fdd\u6301\u7f51\u7edc\u8054\u901a\u4ee5\u5b8c\u6210\u8bad\u7ec3":91,"true":[4,6,7,12,30,36,41,50,52,56,57,58,59,63,66,70,74,81,83,89,90,91,95,97,105,114],"true\u8868\u793a\u53cd\u5411\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":114,"try":[8,9,12,13,14,30,35,40,43,59,64,67,78],"type\u5b57\u6bb5\u5747\u4e0d\u5c3d\u76f8\u540c":46,"type\u6307\u5b9a\u4e3a":107,"ubuntu\u4e0b\u5b89\u88c5\u547d\u4ee4\u4e3a":107,"unit\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u8ba1\u7b97\u5f97\u5230\u7684\u4e2d\u95f4\u503c":82,"unsupported\u6a21\u5757":75,"update\u53c2\u6570\u65f6\u624d\u6709\u6548":103,"v1\u7248\u672c":78,"var":[6,7,18,19,20,24,31,32,34,36,39,50,52,56,57,58,60,64,69,70,77],"vector\u662frank\u4e3a1\u7684tensor":76,"void":[7,12,14,19,26,29,31,32,37,39,43,44,45,46,56,57,65,66,68,74,75,76],"wheel\u5305":[2,63],"while":[7,16,19,20,30,35,38,40,49,53,54,59,62,64,68,97],"wise\u8ba1\u7b97\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u652f\u6301":76,"w
mt14\u6570\u636e\u7684\u63d0\u4f9b\u6587\u4ef6\u5728":114,"words\u5373\u4e3a\u8fd9\u4e2a\u6570\u636e\u4e2d\u7684\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":111,"words\u662f\u539f\u59cb\u6570\u636e\u4e2d\u7684\u6bcf\u4e00\u53e5\u8bdd":111,"x86_64\u548cmaco":78,"x\u4e0ey\u4e3a\u4e4b\u524d\u63cf\u8ff0\u7684\u8f93\u5165\u5c42":84,"x\u548cwindow":116,"y\u8868\u793a\u8f93\u5165\u6570\u636e\u662f\u4e00\u4e2a\u7ef4\u5ea6\u4e3a1\u7684\u7a20\u5bc6\u5411\u91cf":84,"yaml\u6587\u4ef6\u4e2d\u5404\u4e2a\u5b57\u6bb5\u7684\u5177\u4f53\u542b\u4e49":97,"yaml\u6587\u4ef6\u63cf\u8ff0\u4e86\u8fd9\u6b21\u8bad\u7ec3\u4f7f\u7528\u7684docker\u955c\u50cf":97,"zero\u4e09\u79cd\u64cd\u4f5c":103,AGE:[95,96],AWS:[11,94,99,100],Added:67,And:[12,16,17,26,33,35,47,51,52,55,59,64,68,95],But:[5,32,38,47,55,64,78,119],For:[4,6,7,13,14,15,17,18,19,21,24,25,30,31,32,34,37,39,40,44,47,48,49,53,54,55,56,57,58,59,60,61,62,65,66,67,68,71,108,119],IDE:0,IDs:[16,20,49],IRs:21,Into:95,Its:[31,65,95],K8s:119,NOT:58,Not:[4,9,40,67,119],OPs:[21,23],One:[5,16,39,44,47,52,64,67],Ops:[60,62,66],PFS:27,QoS:96,Such:[37,58,67,70],TLS:[4,27,95],That:[19,51],The:[4,5,6,8,9,13,15,16,17,19,20,21,23,24,25,28,29,31,35,38,39,40,43,44,46,48,49,51,52,53,55,56,57,58,59,62,64,65,66,67,68,70,71,74,75,76,89,95,97,109],Their:9,Then:[18,21,32,37,40,51,55,57,95],There:[4,7,8,9,14,16,17,20,21,28,29,30,35,40,47,48,49,52,53,54,55,58,62,64,65,68,95],These:[6,7,19,24,29,34,50,62,65,66,67,71],Use:[4,22,28,59,60,67,95],Used:[60,68],Uses:40,Using:[9,30,53,59,60,62,64],VPS:95,Will:84,With:[18,19,24,30,51,56,67,70],YES:17,Yes:[1,42],___embedding_0__:97,___embedding_1__:97,___fc_layer_0__:95,__align__:29,__cuda_align__:29,__device__:29,__doc__:66,__file__:17,__forceinline__:29,__fp16:29,__global__:29,__hadd:29,__half:29,__half_raw:29,__impl__:66,__init__:[24,25,33,40,50,58,70,74,107],__main__:33,__name__:33,__rnn_step__:114,__square_error_cost_0__:97,__va_args__:61,__x:29,_addup_repetitive_outputs_:6,_append_backward_ops_:[6,24],_append_backward_vars_:
6,_binari:8,_create_global_var:58,_def:40,_dtype:35,_filer:43,_filter:43,_fwd:43,_fwd_pd:43,_input:43,_librari:8,_live_in:40,_live_out:40,_loss:33,_op:[35,75],_output:43,_presucessor:40,_program:40,_recurrent_group:114,_remove_no_grad_branch_:6,_reorder_input:43,_source_language_embed:114,_src_input:43,_sucessor:40,_target_language_embed:114,_test:8,_update_op:25,_use:40,_value_index:35,a75:29,a_op:75,a_prev:67,aaaaa:11,aaaaaaaaaaaaa:95,abil:33,abl:[4,6,19,20,21,37,50,55,58,119],about:[5,7,8,17,23,28,31,40,48,59,64,66,67,68,95],abov:[4,5,6,7,8,9,13,18,20,21,29,30,31,32,34,39,43,48,49,50,51,53,55,56,58,66,67,69,70,95,108,119],abs:[5,33,81],abs_numerical_grad:5,absolut:[5,81],acc:21,acceler:[10,42,51,53],accept:[4,60],access:[4,8,13,16,17,18,20,21,58],accessmod:95,accessor:58,accord:[5,6,14,21,23,34,49,60,70],account:[60,119],accoust:67,accrodingli:12,accumul:[9,14,25,51,52,53,67],accur:[5,16,55],accuraci:[25,67],achiev:[19,23,51,52,53,67,68],acquir:30,across:[6,21,48,67],act1:35,act2:35,act:[7,19,21,35,49,58,70,81,84,86,109,114],act_output:66,act_typ:35,action:[20,95],activ:[8,35,40,49,52,55,58,62,66,81,84,86,114],actual:[12,24,30,33,35,39,43,47,53,66,68],actual_layout:43,adadelta:81,adagrad:[53,65,91],adagradoptim:50,adam:[4,14,21,33,83],adapt:[39,55],add:[5,6,7,8,12,16,19,20,21,23,25,29,32,36,38,50,52,53,57,58,60,62,64,68,69,72,76,78,109],add_activ:58,add_bia:58,add_depend:8,add_execut:8,add_input:[48,74],add_memori:48,add_output:48,add_scalar:[7,49,56],add_sum:58,add_test:[8,74],add_to:82,add_two:[7,48],add_unittest_without_exec:74,addattr:[52,66,75],addbia:74,addcom:[66,75],added:[7,23,24,29,47,51,53,62,72],adding:62,addinput:[52,66,75],addit:[6,20,51,55,60,62,70,71],addition:48,addmemori:43,addop:32,addoutput:[52,75],addprimit:43,addprimitivedesc:43,addr:9,address:[9,14,18,21,89,119],addrow:74,addtyp:66,adjust:[6,24],admin:119,administr:[16,119],adopt:[29,33],advanc:5,advantag:[5,29,30,53,59],adversari:[33,59],affect:7,affili:49,afford:13,aforement:8,after:[6,7,
8,13,14,16,21,22,23,24,26,28,29,40,43,51,54,55,58,67,72,95],aftern:55,again:[4,5,9,53],against:95,age:97,agg_level:[110,111],aggreg:[25,51,95],aggregatelevel:[110,111],ago:8,agre:[81,84],ahead:67,alexnet_pass1:105,alexnet_pass2:105,algo:43,algo_hrnn_demo:111,algorithm:[6,13,24,39,40,43,49,53,62,67],all:[4,6,7,8,9,12,14,16,17,18,19,20,21,22,24,26,28,30,33,34,35,38,40,43,44,46,47,48,49,50,51,52,53,55,56,58,60,66,67,68,78,81,84,95,97,109,113,119],all_output_nam:6,alloc:[14,17,40,43,52,68,76,109],allow:[4,14,18,19,21,24,30,52,53,62,95],allow_only_one_model_on_one_gpu:[102,103,105],allreduc:[51,52],alpha:[8,62],alreadi:[8,9,30,43,58,64,78,95],also:[4,6,7,8,12,15,19,20,21,29,30,32,33,34,35,38,40,47,48,49,52,53,54,55,56,57,58,59,62,64,66,67,68,70,71,108,119],although:[6,51],altogeth:119,alwai:[8,20,44,65,95,97],amazon:[95,96],amazonaw:95,amazonec2fullaccess:95,amazonelasticfilesystemfullaccess:95,amazonroute53domainsfullaccess:95,amazonroute53fullaccess:95,amazons3fullaccess:95,amazonvpcfullaccess:95,ambigu:[59,67],amd64:95,amd:47,amend:72,amodei:67,among:95,amort:51,analys:55,analysi:55,analyz:40,ancestor:[56,58],andd:95,andrew:40,android:116,android_abi:116,android_api:116,android_arm_neon:116,android_native_api_level:116,android_standalone_toolchain:116,android_toolchain:116,androideabi:116,ani:[4,8,9,14,16,17,18,20,21,26,29,30,37,39,40,44,49,51,52,53,58,59,61,62,67,81,84,95],announc:29,anoth:[4,6,7,17,19,20,30,31,39,43,49,58,64,66,68,95],anroid_arm_mod:116,ans:95,answer:[18,30,95],anymor:51,anyth:[49,59,95],anytim:33,apach:[42,81,84],apart:71,api:[3,4,6,8,14,15,17,18,19,25,27,32,33,35,48,54,55,60,63,70,71,87,88,89,90,91,95,97,107,108,116,119],api_pydataprovider2:81,api_shar:8,api_test:8,api_trainer_config_helpers_lay:114,api_v2:110,apiserv:95,apivers:[95,96,97],appar:6,appear:[18,30,34,68],appel:40,append:[6,24,25,49,58,59,67,91,97,114],append_backward:[6,50,52,107],append_clip_op:24,append_op:[24,38,58],append_oper:58,append_optim:52,appleyard:108,appli:[33,34,51,52,5
5,64],applic:[18,20,29,30,31,34,58,60,81,84,95,96,108,119],applyl1:12,appreci:67,approach:[21,22,23,51,53,54,62,67,119],approxim:53,apt:[1,107],arbitrari:[21,44],arch:116,archetectur:67,architectur:[29,67],archiv:[45,46,87],area:33,arg:[6,35,50,66,75,83,97],argpars:97,args_ext:97,argu:57,argument:[6,7,12,13,21,50,54,57,58,87,89,90,97,116],argumentpars:97,arithmet:29,arm64:[116,117],arm64_standalone_toolchain:116,arm:[29,116,117,118],arm_standalone_toolchain:116,armeabi:116,armv7:[29,117],armv8:29,arn:95,around:[16,40,58,95,119],arrai:[14,18,20,34,49,56,58,59,60,70,75,81,83,84,89],arrang:70,arrari:89,array_to_lod_tensor:40,arrow:33,articl:[31,34,72],artifact:[63,95],artifici:40,arxiv:[33,67],as_step_input:7,asduplic:52,asgd:53,ask:[6,9,16],asr:67,assgin:40,assign:[6,13,18,22,24,29,31,51,67,89,95,119],assigne:67,assignmemt:40,associ:[54,61],assum:[7,21,43],assumpt:21,ast:18,astyp:[59,75],asyc:9,async:[9,23,52,102],async_count:[102,103],async_lagged_grad_discard_ratio:[91,103],async_lagged_ratio_default:[102,103],async_lagged_ratio_min:[102,103],asynchron:[9,20,51,55],atom:22,attr:[7,18,35,38,43,56,57,58,66,75,81,82,83,114],attr_map:66,attrdesc:56,attribu:43,attribut:[6,7,23,24,38,56,58,60,64,66,70],attributemap:75,attrproto:66,attrtyp:[56,66,75],attrvalu:66,auc:[25,102],audio:67,augment:67,authent:95,author:[27,67,81,84,95],auto:[0,7,12,22,31,39,43,45,57,60,64,70,74,75,76,108],autom:95,automat:[4,6,14,21,23,24,32,50,60,66,67,95],avail:[9,14,23,29,30,40,95,119],averag:[13,81],average_test_period:[102,103],avg:[108,110],avg_cost:[21,109],avoid:[5,7,9,20,21,38,43,51,52,53,54,108],avx:1,awai:30,await:96,awar:[4,18,20,25,31,48,58,95],awk:98,awni:67,aws:27,aws_account_id:95,awsaccountid:95,awskeymanagementservicepowerus:95,axi:81,axiom:20,b363:96,b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465:96,ba5f:95,back:[6,9,21,29,33,53],background:[52,62,67],backpropag:[5,6],backward:[5,7,12,14,24,33,41,42,50,52,53,54,57,61,62,69,74,75,109],backward_first:114,backwar
d_op:5,backwardactiv:74,bacth:19,baidu:[30,67,96],bake:21,balanc:[23,51,95],bandwidth:[29,51],bare:[96,119],barrier:92,barrierstatset:108,basci:35,base:[4,13,19,24,25,29,30,37,43,47,50,51,53,55,60,61,62,68,70,95,109],baseerrorclipattr:24,baselin:67,basematrix:74,bash:[0,1,72,95,96,97,101],basi:[81,84],basic:[21,35,43,55,56,60,61,67,70],batch:[4,7,9,11,12,19,20,21,25,26,30,33,36,47,48,49,51,53,67,70,81,84,89,95,96,97],batch_id:[33,81,84],batch_im:33,batch_label:33,batch_norm:[33,67],batch_read:[11,19,59],batch_siz:[21,33,41,49,81,84],batch_szi:33,batch_z:33,batchnorm:[33,67],batchread:19,batchsiz:74,bazel:8,bbbbb:11,bcm2708:118,bdist_wheel:63,beacus:35,beam:114,beam_gen:114,beam_search:[49,113,114],beam_siz:[49,102,103,105,114],becaus:[4,5,7,8,9,14,29,39,49,54,58,59,62,64,65,69,70,71,95],becom:[22,23,64,68],been:[6,8,13,20,30],befor:[6,9,16,20,24,28,31,34,47,53,54,55,59,62,75,78,81,95,119],begin:[6,12,14,19,25,28,34,49,51],beginiter:4,beginn:114,beginpass:4,begintrain:4,behavior:20,behind:[30,70],being:[6,16,24,30,57,59],belong:[19,21,64],below:[7,9,14,21,23,29,30,44,54,59,62,70,71,95],benchmark:[44,67],benefit:[16,17,49],besid:[21,40,47,51],best:[8,43],besteffort:96,beta:33,better:[8,30,39,40,43,49,95,119],between:[6,8,9,14,21,23,29,30,43,46,51,54,61,64,95],bia:[49,58,74],bias_attr:[58,81,83,114],biases_:74,biasparameter_:74,biassiz:74,bidi:96,bidirect:67,bidirectional_lstm:82,big:[18,23,40,119],bigger:9,bilinearfwdbwd:108,bin:[0,1,90,91,95,96,97,101],binari:[8,17,21,29,31,33,44,90,95],bind:[18,20,29,32,64,68],bit:29,bitcod:117,black:33,blank:95,blob:0,block0:[40,52],block1:[40,52],block2:[40,52],block3:52,block:[6,10,12,14,18,20,21,22,23,24,25,26,30,37,40,47,48,50,68,71,76],block_expand:67,block_id:[18,26],blockdesc:[7,34,52,58,60],blockdescbind:37,blockingcount:22,blueprint:49,book:[1,60,67,77,109,114],bool:[7,19,29,36,38,41,42,43,57,58,65,66,70,71,74,89,103,105],boost:[47,67,68],boot:[113,114,119],boot_lay:114,boot_stat:70,bootstrapp:119,borrow:[33,70],bos_id:114
,both:[4,7,8,9,16,19,20,21,23,29,30,33,37,40,47,49,51,55,57,65,67,68,95],bottl:51,bottleneck:55,bottom:67,bound:40,boundari:21,box:33,brace:[7,34],brain:16,branch:[4,7,8,21,30,36,56,63,69,72],break_if:70,brief:[8,14,29,68,76],bring:[30,40],broadcast:[9,51,60,119],broken:72,browser:95,bsd:[20,51],bsp:20,bucket_nam:95,buf:12,buf_siz:21,buffer:[12,20,43,44,53,59,64,109],buffer_s:20,buffered_read:59,bug:[72,95],build:[0,8,17,21,34,35,40,42,53,62,63,66,67,72,77,78,87,95,97,99,100,101,107,116,117,118],build_doc:77,build_model:33,buildtool:63,built:[8,18,21,29,31,40,47,51,66,67,70,107,108,119],bulk:20,bunch:44,button:95,c11:45,c703c041:72,c99:46,c99e:95,cach:[29,81],cache_pass_in_mem:81,cachetyp:81,cacul:25,caff:[7,30],caffe2:[7,18,20,30],caffe_poli:83,calcul:[5,6,9,14,22,25,29,40],calcut:40,calendar:55,call:[4,5,6,7,12,13,14,15,17,18,20,21,24,31,33,34,40,48,49,50,52,55,58,60,61,64,66,68,70,95,97,107,108],callback:[24,74],caller:[5,95],can:[4,5,6,7,8,9,12,13,16,17,18,19,20,21,23,24,26,29,30,31,32,33,34,35,37,38,39,40,43,47,48,49,50,51,52,53,55,56,57,58,59,60,61,62,66,68,70,71,95,108,119],cancel:16,candid:[49,67],cannot:[19,39,60,64,70,78],cantain:35,capabl:[29,54,60],capac:[62,95],capi:[45,87],capi_priv:87,capi_prvi:46,caption:49,card:51,care:[17,40,59,67,68,119],carefulli:67,carpedm20:33,cast:[29,39],cast_to_op_attr:66,cat:[1,97,98],categori:9,categoryfil:96,caus:[9,28],caution:95,cbla:[41,87],cc_:8,cc_binari:8,cc_test:8,cclient:15,cduadevicecontext:[47,68],cento:[3,119],central:62,ceph:11,cephf:[11,17,27],cer:67,certain:[19,38,47,50,55,64,68],certif:[4,27,78,95],cffi:45,cfg:[40,96],cgo:45,ch1:20,ch2:20,chain:[6,34],challeng:[5,9,30,36,68],chan:20,chanc:[4,29],chang:[8,13,17,21,30,43,54,56,59,61,63,64,67,72,95],changes:43,channel:[18,71,108],chapter:[48,49,67],chapter_data:48,chapter_out:48,charact:[20,67],check:[6,7,8,24,43,57,60,69,72,74,78,83,89,95,103],check_attr:66,check_eq:74,check_grad:[5,75],check_l:74,check_output:75,check_sparse_distribution_batch:[102,103],chec
k_sparse_distribution_in_pserv:[102,103],check_sparse_distribution_ratio:[102,103],check_sparse_distribution_unbalance_degre:[102,103],checker:[5,60],checkgrad:103,checkgrad_ep:103,checkmark:119,checkout:72,checkpoint:[23,57],checksum:27,child:7,chip:30,chmod:95,choic:[8,30],choos:[20,38],chosen:[33,47],chunk:[13,27],circl:34,circular:20,circumst:68,claim:95,claimnam:[95,97],clang:[29,45,72,116],clariti:49,classic:[40,67],classif:34,classifi:33,classification_cost:81,claster:95,clean:[7,8,26,54,60,72,78],clear:[8,39,49,54,64],clearer:[54,58],clearli:64,cli:95,click:95,client:[12,15,60],clip:103,clip_op:24,clip_op_desc:24,clipe:52,clone:[0,77,87,107,116,118],close:[20,59,72],close_channel:20,cloud:[8,9,17,27,28,60,119],cludform:95,cluster:[4,7,9,14,21,67,91,93,97],cluster_test_fil:91,cluster_train:[81,93],cluster_train_fil:91,cluster_train_v2:[93,94,98],cm469:95,cmake:[0,46,72,75,77,78,87,108,116,117,118],cmake_build_typ:[116,117,118],cmake_c:[116,117],cmake_system_nam:[116,117,118],cmakefil:78,cmakelist:[8,41,42,74],cmatrix:[45,46],cname:95,cnn:96,coars:32,code:[4,6,8,16,19,20,21,23,26,29,32,33,34,38,44,47,50,52,53,54,55,57,59,60,61,62,66,70,74,95,96],codebas:60,coded_stream:83,codedinputstream:83,colindic:89,collabor:9,collect:55,collectbia:74,colum:89,column:[34,59,89],com:[0,1,8,33,63,72,77,78,87,95,96,107,109,116,118,119],combin:[40,50,60,64],come:[21,25,40,56,67,70,71],comma:14,command:[0,8,12,17,28,74,95,96,97,99,100,105],commandlin:[97,108],comment:[8,35,66,67,97],commit:[8,96],common:[11,62,68],commonli:[28,62],commun:[9,14,15,20,21,23,51,95],compani:30,compar:[5,8,18,60],comparison:[8,30],compat:[29,32,51],compil:[0,8,21,30,35,37,40,47,51,61,65,66,71,101,116,117,118],complaint:8,complet:[6,7,9,13,14,24,27,34,44,47,60,95,96,97,119],complex:[16,20,40,49,60],complianc:[81,84],complic:[21,32,39,59,70],compon:[20,21,35,67,70,71],compos:[4,20,32,35,48,58,60],composit:32,compress:[13,89],compris:6,comput:[4,5,9,20,21,23,26,29,30,31,35,39,40,44,47,50,51,52,53,55,61
,64,67,68,71,75,76,95,109],computationgraph:35,con:51,concat:[33,114],concaten:[33,48,70,81],concentr:60,concept:[4,18,19,20,30,32,33,35,43,48,49,53,54,56,64,70,71],conceptu:[20,26,30,33,35],concern:[4,20,25],concis:[33,70],concret:[60,68],concurr:[9,16,23,55],cond:[7,30,36,56],condit:[13,21,30,36,43,67,69,81,84,96],condtion:33,conf:[83,93,97,111],conf_paddle_gradient_num:[95,97],conf_paddle_n:[95,97],conf_paddle_port:[95,97],conf_paddle_ports_num:[95,97],conf_paddle_ports_num_spars:[95,97],config:[11,28,49,74,84,87,95,96,97,102,103,105,119],config_:[12,103],config_arg:[102,103,105],config_lay:74,config_len:14,config_pars:[41,42,74],config_proto:14,configmap:21,configprotostr:83,configur:[6,12,14,16,17,21,23,30,35,38,39,58,67,68,74,86,119],confirm:28,conflict:[64,72],confus:[33,38],connect:[17,18,21,23,67,79,95,96,119],consid:[6,57,68,119],consider:[47,67],consist:[13,19,20,31,44,56,59,60,61,66,67,71],consol:95,consolid:7,constant:[35,37,38,47,83],constraint:64,construct:[4,26,35,40,48,58,60,64,66],constructbackwardgraph:34,constructoptimizationgraph:34,constructor:[24,29,55,58,60,64,66],consum:9,consumpt:40,contact:16,contain:[0,4,6,7,13,26,33,35,43,44,47,54,55,58,60,61,64,65,66,67,70,71,95,96,97],containerport:95,content:[14,28,44,49,77],content_dir:77,content_len:14,context:[24,43,64,65,68,75,76,81,109,114],contin:95,continu:[6,9,44,67],contrib:62,contribut:[62,67],contributor:60,control:[7,18,20,69,95,96,119],controlflowgraph:40,conv2d:33,conv:[33,39,43],conv_fwd:43,conv_pool_2:21,conveni:[4,6,35,50,66,67],convent:[6,14],convers:[29,30],convert:[11,21,22,23,29,30,31,43,59,61,67],convlut:67,convolut:[33,47,58,68],convolution_algorithm_opt:43,cool:72,cooper:67,coordin:[9,14],copi:[4,13,16,28,34,48,49,51,53,70,81,84,95],copy_from:24,copyright:[81,84],copyvariablewithtensor:39,core:[6,35,38,46,53,54,70,109],coreo:[95,119],corner:60,corpu:67,correct:[5,6,29,51,95],correctli:[6,29,33],corresond:29,correspoind:4,correspond:[4,6,7,8,24,29,35,36,43,47,48,49,58,60,61,62,6
6,68,69,83],corss_entropi:4,cortex:29,cos:66,cosin:66,cosineop:66,cosineopproto:66,cosineopprotomak:66,cost:[4,6,21,34,39,50,51,56,57,81,84,109],cost_np:57,could:[4,5,13,18,20,21,22,23,29,30,31,48,50,53,54,56,58,59,61,69,95],count:[9,17,25,57,59,67,91,96,103,105,108],counter:[9,13,22,34],cours:[17,47],cover:[30,67],cp27:3,cp27m:[3,63],cp27mu:[3,63],cpp:[5,12,32,41,42,45,46,54,60,71,74,83,97,108,111],cprofil:107,cprofilev:107,cpu:[0,5,17,29,38,39,47,52,53,54,55,60,62,63,68,75,76,96,105,108,109],cpu_avx_mkl:3,cpu_avx_openbla:3,cpu_kernel:38,cpu_noavx_openbla:3,cpu_ns_:55,cpu_per_pserv:21,cpu_per_train:21,cpudevicecontext:[47,68,75],cpuelapsedu:55,cpuengin:42,cpuinfo:1,cpuplac:[21,38,39,43,47,52,68,75,76,109],cpusparsematrix:46,crash:[9,108],creat:[4,5,7,9,14,18,19,22,24,25,26,27,28,29,30,32,33,34,43,47,48,50,51,53,54,58,61,62,67,72,74,77,81,83,84,89,96,97,98,119],create_backward_pass:50,create_bias_paramet:74,create_block:58,create_doc_str:66,create_input_paramet:74,create_local_scop:26,create_oper:32,create_optimization_pass:50,create_paramet:58,create_python_ops_creatation_funct:66,create_rnn:7,create_rnn_op:48,create_tmp_var:58,create_tmp_vari:58,create_var:58,create_whileloop:70,creategradientoper:61,creatememori:43,createop:66,createoper:7,createprimitivedesc:43,createstack:95,createvari:7,creation:[32,95],creationd:95,creator:[11,60,61],creator_:61,credenti:28,crf:[39,68],critic:33,crlf:72,crop:68,crop_grad:68,cropgradkernel:68,cropkernel:68,cross:[58,83,116,117,118],cross_entropi:[4,21,33,39,40,52],crt:27,csc:74,csr:[74,89],csv:83,ctc_error_evalu:67,ctest:[0,72,75],ctor:58,ctrl:[0,93],ctx:[39,43,75,76],cubla:47,cublas_handle_:68,cublashandle_t:68,cuda7:[3,86],cuda8:[0,1,3],cuda:[8,31,47,55,60,68,75,103,108],cuda_context:31,cuda_dir:[102,103],cuda_fp16:29,cuda_so:[1,78],cuda_visible_devic:81,cudaconfigurecal:108,cudadevicecontext:[31,47,68,75],cudadevicegetattribut:108,cudaelapsedu:55,cudaevent_t:55,cudaeventcr:108,cudaeventcreatewithflag:108,cudafre:108,cudaget
devic:108,cudagetdevicecount:108,cudagetdeviceproperti:108,cudagetlasterror:108,cudahostalloc:108,cudalaunch:108,cudamalloc:108,cudamemcpi:108,cudaplac:[39,47,68],cudaprofilerstart:108,cudaprofilerstop:108,cudaprofilestop:108,cudaruntimegetvers:108,cudasetdevic:108,cudasetupargu:108,cudastream_t:68,cudastreamcr:108,cudastreamcreatewithflag:108,cudastreamsynchron:108,cudeviceget:108,cudevicegetattribut:108,cudevicegetcount:108,cudevicegetnam:108,cudevicetotalmem:108,cudnn:[8,38,39,43,47,68],cudnn_conv_workspace_limit_in_mb:[102,103],cudnn_dir:[102,103],cudnn_kernel:38,cudnnv5:0,cudrivergetvers:108,cuinit:108,cumtim:107,cur_mem:49,curl:95,curli:[7,34],current:[6,7,8,9,12,14,18,23,25,30,38,39,47,48,49,53,54,55,58,64,70,77,95],current_block:[56,58],current_oper:56,current_word:[81,114],curv:4,custom:[4,17,29,33,49,53,60,67,95],custom_batch_read:59,cut:70,cxx:[116,117],cxx_compil:[116,117,118],cxx_flag:[116,117],cxxabi_1:3,cycl:9,cython:45,d3e0:95,d_b0:33,d_b1:33,d_b2:33,d_block:33,d_f:33,d_g:33,d_h0:33,d_h0_bn:33,d_h0_relu:33,d_h1:33,d_h1_bn:33,d_h1_relu:33,d_h2:33,d_loss:33,d_loss_fak:33,d_loss_real:33,d_optim:33,d_step:33,d_t:33,d_w0:33,d_w1:33,d_w2:33,dandroid_abi:116,dandroid_arm_mod:116,dandroid_arm_neon:116,dandroid_standalone_toolchain:116,dario:67,darwin:95,dash:33,dat:11,data:[4,5,7,11,12,13,20,23,25,27,29,30,33,34,35,37,38,40,43,44,47,48,49,50,51,52,53,54,56,58,60,62,64,65,66,67,68,70,71,76,81,84,86,89,92,96,97,99,102,109,114],data_batch:81,data_grad:52,data_i:33,data_lay:[12,81],data_layout_:39,data_read:59,data_reader_creator_random_imag:59,data_shar:70,data_typ:[39,44,65,67,71,82,84,86,89,114],data_type_:[38,39,47],data_x:33,datacent:[11,28],datacenter1:11,datacenter2:11,datacenter_1:11,datacenter_2:11,datacenter_nam:11,datafeed:109,dataflow:35,dataflow_analysi:40,datalayout:39,dataparallel:21,dataprovid:[81,83,97],dataprovider_convert:67,dataset:[11,17,21,53,59,67,84,86,91,107,114],datatransform:39,datatyp:[38,39,43,65,67],dcgan:33,dcmake_build_typ:[77,87]
,dcmake_install_prefix:[87,116,117,118],dcmake_system_nam:[116,117,118],dcuda_arch_nam:0,dcudnn_root:0,ddim:[19,47,68,76],dead:9,deal:[6,119],deb:72,debug:[5,6,21,28,30,58,77,107],debug_str:35,decai:52,decayr:12,decent:13,decid:[4,16,33,44,53,61,62,65],declar:[7,33,48,71],decod:[67,113,114],decoder_boot:114,decoder_dim:49,decoder_group_nam:114,decoder_input:[49,81,114],decoder_mem:[49,114],decoder_s:[81,114],decoder_st:114,deconv:33,decor:19,decrement:22,decrementcount:22,decrypt:95,deduc:60,deep:[6,16,20,26,33,34,40,42,55,60,62,67,68,108],deeper:31,deepspeech2:41,def:[4,5,6,11,17,24,25,32,33,35,38,40,48,49,50,58,59,66,70,74,75,81,83,84,97,114],def_block:33,default_block:33,default_decor:97,default_devic:105,default_main_program:109,default_param_attr:58,default_st:70,default_start_up_program:52,default_startup_program:109,default_valu:105,defaultdict:40,defaultinfervartyp:37,defect:54,defer:16,defin:[4,6,7,8,9,16,18,19,22,23,24,29,30,31,32,33,35,38,40,47,48,51,56,58,59,60,64,66,68,70,71,75,81,84,109],define_py_data_sources2:83,definit:[6,7,9,13,21,26,31,38,52,56,61,66,70,109],definiton:32,delai:[53,68],delet:[17,27,72],deletestack:95,delimit:83,deliv:119,delta:5,demand:[9,68],demo:[60,96,99],dens:[14,15,65,67,95],dense_arrai:82,dense_vector:[84,86,89],dense_vector_sequ:89,dense_vector_sub_sequ:89,densescann:67,dep:8,depart:67,depend:[7,8,9,17,19,20,21,23,35,51,57,65,119],dependent_var:57,deploi:119,deploy:[35,44,60,95,119],deprec:67,depth:[7,30,67],dequeu:23,deriv:[4,19,21,24,36,50],desc:[7,24,43,44,58,66,70],desc_:7,descend:70,descent:[9,53],descproto:44,describ:[4,6,7,8,13,18,21,26,31,38,39,43,44,48,49,54,56,58,60,65,66,71,95,96],describestack:95,describestackev:95,describestackresourc:95,descripotor:43,descript:[7,8,37,39,42,44,47,61,65,67,71,95,97],descriptor:[20,39,43],deseri:[44,54],deserializ:60,desgin:34,design:[6,12,19,38,40,45,53,55,62,119],desir:[9,21,53,95,96],destin:[14,28],destroi:[7,26],destruct:64,destructor:55,detail:[5,6,13,17,21,23,28,30,33,35,39
,40,43,44,47,48,55,58,62,64,68,70,71,95,119],detect:[37,72],determin:[7,21,40,47,60],dev:[0,1,60,72,78,107,116,119],dev_ctx:[7,43,55],devel:63,develop:[0,6,8,30,37,54,55,58,61,63,67,72,109,117],devic:[1,5,18,21,25,29,35,39,42,43,47,51,54,55,60,76,78,105,109],device_:55,device_context:[43,75],devicecontext:[7,47,55,75],deviceid:[42,105],deviceid_:42,deviceplac:68,devot:67,dhcp:119,diagram:48,diamond:33,dict:[6,58,83,97],dict_dim:81,dict_siz:[12,49],dictionari:[4,5,58,81],did:54,diff:[72,81],diff_mat:5,differ:[5,6,7,8,9,14,16,19,21,22,23,24,25,26,29,30,33,35,36,39,40,43,47,49,51,53,55,57,61,64,67,69,70,71,95],differenti:32,difficult:[5,30],dig:95,digraph:35,dilat:43,dim0:75,dim1:75,dim:[12,43,44,48,60,65,68,71,74,75,76],dim_:[68,76],dimens:[33,60,65,67,68,70,76,81],dimension:89,dios_arch:117,dios_enable_bitcod:117,dios_platform:117,dios_use_veclib_for_bla:117,dir:[78,97,116],direcit:67,direct:[30,40,53,67],directli:[8,15,17,19,21,29,38,39,54,66,70],directori:[8,11,16,27,28,68,72,77,96,108],disabl:[55,83],disadvantag:[53,58],discard:[9,13,49,72,103],discexp:83,discov:9,discoveri:95,discrim:33,discuss:[4,7,13,14,15,21,43,67],disk:44,dispatch:[21,54],displai:[17,28],dist:[0,63,78],dist_train:[4,17],distinguish:8,distribut:[7,13,14,15,16,18,20,25,31,51,60,67,71,81,84,99,100,103,119],distribute_test:[102,103],distributedli:21,disucss:4,divid:[6,25,66,71],diy_beam_search_prob_so:[102,103],dnn:[43,67,78],dns:95,do_forward_backward:59,doc:[35,48,70,75,77,91,93,97],doc_cn:77,docker:[0,1,63,72,77,78,95,96,97,99,100,101,116,119],docker_build:4,docker_clust:[93,98],docker_push:4,dockerfil:[0,72,97,116,118],document:[5,19,21,27,34,48,49,55,60,67],doe:[9,13,14,16,17,18,19,21,23,26,29,35,40,48,54,58,60,61,62,109],doesn:[4,7,18,20,59],doing:[12,16,21,34,52],domain:95,don:[4,8,32,34,40,59,67,95],done:[6,8,9,13,14,21,22,37,40,44,53,61,62,67,72,95,97,108],dot:75,dot_period:[97,103,105],doubl:[21,29,34,39,55,75,103],down:[67,108],download:[8,9,12,16,27,78,96,119],doxygen:72,dozen:8,dpyth
on_execut:78,dpython_include_dir:78,dpython_librari:78,draw:49,drive:64,drop:[19,49],drop_fc:82,drop_rat:82,dropout:82,dropout_r:82,drpi_arm_neon:118,drpi_toolchain:118,drwxr:96,ds2:67,dst:[14,43],dst_primitive_desc:43,dtoh:108,dtype:[20,21,35,58,83,109],due:[13,16,33,40,49,58,107],dummi:13,dump:[44,90],dump_config:90,dump_v2_config:90,duplic:[23,52],durat:13,dure:[6,7,9,13,16,17,25,30,40,51,53,55,58,60,67,71,95,119],duse_eigen_for_bla:116,dwith_c_api:[46,87,116,117,118],dwith_doc:77,dwith_golang:87,dwith_gpu:[0,77,87,118],dwith_mkl:[77,87],dwith_profil:108,dwith_python:[46,87,118],dwith_swig_pi:[46,87,116,117,118],dwith_test:[0,75,117],dwith_tim:108,dynam:[14,46,48,58,59],dynamic_cast:74,dynamic_recurrent_op:70,e2e:119,each:[5,6,8,9,12,13,14,16,17,18,19,20,21,24,25,26,31,34,37,39,40,43,47,48,49,51,52,54,55,57,58,59,60,61,64,65,66,67,68,69,70,71,95,119],eager:30,earli:[29,31],eas:37,easi:[5,6,49,53,59,60,62],easier:[4,23,29,30,59,69,70],easili:[4,33,51,55,59,61,64,68],echo:[1,78],edg:40,edit:[20,95],editor:58,edu:[95,96],eeoi3ezpr86c:95,effect:95,effici:[21,44,59,67,68],effort:[21,67],efs:95,efs_dns_nam:95,efsvol:95,egd:40,eigen:[29,47,53,60,62,68,75],eigen_device_:68,eigen_use_gpu:75,eigenmatrix:76,eigentensor:76,eigenvector:76,either:[4,21,33,36,37,48,53,62,81,84],elabor:67,elb:95,elbapis:95,electr:40,electron:96,element:[5,13,20,23,35,49,60,75],element_typ:14,elementari:60,elif:[4,66,69],els:[1,4,12,17,20,21,23,24,30,33,36,37,38,40,64,66,69,74],elsewher:55,emac:0,emailweixu:8,emb1:12,emb2:[12,111],emb:[81,83,96],emb_para:83,emb_param_fil:83,emb_sum:81,embed:[4,7,12,23,37,49,65,70,83,114],embedding_lay:[12,81,111],embedding_nam:114,embedding_s:114,emplace_back:74,emploi:[6,24,66],empti:[6,9,19,20,49],emul:29,enabl:[7,8,13,18,23,24,35,55,95,108],enable_grad_shar:[102,103],enable_parallel_vector:103,enc_proj:114,enc_vec:114,encapsul:14,encod:[13,49],encoded_proj:114,encoded_sequ:114,encoded_vector:114,encoder_ctx:49,encoder_ctx_expand:49,encoder_ctx_proj:49,encoder_
dim:49,encoder_out_seq:49,encoder_s:114,encount:12,encourag:[21,26],encrypt:95,encrypt_decrypt:95,end2end:119,end:[6,7,21,24,31,35,40,49,54,55,59,64,67,69,72,114],end_pass:4,endforwardbackward:81,endian:44,endif:[47,55],enditer:[4,81,84],endpass:[4,84],endpoint:[11,95],endtrain:4,enforc:71,engin:[17,42,43,67],english:67,enough:[6,7,38,40,47],enqueu:23,ensur:[9,43,51,64],enter:[7,26],enterpris:60,entir:[14,16],entiti:[7,64],entranc:26,entri:[13,17,37,95],entropi:58,entry_point:17,enumer:[47,83],env:[77,81,95,97,107],environ:[4,19,21,78,95,96,108],environmenterror:91,eol:72,eos_id:114,epoch:33,epol:20,equal:[9,70,75],equat:[40,75],equival:[4,7,18,20,24,30,36,66,119],eras:19,erlang:20,error:[4,5,13,28,29,30,43,64,67,78,83,87,95,103],error_clip:24,error_clip_callback:24,error_clipping_threshold:81,errorclipbyvalu:24,especi:42,essenc:[4,6],essenti:[4,26,29],establish:18,estim:[4,23,53],eta:96,etc:[7,20,21,25,43,51,53,59,64,67,95,119],etcd:[9,13,14,16],etcd_addr:14,eth0:[95,97],etyp:20,eval:[7,25,33,60],eval_program:25,eval_result:25,evalu:[16,35,57,67,90,108,109],even:[4,29,51,58,59],evenli:[14,95],event:[81,84,96],event_:55,event_block:55,event_handl:[4,81,84],eventkind:55,eventlist:55,eventu:[21,70],everi:[4,9,13,14,16,24,25,34,35,37,39,40,43,47,48,51,58,64,66,84,109],everyth:[21,23,33],evid:54,evolv:30,exactli:[19,81,95],exampl:[7,17,19,21,23,25,28,30,31,32,33,34,35,37,39,40,43,47,48,49,54,55,56,58,59,60,61,62,65,68,69,70,95,109],exc_path:78,except:[16,18,30,34,55,67,70,84],excess:40,exchang:54,exe:[21,109],execut:[8,9,13,17,18,19,20,21,25,26,31,33,35,40,43,51,52,55,61,71,95],executioncontext:[39,43,75,76],executor:[18,21,25,29,30,31,33,39,50,52,56,58,71,107,109],exist:[4,7,9,19,28,30,49,58,59,61,66,68,70,76,95],exit:[14,28,96],exp:83,expand:49,expand_a:110,expand_level:110,expandlevel:110,expect:39,expected_desc:43,expected_kernel_kei:39,experi:[44,67],expert:8,expir:9,explain:[9,18,30,32,34],explan:[17,18,21,39,64],explicit:[19,55,70,74],explicitli:[4,21,26],explod:
24,explor:[49,62],expos:[6,15,20,43,44,68,70,95],express:[4,23,25,35,40,81,84,95],extend:[53,70],extens:[16,23,49],extent:46,extern:[8,42,45,46,60,67],extern_mklml:78,external_librari:8,extingrad_:42,extinval_:42,extoutgrad_:42,extoutval_:42,extra:[21,62,68,119],extraattr:105,extract:[30,54,67,95],extralayerattribut:[81,82],extrem:[18,30],f1205:83,f120da72:96,f7e3:95,fa0wx:96,fabric:[93,94],face:[8,62],fact:[18,30,51,56,58],factori:45,fail:[9,13,49,78,83,96,103],failur:[9,14],fake:33,fake_imag:59,faked_imag:33,fall:[29,57],falloc:27,fals:[5,6,7,30,36,38,41,48,56,57,59,65,71,74,75,81,84,86,89,91,96,105,114],false_block:[7,36,56],false_label:59,false_neg:25,false_posit:25,false_read:59,familiar:20,faq:115,far:[24,70],fashion:21,fast:[13,30,108],faster:[9,30,52],fastest:30,father:6,fault:[13,60],fbd1f2bb71f4:96,fc1:[35,74,105],fc1_bia:35,fc1_weight:35,fc2:[35,105],fc3:[35,105],fc4:105,fc8a365:95,fc8a:95,fc_grad:52,fc_layer:[58,66,81,83,105],fc_op:66,fc_out:7,fc_output:66,fc_without_b:7,fclayer:74,fcop:32,feasibl:53,featur:[6,21,29,35,51,52,55,67,72],feed:[4,21,34,48,62,84,109],feed_dict:33,feed_list:109,feed_minibatch:71,feeder:[21,109],fetch:[9,12,19,21,57,72,78,109],fetch_list:[21,58,71,109],fetch_op:57,few:[8,9,20,21,40,53,59,65,67],fewer:[20,58],fft:67,field:[7,35,37,44,57,58,61,65,66,81,95],fifth:34,figur:[4,8,21,23,33,42,48,55,58,67],file:[4,6,8,9,11,13,14,16,17,19,20,27,28,30,31,35,44,46,59,60,67,68,71,72,75,81,84,109,119],file_nam:83,filelist:67,filenam:[11,58,81,107],fileoffset:27,filesystem:[16,17,21,27,95],fill:[9,13,47,58,89,95],fill_zero_grad:60,fill_zeros_like_op:6,filter:[24,43],find:[7,9,16,20,29,35,39,43,49,64],find_var:5,findmemori:43,findop:7,findprimit:43,findprimitivedesc:43,findvar:[7,64],fine:[13,32],fingerprint:95,finish:[9,13,16,17,26,40,51,66,95,96],first:[4,6,7,9,13,16,17,18,21,26,28,30,33,34,35,43,48,49,52,56,57,58,60,65,66,67,68,69,70,75,76,89,95,119],first_seq:114,firstseen:96,fit:[29,38,40,44,49,60],five:56,fix:[21,40,45,58,67,72,89],flag
:[41,42,55],flatten0:35,flatten:[35,56,58,76],flatten_result:81,flexibl:[4,14,21,30,34,38,48,49,53,59,68,70],flist:91,fliud:18,float16:20,float16_t:29,float32:[21,29,32,33,58,59,75,83,109],float_to_half_rn:29,floor:83,flow:[7,18,20,48,55,63,69],fluid:[6,19,21,23,26,39,47,52,55,58,68,69,107],fluid_cuda_create_tensor:31,fluid_cuda_mult:31,fluid_cuda_read:31,fly:6,fmt:[20,83],fname:83,fnt03:95,focu:[20,35],folder:[8,11,17,28,95],follow:[4,5,6,7,8,9,13,17,18,19,20,21,23,26,29,30,31,32,33,34,35,36,37,39,40,43,47,48,49,51,52,53,55,56,57,58,59,60,61,62,64,65,66,67,68,69,70,71,95,99,100,109,119],footprint:31,forbid:4,forc:[39,51,58],force_cpu:38,force_cudnn:38,force_load:45,forest:7,forev:20,forget:4,form:[19,20,25],formal:39,format:[5,13,19,21,29,30,47,49,67,70,72,74,86,89,95],former:[4,8,30,40,53],formula:[5,40],forth:33,forward:[5,6,7,12,14,24,30,33,41,42,43,44,50,54,56,59,60,61,62,65,74],forward_infer:43,forward_list:55,forward_op:5,forward_train:43,forwardactiv:74,found:[29,56,62,64],four:[20,25,30,34,43,47],foward:57,fp16:[29,60,71],fp32:[39,47,60,71],fp64:[47,71],fparam:83,fpga:[47,109],fpgadevicecontext:68,fpgaengin:42,fpgaplac:[47,68],frame:[26,60,67,70],framework:[4,6,7,20,24,25,29,30,35,47,51,52,53,55,56,60,62,64,66,68,75,107,109],free:[31,68,119],freememoryop:31,frequenc:67,frequent:[13,59,60,62,68],fresh:16,friend:64,friendli:33,from:[5,6,7,8,9,11,12,13,14,18,19,20,21,23,24,25,28,29,30,32,33,34,35,36,38,39,40,43,48,49,50,51,52,54,56,58,59,60,61,64,67,68,69,70,71,75,76,78,90,95,96,107,108,113,119],from_no_sequ:110,from_sequ:110,from_tar:84,fromfil:[59,83],fromstr:83,front:[35,40],fulfil:108,full:[9,16,20,48,51,53,119],full_matrix_project:114,fulli:[21,23,67,119],fullsiz:12,fullyconnect:[35,58],fullyconnectedlay:74,func:[13,18,31,61],functor:[32,35],fundament:[19,20,23,29,60],further:[19,66,119],futur:[16,21,29,40,48,60],fvs:66,fwd_desc:43,fwd_op:61,fwd_primit:43,fwd_primitive_desc:43,fwd_var:24,g_b0:33,g_b1:33,g_b2:33,g_block:33,g_command_config_arg:[41,42],g_h0
:33,g_h0_bn:33,g_h0_relu:33,g_h1:33,g_h1_bn:33,g_h1_relu:33,g_h2:33,g_im:33,g_loss:33,g_optim:33,g_program:58,g_state:55,g_step:33,g_w0:33,g_w1:33,g_w2:33,gan:4,gangliao:8,gatedrecurrentlay:41,gather:[6,40,51,54,75],gaussian_normal_random:33,gcc:[0,29,31,45,60,116,118],gcc_3:3,gcreators_:66,gemm:41,gen_proto_pi:77,gen_rand_param:83,gender:97,gendrated_id:49,gener:[4,5,6,7,8,9,11,13,14,16,18,21,30,32,37,40,43,47,51,53,56,57,58,59,60,61,65,66,67,68,69,70,84,89,95,97,105,108],generated_id:49,generated_scor:49,generatedinput:[113,114],genr:97,get:[1,5,6,7,8,9,13,14,16,17,19,27,30,33,35,38,39,40,41,42,43,47,48,49,55,58,60,61,64,66,70,74,81,89,93,95,96,98,107],get_all_op_proto:66,get_block:58,get_config_arg:105,get_data:96,get_data_from_prefetch_queu:52,get_dim:5,get_float_el:5,get_grad:81,get_grad_op_desc:6,get_input_lay:74,get_numeric_gradi:5,get_numerical_gradi:5,get_output:5,get_plac:52,get_program:40,get_sample_from_lin:81,get_support:[3,78],get_symbol:35,get_tensor:5,get_vari:7,get_worker_addr:18,getactualkerneltyp:38,getattr:24,getbatchs:74,geteigendevic:76,getengin:43,getenv:[4,17,91,97],getexpectedkerneltyp:[38,39,43],gethostbynam:97,gethostnam:97,getidmap:97,getinfervartyp:37,getinput:74,getinputgrad:74,getinputvalu:74,getkerneltyp:29,getkerneltypeforvar:39,getlayeroutput:81,getlibrari:43,getmat:12,getoptconfig:12,getoutputgrad:74,getoutputvalu:74,getparam:12,getparameterconfig:12,getparameterptr:74,getparameterspars:12,getparametersremot:12,getplac:[43,68,75,76],getpodlist:97,getsiz:74,gettask:13,gettempl:95,gettensor:39,gettranspos:74,getw:74,getweight:74,getwgrad:74,gflag:87,gflags_complet:87,gflags_declar:87,git:[0,63,72,77,78,87,116,118],github:[0,8,33,47,63,72,77,78,87,107,109,116,118],give:[9,39,48,58,60,95],given:[6,14,16,20,23,24,30,32,33,49,59,62,70],glibc:[116,118],glibc_2:3,glibcxx_3:3,glide:8,global:[0,4,7,8,9,31,35,38,39,54,55,60,64,66,68,95,108],global_block:58,globalstat:108,globalstatinfo:108,glog:87,gnueabihf:118,go_librari:8,go_test:8,goal:[19
,20,23,29,34,51,60,67],gob:13,godep:8,godoc:45,goe:[9,30,36,64,109],going:[6,32,53,119],golang:8,good:[20,33,53,58,59,62,119],googl:[4,55,60,83,87,107,116],googleapi:95,googlenet:42,goroutin:[18,20],got:[38,64],govern:[81,84],gpg2:95,gpg:95,gprof:107,gprotos_:66,gpu:[1,3,5,17,20,25,29,39,40,47,51,52,53,54,55,60,62,63,68,78,86,89,91,105,108,109,119],gpu_id:[81,103,105],gpu_per_train:21,gpu_plac:52,gpudevic:68,gpugpu_id:102,gpukernel:60,grab:9,grad:[5,6,14,24,42,52,58,65,81,103],grad_info_map:6,grad_n:24,grad_nam:24,grad_op:24,grad_op_class:60,grad_op_desc:24,grad_op_maker_:61,grad_op_typ:[60,61],grad_op_type_:61,grad_s_block:6,grad_share_block_num:[102,103],grad_to_var:[6,24],grad_var_nam:5,gradient:[9,13,20,22,24,34,37,50,51,52,53,54,58,60,65,90,91,92,97,103],gradient_clipping_threshold:81,gradient_flat:5,gradient_machin:[46,87],gradientmachin:[46,54,97],gradientmachine_:12,gradopdescmak:[37,61],gradopdescmakerbas:61,gradopmak:61,grain:32,gram:67,grant:95,graph:[6,7,8,9,18,20,21,22,23,25,30,33,48,51,53,56],great:[23,67,119],greater:[24,53,89],greaterthan:66,greedi:67,green:[18,33],grep:[1,98],groudtruth:114,group:[13,35,43,68,119],group_input1:114,group_input2:114,group_input:114,grpc:119,gru:[49,67],gru_decod:114,gru_decoder_with_attent:114,gru_out:49,gru_step:[49,114],grumemori:[82,114],gserver:[41,42,74],gsizex:108,gtx:40,guarante:[43,58],guard:12,guest:3,guid:[27,40,60,95,96],gzip:[13,96],h0_bn:33,h1_grad:52,h2_grad:52,h_prev:7,hadoop:4,half:[29,95],half_to_float:29,hand:[40,60,67,68],handi:8,handl:[4,6,17,18,21,35,40,43,47,54,59,64,68,70,109],handler:7,hannun:67,happen:[13,66],hard:[21,30,49,67,70,95],hardwar:[30,31,68],has:[4,5,6,7,8,9,13,14,16,19,20,21,23,24,25,29,30,33,35,39,40,44,47,49,51,55,56,60,65,66,68,89,95,108,109,119],has_kei:[6,24],has_var_recurs:6,hasdependentvar:57,hash:[47,51],hasn:30,hasnext:19,have:[4,5,6,7,8,9,13,14,16,17,19,20,21,23,24,26,29,30,31,32,33,34,38,39,40,43,44,47,48,49,51,52,53,54,55,56,58,59,60,61,64,65,67,68,71,95,119],haven:30,h
df:11,head:[75,98],header:[14,44,46,60,68,83],headip:98,height:[7,45,59,74,75,83,89],height_:65,held:9,hello:4,help:[7,28,30,35,43,49,59,60,70,72],helper:[21,43,61,70],henc:[21,53,58,61,62,64],here:[4,8,9,15,20,23,24,26,28,30,34,35,43,47,48,59,62,66,71,89,95,119],heterogen:[21,23,55],heurist:[23,49],hidden:[50,58,82,83,95],hidden_a:83,hidden_b:83,hidden_out:7,hierach:113,hierarch:[56,58,60],hierarchi:60,high:[29,51,67,68,119],higher:[32,48,70],highest:7,highli:[67,70],him:4,hint:38,histor:32,hl_get_sync_flag:74,hold:[4,6,9,13,15,19,20,29,33,35,37,39,40,64,66,68,76,95],holder_:[68,76],home:[1,11,28,95,96,97,98,107],honor:13,host:[8,17,55,95,96],host_c:[116,117,118],hostfil:98,hostnam:95,hostnetwork:97,hostpath:[96,97],hostport:95,hous:86,how:[4,7,9,13,18,19,20,21,26,28,30,32,35,38,39,43,48,49,54,55,62,66,95],howev:[5,6,16,20,21,26,30,39,40,47,53,54,58,59,61,62,65,66,67,68,95],howto:[91,93,97],hpp:[29,45],htod:108,http:[0,1,8,17,33,63,72,77,78,81,84,87,95,96,107,109,116,118,119],hub:63,huge:53,human:[55,67],hyper:33,hyperparamet:62,i1116:97,i1117:108,i386:117,iOS:117,iamfullaccess:95,iamusersshkei:95,icc:31,icml:67,id_rsa:98,idea:[8,20,30,31,53,59,62],ideal:[21,39],ident:[61,95],identifi:[36,47],idmap:97,ids:[49,81,89],ids_arrai:89,idx:[13,19,33,40,74],ies:28,if_els:69,if_else_op:6,ifdef:[47,55],ifels:[7,56],ifelseop:56,iii:67,iil:83,illustr:[9,14,20,21,32,48],im_siz:33,imag:[0,4,21,30,33,34,49,50,56,59,67,72,95,96,97,99,100,119],image_a:59,image_b:59,image_conv_lay:67,image_fil:59,image_lay:59,image_nam:4,image_path:59,image_reader_cr:59,imagenet:11,imagepullpolici:[95,97],images_reader_cr:59,imagin:34,imgsiz:108,imgsizei:108,imgsizex:108,immedi:[40,43,53,62,95],immutable_paramet:4,imper:18,imperfect:60,implement:[7,13,14,15,16,17,18,20,21,23,30,32,35,36,37,39,40,43,45,46,47,49,52,54,57,64,66,67,68,69,70],implemet:12,impli:[8,81,84],implicitli:18,imposs:[19,49,119],impractic:39,improv:[22,23,40,60,67,95],in_arg:89,inarg:12,inbound:95,inc_path:78,includ:[4,7,8,14,17,1
9,20,29,30,33,35,40,45,46,48,49,55,56,58,60,66,72,75,87,95,108,116,117,118],inclus:49,incom:[18,38],increas:[9,13,29,83],increment:[20,25,34,40],incupd:74,inde:20,independ:[5,6,14,22,52,64,68,119],index:[5,6,7,9,13,18,56,58,70,95],indic:[6,7,14,26,33,48,56,61,65,68,70,89,95],indice_map:70,indices_map:70,individu:[9,51,95],industri:[9,44,119],ineffici:[39,54],infer:[4,6,7,9,25,30,36,37,38,39,40,41,45,47,57,58,60,65,67,84,86],infer_shap:58,infer_var_type_:37,inferer:67,inferfac:37,inferior:16,infershap:[7,58,60,75,76],infershapecontext:[75,76],infervartypefn:37,info:[29,48,74,81,84,93,97,119],inform:[7,17,28,35,38,40,43,44,47,48,51,58,62,64,65,95],infrastructur:[30,95],ingrad_:42,ingredi:[20,67],inherit:[7,19,50,60,68],ininst:4,init:[7,22,33,42,48,49,52,74,84,86,91,95,97],init_attr:58,init_model_path:[102,103,105],initi:[6,8,13,18,21,22,23,25,34,48,51,53,58,62,66,70,86,103,109],initial_max:83,initial_mean:83,initial_min:83,initial_std:83,initialize_op_attr:58,initrd:119,inlin:[68,76,95],inner:[81,89],inner_pos_arrai:89,inner_seq_pos_arrai:89,input0:76,input1:76,input:[5,6,7,12,16,18,19,21,22,23,24,25,29,30,31,32,33,34,35,37,38,39,40,42,43,47,48,49,53,54,57,58,59,60,61,64,66,67,68,70,74,75,76,81,82,83,84,86,89,97,105,109,110,113,114],input_data:74,input_data_target:74,input_hassub_sequence_data:74,input_index:74,input_label:74,input_lay:74,input_nam:4,input_seg:70,input_sequence_data:74,input_sequence_label:74,input_sparse_float_value_data:74,input_sparse_non_value_data:74,input_t:74,input_to_check:5,input_typ:81,input_valu:5,input_var:[5,58],inputbuff:12,inputdef:74,inputgradi:61,inputlayers_:74,inputs_to_check:5,inputsizechang:43,insert:[6,24,31,51,57,60,61,72],insid:[6,9,21,23,24,25,38,43,54,55,59,60,61,71,95],inspir:55,instal:[0,1,3,17,42,63,72,77,78,86,87,96,101,107,116,117,118],install_android:116,install_step:86,instanc:[5,7,9,11,15,18,19,21,22,24,26,31,36,43,48,49,53,58,60,61],instance_ip:95,instanti:[9,26,109],instead:[5,6,8,12,17,18,20,21,29,30,34,35,67],inst
rins:29,instruct:[7,34],int16:71,int32:[47,56,70,71,103],int64:[21,27,39,47,65,71],int64_t:55,int8:47,integ:[13,17,18,20,29,45,49,84],integer_sequ:81,integer_valu:[81,84,89],integer_value_sequ:[49,67,89,114],integer_value_sub_sequ:89,integr:119,intel:[30,47,68],intellig:40,inteloptimizedpaddl:42,intens:67,inter:21,interact:[21,95],interchang:[34,60],interconnect:51,interest:[18,29,51],interfac:[7,13,17,19,28,35,51,54,60,61,67,68,95,119],intermedi:[21,28,31,33,40,50,67],intern:[29,67,95,107],internel:42,internet:[8,9,119],interpret:[26,30,31,71],intrins:[18,26,29],introduc:[7,9,19,33,41,44,62,64,66],intuit:[16,60],inval_:42,invalid:[59,64],invent:30,invoc:[8,32,60],invok:[6,19,21,24,39,54,58,60,61,66,95,108],involv:49,ios:117,ios_arch:117,ios_deployment_target:117,ios_development_root:117,ios_enable_bitcod:117,ios_platform:117,ios_sdk_root:117,ios_use_veclib_for_bla:117,ip_str:97,ips:[95,97],ipt:[58,66,83,114],ipx:119,ipython:4,is_async:91,is_cpu_plac:43,is_inf:90,is_mkldnn_librari:43,is_seq:114,is_stat:83,is_target:57,is_tensor:66,is_test:43,is_traget:57,isbinari:89,isinst:[24,81,84],ismkldnnkernel:43,ispodallrun:97,isspars:74,issu:[0,8,33,67,80],issue_numb:72,istag:63,item:[16,29,59,86,97,119],iter:[4,9,21,30,31,40,43,53,55,59,67,70],iter_multiple_input_and_param:58,its:[4,6,7,9,13,18,19,20,23,24,25,30,31,33,34,35,37,39,40,44,48,49,51,53,54,57,58,60,61,64,65,66,68,95,108],itself:[6,9,16,19,31,43,53,64],ivector:[89,90],ivs:66,java:[7,45,56,60],jeremi:108,job:[6,16,18,21,24,60,97,102,103,105],job_desc:21,job_dispatch_packag:93,job_nam:[17,95,97],job_namespac:[95,97],job_path:[95,97],job_path_output:97,job_workspac:93,jobdesc:21,jobnam:[21,97],jobpath:[95,97],jobport0:95,jobport1:95,jobport2:95,jobport3:95,jobselector:97,jobserv:17,join:9,jpg:19,json:[35,67,95,96],juditski:53,jupyt:17,just:[8,13,14,18,21,30,31,33,37,43,53,54,58,59,60,61,62,64,65,95],jx4xr:95,jypyt:4,k8s:[18,97,119],k8s_data:[95,97],k8s_job:4,k8s_token:4,k8s_train:[95,97],k8s_user:4,kafka:11,kcpu:55,kc
uda:55,kdisabl:55,kebilinearinterpbw:108,kebilinearinterpfw:108,keep:[9,20,30,31,34,49,53,58,64,66,119],kei:[0,5,6,7,9,11,13,27,29,38,43,60,61,66,67,70,72,81,97,108],kenlm:67,kept:[40,58],kera:62,kernel:[5,20,29,31,38,39,42,53,55,62,65,67,68,75,76],kernel_hint:38,kernel_type_for_var:39,kerneltyp:[38,43],key1:103,key2:103,key_pair_nam:95,keyid:95,keymetadata:95,keypair:95,keyserv:95,keystat:95,keyusag:95,keyword:[58,69,97],kforcecpu:38,kill:[9,95],kind:[4,5,9,15,19,21,24,31,34,38,39,43,50,51,55,68,71,81,84,95,96,97],kind_:55,kinput:52,kmark:55,kms:95,knchw8c:47,knchw:47,knhwc:47,know:[4,13,18,19,40,44,95],knowledg:67,known:[6,7,20,30,32,48],koutput:52,kparallelblock:52,kparallelscop:52,kparamet:52,kplace:52,kpoprang:55,kpushrang:55,kqueue:20,kselectedrow:65,kstate:55,kube_cluster_tl:4,kube_ctrl_start_job:4,kube_get_workers_addr:18,kube_list_containers_in_job_and_return_current_containers_rank:4,kubeconfig:95,kubectl:[93,96,97,98],kuberent:[9,95],kubernet:[4,9,18,21,60,94,97,99,100,119],kubernetes_service_host:4,kusecudnn:38,kusemkldnn:38,kwarg:[25,35,58,66],l1_regularization_op:62,l2_regularization_op:62,l2regular:[81,91],l93:12,label:[21,25,30,33,34,35,39,50,52,56,59,67,81,84,88,96,109],label_fil:59,label_lay:59,label_path:59,labelselector:97,lag:103,lambda:[18,24],lan:101,languag:[18,20,30,34,40,55,60,64,67,69,81,84],larg:[21,23,24,40,44,53,67,72],larger:[40,69],larger_than:[7,36,56],last:[6,24,40,48,55,56,110],last_seq:[49,111],lastseen:96,latenc:[29,67,95],later:[8,60,62,67,68,76,95],latest:[1,7,8,9,16,63,72,77,78,96,97,116],latter:[53,70],launch:[43,95],launcher:4,law:[81,84],layer1:[81,110],layer2:[81,110],layer:[6,7,12,18,20,21,23,30,33,34,36,50,52,53,56,59,60,62,66,67,68,70,74,81,83,84,86,89,90,109,110,113,114],layer_0:74,layer_att:82,layer_attr:[81,82,105,114],layer_expand:110,layer_first_seq:110,layer_help:38,layer_last_seq:110,layer_nam:81,layer_num:105,layer_pool:110,layer_s:89,layer_typ:[41,42],layerbas:74,layerconfig:74,layergradutil:74,layerhelp:[38,58
],layermap:74,layers_test:78,layout:[39,43],layout_:[38,47],layouttyp:38,lazi:[53,62],ld_library_path:87,lead:[19,40,47],leaki:33,learing_r:50,learn:[1,4,6,14,16,20,21,23,26,33,34,40,42,49,51,53,55,59,60,62,68,108],learning_r:[14,21,81,83,91,109],learning_rate_arg:83,learning_rate_decay_a:83,learning_rate_decay_b:83,learning_rate_schedul:83,leas:9,least:9,leav:[7,95],lectur:40,left:7,left_scor:81,legal:66,len:[14,18,27,30,58,74,86,97],length:[14,29,41,44,48,49,60,67,70,96],leran:40,less:[4,24,119],less_equ:69,less_than:[4,40],let02:96,let:[4,7,16,18,20,31,32,34,38,39,43,47,48,49,50,61,68,95],level:[29,32,35,44,48,49,55,68,70,71,89,113],lgtest:8,lgtest_main:8,lib64:[1,78,103],lib:[0,46,87,107,116,117,118],lib_path:78,libapi:8,libari:46,libc:3,libcuda:[1,78],libgcc_:3,libgflag:87,libglog:87,libgoogl:107,libiomp5:42,libmkldnn:42,libmklml_intel:42,libnvidia:[1,78],libopenbla:87,libpaddl:[45,46,60,72,107],libpaddle_capi:46,libpaddle_capi_engin:87,libpaddle_capi_lay:87,libpaddle_capi_shar:87,libpaddle_capi_whol:87,libpaddle_gserv:46,libpaddle_math:46,libprotobuf:[83,87],librari:[8,15,20,21,39,42,43,46,51,67,103],library_:47,library_type_:39,librarydevicecontext:47,librarytyp:39,libstdc:3,libz:87,licens:[42,51,81,84],life:9,lifecycl:[55,119],lifetim:64,lightweight:32,like:[6,7,8,9,12,17,18,20,26,30,31,32,33,34,35,37,39,43,47,51,52,53,58,59,60,61,62,64,65,67,69,70,71,95,109,119],limit:[30,40,44,49,60,62,81,83,84,108],linaro:118,line:[8,12,17,20,28,34,53,56,58,60,62,72,81,83,95,105],line_count:83,linear:[49,81,83,84,86],lineno:107,link1:29,link2:29,link:[8,27,28,64,95,113,119],linux:[0,3,20,27,95,116,118],linux_x86_64:[3,63,78],lipo:117,list:[4,6,7,8,13,17,18,26,28,30,33,47,50,52,54,55,58,61,64,70,81,95,105,107],listdir:91,listen:[9,18,21],listen_and_do:18,listenanddo:18,lite:87,littl:[14,38,44],live:109,live_in:40,live_out:40,load:[4,9,21,33,51,58,84,95,97],load_missing_parameter_strategi:[102,103,105],load_mnist:33,load_paramet:83,loadsave_parameters_in_pserv:[12,102,103],
local:[0,5,7,9,15,16,20,34,40,48,52,56,58,60,97,102,103],local_scop:5,local_w1_grad:52,local_w2_grad:52,localhost:[1,77],localip:97,localpath:28,locat:[8,30,47,55,68,70],lock:[8,9,13,14],lod:[20,44,48,65,70,71],lod_desc:65,lod_expand:49,lod_level:[58,65,71],lod_rank_t:71,lod_tensor:[48,65,71],lod_tensor_arrai:71,lodtenosr:19,lodtensor:[19,20,37,44,60,71],lodtensordesc:[44,65],log:[3,13,21,28,33,74,79,83,91,93,95,96,97,98,103],log_barrier_abstract:[102,103],log_barrier_lowest_nod:[102,103],log_barrier_show_log:[102,103],log_clip:[102,103],log_error_clip:[102,103],log_period:[96,97,103,105],log_period_serv:[102,103],logger:81,logic:[16,21,23,24,33,37,50,51,54,64,70],logit:[33,39],longer:[9,21,40],look:[7,17,18,20,30,31,34,52,53,58,61,62,67,71,95,109],lookahead:67,lookup:[37,49,109],lookup_t:40,loop:[5,7,19,30,40,55,59,64],loop_var:70,loss:[6,21,33,35,50,52,53,62,67],loss_gard:52,lot:[21,47,49,53,58,62,68,119],low:[50,51,67,68,70],low_rnn:48,lower:[29,48,49],lower_level_rnn:48,lpaddle_capi_engin:87,lpaddle_capi_lay:87,lpaddle_capi_shar:46,lpaddle_capi_whol:46,lrelu:33,lstm:[96,114],lstm_last:111,lstmemori:[82,114],lstmemory_group:82,lstmemory_unit:82,lstmlayer:41,luckili:40,mac:[46,116],machin:[21,23,30,33,40,42,51,53,62,81,90,95,98,113,119],machine_transl:114,maco:[0,3],macro:[32,47,61],made:[9,14,30],mai:[5,7,21,25,29,31,38,39,40,43,51,55,59,60,64,67,71,81,84,95],main:[18,20,24,30,31,35,51,56,60,87,95,107],main_program:[6,25,52],mainli:[15,40,47,68],maintain:[7,13,53,58,60,95],majel:8,major:[21,29,39],make:[0,4,6,7,8,9,13,14,16,20,21,22,29,30,34,48,49,52,53,54,58,59,60,62,67,70,72,74,75,77,78,87,95,108,116,117,118,119],make_chan:20,make_channel:20,make_ddim:76,make_function_oper:32,make_vari:66,maker:[60,61],malloc:68,man:27,manag:[9,14,15,18,20,21,28,55,64,68,77],mandarin:67,mani:[6,8,13,18,20,30,33,38,39,40,49,54,55,58,60,61,64,65,66,69,70],manili:35,manipul:[30,58,61],manner:[53,62,67,68],mantain:40,manual:[21,50,53,61,83,119],manufactur:30,manylinux1:3,manylinux1
_x86_64:[3,63,78],manylinux:63,map:[4,7,13,24,43,47,58,61,64,66,68,70,84,87,119],map_fn:70,mapreduc:4,mark:[6,23,33,34,48,49,55,64,119],marker:55,market:29,master:[4,16,60,63,118],mastermind:8,mat:[45,46,89],mat_cache_row:12,mat_norm:12,mat_normal_shar:12,mat_sparse_row:12,mat_sparse_row_auto_grow:12,mat_sparse_row_id:12,mat_sparse_row_prefetch:12,mat_sparse_row_prefetch_full_s:12,mat_value_shar:12,match:[8,29,69,81],matchbox:119,math:[42,45,60,74,75,108],mathemat:62,matmul:[7,35,48,70,75],matrix:[12,45,46,74,75,87,89,90],matrixptr:74,matrixtyp:46,mattyp:12,max:[5,22,24,40,58,83,105,108,110],max_diff:5,max_length:[49,114],max_relative_error:[5,75],maxim:24,maximum:[7,14,52],maxoutfunctor:68,mayb:[7,43],md5:10,mean:[6,8,21,22,24,35,39,49,52,57,59,64,67,81,95,103,109,119],meant:70,measur:25,mechan:[6,15,19,25,43,58,61,95],mem:[7,17,49],mem_per_pserv:21,mem_per_train:21,member:[4,24,34,35,47,54,58,64],memcpi:[54,108],memori:[6,7,12,13,17,29,31,39,42,43,44,47,49,53,55,60,76,96,108,109,114],memory_nam:82,memory_optim:40,memory_threshold_on_load_data:[102,103],memoryalloc:68,memorydesc:43,mention:[6,8,13,21,23,30,48,51,53,55],merg:[14,16,22,25,42,48,51,52,54,72,90],merge_model:90,merge_v2_model:90,merge_v2_modelss:90,messag:[7,18,20,26,30,31,34,44,55,56,57,58,60,61,65,71,72,78,96],metadata:[27,95,96,97],metal:119,metaphor:34,metaplotlib:4,method:[5,7,16,18,19,21,22,24,29,33,34,35,38,39,50,51,58,59,60,64,65,70,107],methodolog:53,metric:[25,55],mfs:97,microarchitectur:29,might:[7,8,18,20,30,40,56,67,95],min:[22,24,58,95,105,108],min_block:7,min_count:23,min_desc:7,min_pool_s:81,mini:[7,9,20,25,26,30,36,48,81],mini_batch:59,minibatch:[7,25,34,36,56],minim:[7,21,23,24,30,33,50,60,109],minimum:67,minsizerel:[116,117,118],minu:61,minus_grad:61,minusgradop:61,minusop:61,minusopgradmak:61,minusopprotoandcheckermak:61,minut:[9,16,95],mip:116,mirror:8,mislead:14,miss:33,mistak:30,mit:95,mix:[52,55,70,114],mkdir:[28,77,87,95,98],mkl:[0,39,43,60,68,78,87],mkl_packed_:41,mkldnn:[39,42
,47],mkldnn_:42,mkldnnactiv:42,mkldnnbase:42,mkldnnlayer:42,mkldnnmatrix:42,mkldnnstream:42,mkldnntester:42,mklml:[42,78],mklml_lnx_2018:78,mklpack:41,mklpackedgatedrecurrentlay:41,mklpackedgemm:41,mklpackedlstmlay:41,mklpackedrecurrentlay:41,mlp:35,mnist:[11,21,33,34,56,59,60,90,107],mnist_random_image_batch_read:59,mnist_train:59,mnist_train_batch_read:59,mnist_v2:90,mnt:97,mobil:[29,30,40,60,77],mode:[29,41,51,52,72,97],model:[6,7,9,10,18,21,23,24,25,34,39,40,41,50,51,53,60,62,67,70,77,84,86,90,95,105],model_list:[103,105],model_path:105,modelparallel:21,modern:40,modif:67,modifi:[21,29,35,62,72,95],modul:[21,32,33,49,67,70,75,83,107],modular:49,momentum:[64,81,84],momentumop:107,mon:96,monitor:[20,55],month:8,more:[4,5,6,8,9,13,16,17,19,20,21,23,28,29,30,31,32,34,38,40,43,47,48,49,50,55,58,59,60,62,67,68,69,70,83,108,109,119],most:[4,6,8,16,20,21,31,34,35,47,49,53,55,59,62,67,68,69,109,119],mostli:[29,119],motiv:60,mount:[17,95],mountpath:[95,96,97],move:[9,13,19,28,30,53,95,119],movidiu:30,movie_id:97,mpi:[20,51,98],mpirun:98,mse:[30,34,50,56],much:[9,30,43,50,59,62,70],mul:[32,40,58,74,75],mul_grad:75,mul_op:75,mul_result:58,mulgradkernel:75,mulkernel:75,mulop:[32,75],mulopgrad:75,mulopmak:75,mult:[18,31],multi:[25,39,51,54,119],multigradientmachin:54,multipl:[4,5,13,14,16,18,20,21,23,25,30,31,32,38,39,51,52,55,60,67,71,84,95],multiple_input:58,multiple_param_attr:58,multipli:18,multithread:52,must:[6,14,19,24,40,43,44,47,55,57,58,59,60,66,71,74,75,76,81,91,95],mutabl:[68,76],mutable_data:[43,68,75,76],mutex:20,mutuable_data:68,mxnet:[7,18,20,30],my_cluster_nam:95,my_cost:83,my_external_dns_nam:95,my_lib:91,my_net:52,myerrorclip:24,mypaddl:[96,97],naiv:18,name:[4,5,6,7,9,11,12,14,17,18,19,21,25,29,32,35,38,42,43,44,46,47,49,55,56,58,60,63,65,66,70,71,74,83,84,86,96,97,99,100,105,108,109,114,119],name_:55,name_prefix:11,namespac:[7,36,45,58,74,75,96,97],nativ:29,natur:[13,16,23,49,70],ncall:107,nccl1:51,nccl2:51,ncclinit:51,nchw8:39,nchw8c:39,nchw:[42,47],ndarr
ai:[11,81],ndk:116,nearest:29,nearli:[5,19],necess:70,necessari:[6,7,14,16,24,25,40,44,49,54,58,66,70],necessarili:18,neck:51,need:[4,5,6,8,12,13,14,16,17,19,20,21,23,24,25,28,30,31,32,33,38,40,43,47,49,50,51,52,53,54,55,57,58,60,61,62,64,65,66,67,68,70,71,78,95,97,108,119],need_tran:83,neighberhood:51,neon:29,nervana:30,nessesari:67,nest:[6,7,55,56,71,89],net:[0,7,33,48,64,90],netop:[7,60],network:[4,5,6,7,9,12,21,23,25,33,35,39,40,41,42,48,50,53,55,58,59,62,64,66,67,68,71,82,84,86,89,90,97,105,111,119],network_config:105,networkadministr:95,neural:[4,6,7,9,21,35,39,40,41,42,48,53,62,64,68,71,86,111,113],neuralnetwork:54,never:[40,59,64,95,96,97],new_block_idx:58,new_op_desc:24,new_scop:39,new_stat:48,newblock:58,newbuff:43,newest:14,newli:[29,119],newop:7,newopdesc:58,newprogram:58,newscop:39,newvardesc:58,next:[6,9,15,19,20,24,49,51,70,95],nextlay:42,nfs4:95,nfs:[95,97],nfsdir:97,nfsver:95,nic:[97,102,103],nil:[13,20],nmt_without_attent:81,nnz:[74,89],no_grad_dict:6,no_grad_set:[5,6,75],no_gradi:6,no_sequ:84,node0:97,node1ip:98,node2ip:98,node3ip:98,node:[8,16,18,21,23,35,40,49,51,60,95,96,97,98,119],node_0:[95,97],node_1:[95,97],node_2:[95,97],node_id:91,nodeattr:35,nodeentri:35,nodefil:93,nodesep:35,nohup:91,nois:[9,33],noisi:33,non:[9,20,29,30,65,95],none:[4,5,6,7,20,24,25,33,35,36,48,49,50,56,58,66,70,109,114],noneedtran:43,nor:18,norm:[33,47],normal:[52,53,67,96,97],notat:40,note:[4,6,7,12,13,17,39,40,44,47,51,59,60,68,76,95],notebook:[1,17],noteworthi:30,noth:[38,58,64,72],notic:[24,30,51,61],notimplementederror:24,notin:39,notingradi:75,notion:70,notori:5,now:[6,8,9,19,20,23,33,44,47,53,60,61,62,64,95,113],nproc:0,nullptr:[43,55,61,64,74],num:[91,97,103],num_class:35,num_gradient_serv:[91,102,103],num_hidden:35,num_parameter_serv:4,num_pass:[84,96,97,102,103,105],num_pserv:21,num_row:65,num_samples_process:83,num_shard:11,num_step:70,num_train:21,number:[7,9,11,23,25,40,53,55,59,60,66,70,95],numdevices_:105,numeric_grad:5,numerical_grad:5,numlogicaldevices
_:105,numpi:[0,11,29,33,58,59,75,81,83,84],numreal:12,numsampl:108,numtimeout:13,nv_:8,nv_librari:8,nv_test:8,nvcc:[8,29,31],nvidia:[1,29,47,51,68,78],nvlink:51,nvprof:55,obj:83,object:[4,12,19,21,24,25,33,35,40,45,50,55,58,60,62,64,108],obtain:[16,19,53,68,81,84],obvious:[8,47],occup:[40,97],occupi:[29,55],occur:40,occurr:7,oct:96,off:[0,46,72,77,87,101,116,117,118,119],offer:[7,60,66],offici:[8,95],offlin:[9,11,119],offset:[12,89],often:[12,35,40,47],ograd:74,old:[5,14,16,49,60],older:30,omega:62,omit:81,omp_num_thread:107,ompi_comm_world_rank:91,onc:[9,13,18,21,23,25,30,34,53,69,95],one:[4,5,6,7,9,12,13,14,16,17,18,19,20,21,24,25,26,29,30,31,32,33,35,37,38,39,43,44,47,48,49,50,51,53,54,56,57,58,59,60,61,64,65,67,68,69,70,89,95,109,119],onehotcrossentropyopkernel:75,ones:[32,33,60],onli:[4,5,6,8,12,13,14,15,16,17,18,19,21,23,24,25,26,28,29,30,33,34,39,40,43,48,49,50,51,54,55,58,60,65,66,67,68,69,70,71,89,95,113,119],onlin:[9,11,40,59],only_cpu:5,onnx:30,onto:[21,23,95],op1:[39,40],op1_2_op2:39,op1_to_op2:39,op2:[39,40],op3:40,op_:75,op_check:75,op_class:[60,66],op_desc:[24,37,57],op_info:109,op_kei:43,op_maker_class:[60,66],op_proto:66,op_registri:109,op_siz:24,op_test:75,op_typ:[60,75],op_unique_kei:43,opattrcheck:75,opcreat:66,opdesc:[7,24,34,56,57,58,60,61,66,71],opdescbind:[37,61],opdescbuild:7,opeartor:52,open:[4,11,30,33,42,59,81,83,84,95],openbla:[0,1,87],openmpi:[94,98],opensourc:51,oper:[5,7,18,20,21,22,23,25,26,29,30,31,33,34,35,37,38,39,48,49,50,51,52,55,57,62,64,67,68,71,75,76,95,109],operand:29,operartor:76,operat:52,operator_grad:5,operator_list:55,operatorbas:[7,32,60,61,66,75],operatorwithkernel:[39,75],opinfo:[37,60,61],opinfomak:37,opinfomap:61,opkernel:[75,76],opkernelkei:60,opkerneltyp:[39,47],opmak:66,opproto:75,opprotoandcheckermak:[61,75],opprotomak:[66,75],opregist:66,opregistri:66,ops:[5,6,7,8,18,19,31,34,35,52,53,56,57,58,60,68,75,119],ops_:7,ops_test:8,opt:[0,4,50,57,66,97],opt_op_list:50,optest:75,optim:[5,6,21,22,23,31,33,51,53,54,56,6
0,62,65,67,81,83,84,91,109],optimis:50,optimize_op_attr:58,optimzi:81,option:[4,8,21,33,38,44,56,57,58,60,65,66,67,71,116,119],optmization_op_list:50,opts_np:57,optyp:[37,66],opwithkernel:65,order:[6,34,44,55,59,62,70,95,97,107,119],oregon:95,org:[11,27,33,81,84],orient:66,origin:[5,29,33,64,70,72],other:[7,9,14,18,28,29,30,31,37,39,40,43,47,48,53,57,62,64,66,67,68,71,95,109,119],otherwis:[4,6,9,14,16,33,37,43,59,67],our:[4,6,8,19,20,21,23,33,37,40,47,51,53,64,70,95],out:[4,7,8,13,16,19,21,24,30,35,39,40,43,48,49,58,75,76,81,95,107,113,114],out_dir:[95,97],out_mem:114,outer_mem:111,outgrad_:42,output:[4,5,6,7,11,16,18,19,23,24,28,31,32,33,34,35,36,37,39,40,43,44,48,49,52,53,56,57,58,59,60,61,64,65,66,68,70,75,76,81,84,90,93,96,97,105,114],output_:42,output_all_step:48,output_arg_nam:24,output_fil:90,output_lay:[81,84,86],output_mem:114,output_nam:5,output_num:48,output_path:11,output_seg:70,outputbuff:12,outputgradi:61,outsid:[21,64],outter:89,outter_pos_arrai:89,outter_seq_pos_arrai:89,outupt:70,outv:74,outval_:42,over:[4,30,40,51,52,53,70],overal:[33,53,55,119],overfit:62,overload:[29,38],overrid:[7,9,28,43,68,75,76],overview:[13,14,15,68],overwrit:28,own:[6,14,16,24,26,35,37,50,51,53,62,66,95],pack:[70,83],packag:[13,17,18,32,42,63,78,95,107],pad:[43,67],paddl:[0,1,3,4,7,8,9,11,17,19,21,28,31,32,33,36,41,42,43,44,45,46,48,49,54,56,60,62,63,66,67,68,70,72,74,75,77,81,83,84,86,87,89,90,91,93,95,96,97,98,101,105,107,108,109,114,116,119],paddle_arguments_get_sequence_start_po:89,paddle_arguments_set_id:89,paddle_arguments_set_sequence_start_po:89,paddle_arguments_set_valu:89,paddle_begin_init_param:14,paddle_capi:87,paddle_dir:75,paddle_doc:77,paddle_docs_cn:77,paddle_element_typ:14,paddle_element_type_float32:14,paddle_element_type_float64:14,paddle_element_type_int32:14,paddle_element_type_int64:14,paddle_element_type_uint32:14,paddle_element_type_uint64:14,paddle_enforc:[7,19,43],paddle_enforce_eq:[75,76],paddle_error:[45,46],paddle_exampl:17,paddle_finish_init_pa
ram:14,paddle_get_param:14,paddle_gradi:14,paddle_gradient_machine_create_shared_param:90,paddle_gradient_machine_forward:90,paddle_gradient_machine_load_parameter_from_disk:90,paddle_init:90,paddle_init_num_gradient_serv:91,paddle_init_param:14,paddle_init_port:91,paddle_init_ports_num:91,paddle_init_ports_num_for_spars:91,paddle_init_pserv:91,paddle_init_trainer_count:91,paddle_init_trainer_id:91,paddle_init_use_gpu:91,paddle_ivector:89,paddle_ivector_cr:89,paddle_job:17,paddle_manylinux_devel:0,paddle_matrix:[45,46,89,90],paddle_matrix_cr:[46,89],paddle_matrix_create_spars:89,paddle_matrix_get_row:89,paddle_matrix_get_shap:45,paddle_matrix_shap:45,paddle_matrix_sparse_copy_from:89,paddle_n:97,paddle_new_etcd_pserver_cli:14,paddle_new_pserver_cli:14,paddle_on_cloud:17,paddle_output:96,paddle_paramet:14,paddle_port:97,paddle_ports_num:97,paddle_ports_num_spars:97,paddle_process_by_paddl:97,paddle_pserver2:93,paddle_pserver_cli:14,paddle_pserver_client_releas:14,paddle_r:89,paddle_root:87,paddle_save_model:14,paddle_send_grad:14,paddle_server_num:97,paddle_train:[46,63,93,97],paddle_with_cuda:55,paddle_with_mkldnn:47,paddlepaddl:[0,1,3,8,9,11,14,15,16,17,18,21,27,28,32,33,34,36,38,44,48,49,50,54,55,58,59,60,64,70,71,72,81,84,86,87,89,90,93,96,97,99,100,101,107,108,114,116,118,119],paddlepaddle_gpu:3,paddlepaddlebook:1,paddlepaddlehub:[1,116],page:95,pair:[6,7,21,34,50,55,60],pakcag:8,panic:20,paper:[33,67],para:12,paradigm:[18,26,60],paragraph:48,paragraph_data:48,paragraph_out:48,parallel:[18,20,21,23,39,51,52,55,60,95,96,97,105,108],parallel_do_grad:52,parallel_do_op:52,parallel_for:18,parallel_nn:[102,103],paralleldo:[22,52],parallelfor:18,paralleliz:67,param:[5,7,14,52,54,58,68,76,81,83],param_attr:[12,58,81,83,114],param_config_proto:14,param_fil:[83,90],paramattr:[81,83,114],paramet:[5,6,7,8,10,12,16,18,21,22,24,26,28,30,31,33,34,35,37,44,48,50,51,56,59,64,66,67,70,81,83,84,86,89,92,93,97,103,109],parameter_block_s:[102,103],parameter_block_size_for_spars:[102
,103],parameter_list:[6,50],parameter_nam:4,parameter_serv:4,parameter_valu:12,parameterattribut:12,parameterclient2:97,parameterclient_:12,parametermap:74,parametermutex_:12,parameters_:74,parameters_and_grad:50,parameterserver2:12,parameterset:4,parameterupdat:54,parameterupdater_:12,params_grad:50,params_pass_4:90,params_pass_90:84,params_pass_:84,paramt:95,paraspars:74,parent:[7,18,56,58,60],parent_:[7,64],parent_block:52,parent_idx:58,parenthes:60,pars:[0,8,21,35,95],parse_known_arg:97,parsefromstr:83,parser:97,part:[6,7,16,21,30,43,44,56,58,67,68,119],particular:[34,39,44,60,67],partit:[9,11,21,23,60,95],paserv:97,pass:[6,7,9,20,24,25,30,33,40,44,50,52,53,54,57,58,59,60,62,64,67,69,70,72,81,84,95,96,97,103,105,108],pass_id:[21,81,84],pass_idx:59,pass_manu:83,passtyp:74,past:[4,95],patch:27,path:[9,13,14,17,40,49,59,67,87,95,96,97,103,116,117,118],path_to_paddlepaddle_working_directori:77,pattern:[9,45,53,62,95],paus:[9,16],pcie:51,pd_api:89,peer:[51,79],pem:[4,11,95],pend:[9,13],pep425tag:[3,78],per:[9,14,51,53,59,62],percal:107,perf_test:107,perfectli:67,perform:[5,14,20,21,25,29,30,33,39,40,51,54,55,59,60,62,67,68,102,107,108],perftool:[55,107],period:[9,16,103],permiss:[81,84,95],persist:[26,65,67,71,95],persistentvolum:95,persistentvolumeclaim:[95,97],person:[4,38],perspect:60,perturb:5,pex:119,pfs:[11,28],pfsclient:11,pfspath:28,pgp:95,phase:[43,49,51,53,59,61,67,119],philosophi:[53,62],photo:33,physic:119,pick:95,pickl:[91,98],pictur:51,piec:[18,55],pillow:17,pip:[0,3,63,72,77,78,86,107],pipelin:[25,67],pivot:43,pixel:21,place:[6,7,9,16,21,23,26,38,39,43,51,52,60,76,109],place_:[38,39,47,68],place_list:71,placehold:[33,68,76],placement:23,plain:[17,44,46,47],plan:[9,18,43,60,67],platform:[3,7,31,39,43,47,55,68,75,76,95,109,116,117],pleas:[4,9,13,14,15,18,20,31,35,47,48,58,59,60,67,68,71,77,78,95,97],plot:4,plug:[51,53],pnpairvalid:102,pod:[11,17,18,71,95,96,97],pod_nam:95,podip:97,podlist:97,podtyp:71,point:[7,9,17,20,29,40,43,51,68,76,89,108,119],pointe
r:[7,14,35,40,47,58,60,64,68,76,89],poli:83,polici:95,poll:20,pollut:16,polyak:53,ponit:35,pool3:74,pool:[22,40,67],pooling_lay:81,pooling_typ:[81,110],pop:[7,26],popul:14,popular:[8,33,35,55],port:[8,18,91,95,96,97,102,103],port_num:102,portabl:35,portal:77,ports_num:[91,97,103],ports_num_for_spars:[12,91,97,102,103,105],pose:9,posit:89,possibl:[4,7,13,20,23,40,58,62,71],post:[0,17,27],postpon:62,potenti:[29,108],pow:83,power:[29,40,51,67,119],ppo_workspac:77,pprof:107,pre:[4,14,38,40,95],pre_activ:58,pre_bia:58,pre_stat:[48,70],preambl:58,precis:[25,29,53],precompil:26,pred:[35,40],predecessor:40,predict:[21,52,62,81,86,90],predict_fil:[102,103],predict_output_dir:[102,103],prefer:[30,38],prefetch:[12,52,74],prefix:[9,11,49,67,95],pregel:20,pregrad:74,prepand:58,prepar:[5,17,54,67,91,98,99],prepend:58,prepend_oper:58,preprocess:[67,70],present:[4,6,7,55,70,72],preserv:28,prev_batch_st:[102,103],prevent:[4,9,13,16,24,62],preview:60,previou:[6,9,23,28,48,49,95],previous_memori:7,price:[60,86],prim:43,primari:[30,34],primarili:[53,62],primit:[29,42,43,51,52,70],primitive_desc:43,primitivedesc:43,principl:[4,8,47],print:[3,4,20,21,30,35,58,69,78,84,86,98],print_graphviz:35,printallstatu:108,println:20,printstatu:108,prior:52,prioriti:60,privat:[7,46,55,58,64,65,66,68,70,72,76],privileg:95,pro:51,prob:86,probabl:[49,67],problem:[4,5,8,16,19,30,33,34,53,60,62],proc:1,proce:[9,59,95],procedur:[7,44],process:[4,6,7,11,12,13,16,18,19,20,21,25,26,30,31,35,39,40,42,44,51,52,62,66,81,83,95,97],processor:[29,108],prod:72,produc:[9,30,35,59],product:[17,30,84,95],productgraph:96,prof:107,profil:[28,55,67,107,108],profilerst:55,profl:107,proflier:[55,108],prog:97,program:[4,6,11,14,16,21,23,26,34,36,40,50,51,52,55,59,60,64,69,71,97,108],programdesc:[18,21,26,30,40,44,52,57,58,61,71],programm:[21,30,58],progress:[9,13],project:[17,46,67],promis:49,prompt:[28,30],prone:4,pronunc:67,prop_kind:43,propag:[6,30,53],proper:38,properli:38,properti:[35,62],propos:[7,22,23,49,50,51,53,70]
,proprietari:42,protect:[19,29,66,74,75],proto:[20,38,44,47,56,60,66,71,75],proto_:66,protobuf:[7,17,18,21,26,30,31,34,35,40,44,56,58,60,61,66,83,87,90],protocol:[109,119],protomak:75,provi:91,provid:[4,7,14,17,18,25,26,29,30,33,35,37,38,47,51,53,55,58,62,66,67,68,69,70,81,86,95,102,119],provis:[95,119],prune:7,ps_desir:9,pserver:[12,14,15,17,60,91,93,95,97,102,103],pserver_addr:14,pserver_cpu:17,pserver_id:10,pserver_mem:17,pserver_num_thread:[12,102,103],pseudo:[4,6,17,61,70],pseudocod:70,psize:74,ptr:[46,68],pub:98,pull:[1,8,60,63,116],purpos:[9,21,23,38,108],push:[7,26,30,55,63,97],push_back:74,put:[8,9,12,23,40,43,58,68],pvc:95,pwd:[0,1,72,77,116],pxe:119,py_paddl:78,pybind:[7,20,29],pydataprovid:81,pydataprovider2:97,python2:107,python:[0,3,4,7,15,19,20,25,26,30,32,33,34,35,38,45,49,52,54,55,60,63,68,70,72,74,75,77,78,81,86,90,91,98,107,109,114],pythonpath:78,pytorch:[30,55],qualcomm:29,queri:95,question:[4,18,23,66,95],queue:[20,23],quick:35,quick_start:[17,95,96,97,99],quick_start_data:96,quickli:[49,58,60],quickstart:96,quit:49,r14b:116,rais:[24,35,91],rajathkmp:33,ran:[23,108],rand:[33,83,89,103,105,108],rand_max:89,random:[11,20,33,47,54,58,59,75,83],random_imag:11,randomli:[16,89],rang:[11,18,21,29,33,40,55,59,66,97],rank0:51,rank1:51,rank:[4,70,95],rankdir:35,rapid:61,raspberrypi:118,raspbian:118,rasspberri:118,rate:[14,67,81,97],rather:[6,17,33,70,95],ratio:103,raw:44,rdma_tcp:[102,103],reach:[9,40,51],read:[4,6,9,11,18,19,20,21,23,30,31,59,60,67,70,83,95,119],read_from_arrai:40,read_from_realistic_imag:4,read_from_rng:4,read_lock:10,read_minibatch:30,read_mnist_imag:4,read_next_from_fil:81,read_paramet:83,read_ranking_model_data:4,readabl:[55,60],reader:[11,21,29,33,34,56,67,71,84,91,107],reader_cr:11,reader_creator_bool:59,reader_creator_random_imag:59,reader_creator_random_image_and_label:59,readi:[9,20,95,96,119],readlockguard:12,readm:[46,72],readnext:19,readwritebuffer_:12,readwritemani:95,real:[12,33,59],realist:4,realiti:67,realiz:[7,48],realli
:[30,62],reason:[4,5,9,20,30,96],receiv:[9,17,20,21,23,48],recent:[40,53],recognit:67,recommend:[4,97],record:[13,43,55,66,95],recordev:55,recordio:[4,11,13,19],recov:[9,70],recover:60,recoveri:13,recurr:[41,48,64,67,111,112],recurrent_group:[67,81,82,111,113,114],recurrent_op:70,recurrentgradientmachin:[46,49,70],recurrentlay:41,recurs:[6,7,8,28,40,60],recv:[18,21,23,51,95],recvparametertyp:12,red:33,reduc:[23,29,51,60,107],reduce_by_kei:60,reduce_mean:33,refactor:[21,23,34,49,53,54,58,62,70],refer:[7,9,13,14,15,18,29,35,43,47,48,51,56,58,60,62,64,68,70,71],referenc:13,refine_unknown_arg:97,reflect:13,reg:66,regard:[19,119],region:64,regist:[20,39,40,47,61,68],register_gpu_profil:108,register_lay:74,register_op:[32,60,61,66,75],register_op_cpu_kernel:[68,75],register_op_cuda_kernel:[68,75],register_op_without_gradi:[60,75],register_oper:[37,61],register_tim:12,register_timer_info:108,registerop:66,registr:109,registri:[17,37,68,96,119],regular:[6,81,91,95],reinit:19,reiniti:[19,43],rel:[5,16,62],relat:[9,16,17,29,39,47,52,55,64,119],relationship:[61,68],releas:[63,67,78,87,95,116,117,118],reli:[5,18,49,50,53,62],reliabl:[9,62],relu1:35,relu2:35,relu:[33,35,40],relwithdebinfo:107,remain:70,remot:[8,12,21,60,72,95,103,105],remoteparameterupdat:[12,15],remov:[6,21,28,30,49,72],removing_docker_contain:0,renam:[6,28,29],reorder:43,reorder_primit:43,repeat:[7,34,56,57,65,66,71],repeatedli:[34,40],replac:[8,13,37,53,61,67],replic:21,replicaset:17,repo:[8,118],report:[13,29,30,55],reportdataset:13,repositori:[77,116],repres:[6,7,13,18,21,23,24,30,35,44,47,49,52,53,58,60,62,65,68,70,71,95],represent:[14,21,31,33,34,40,47,49,65],request:[8,9,12,16,18,60,63,95,96,119],requir:[4,6,9,14,16,17,19,21,23,24,28,29,35,40,42,48,52,53,55,56,57,60,62,65,66,67,71,77,81,84,95,119],requisit:40,research:[21,30],reserv:[28,81,84],reserveoutput:74,reset:[9,25,79],reset_program:25,resetingrad:42,resetinvalu:42,resetoutgrad:42,resetoutvalu:42,resetxxx:42,reshap:[5,59,83],residu:20,resiz:[12,68
,75,76],resolv:[8,72,96],resourc:[21,26,51,55,68,95],respect:[5,19,24,29,33,48],respons:[12,20,21,25,33,51,53,54,62,95,96],rest:[7,17,27,31,39,119],restart:[9,14,95,96,119],restartpolici:[95,96,97],restor:[5,53],restrict:[62,64,107],result:[5,6,13,20,25,33,34,35,40,44,49,50,51,54,84,95,108,109],resum:16,retran:95,retriev:[7,49,64],reuqest:63,reus:[7,16,49,59,60],rev:0,revamp:21,reveal:4,revers:[6,113,114],review:[18,72,96],reviews_electronics_5:96,rewrit:[8,20],rid:[19,30],right:[5,6,7,8,17,25,40,60,62,81,84],right_scor:81,ring:51,risk:6,rkt:[0,17],rmsprop:[53,81],rmspropoptim:53,rnn:[7,30,33,49,58,60,64,67,102,113,114],rnn_bias_attr:114,rnn_layer_attr:114,rnn_out:114,rnn_output:70,rnn_use_batch:[41,102,103],rnnalgorithm:49,rnnstep:70,roadmap:[67,70],rocmplac:47,role:[4,13,14,21,51,95],rollback:58,root:[6,51,95,96,97],roughli:67,round:[29,51],routin:[29,42,51],row:[12,20,89],row_offset:89,rowoffset:89,rows_:65,rpc:13,rpcserver:13,rpi:118,rpi_arm_neon:118,rpi_toolchain:118,rsize:95,rstrip:97,rtk:119,rule:[6,21,24,30,34,95],run:[0,1,4,5,6,7,8,9,17,18,19,20,21,22,23,25,29,30,31,32,33,34,35,39,40,43,47,48,50,51,52,53,55,56,57,58,60,63,64,65,67,68,69,72,76,77,78,93,95,96,97,99,100,101,107,108,116,119],run_test:0,runinitfunct:[97,108],runnabl:23,running_on_cloud:17,runserv:77,runtim:[7,18,20,21,37,48,60,71,78],runtime_table_:7,s_block:6,safe:17,sai:[19,31,34,36,40,59],said:30,same:[4,5,13,14,16,18,19,20,21,32,33,35,38,39,40,48,49,51,58,60,61,64,67,70,76,95],sampl:[25,33,58,66,89],sampler:33,satifi:40,satisfi:[8,43,65,95],save:[9,11,13,14,17,18,21,34,35,40,44,53,65,71,95,96],save_dir:[96,97,103,105],save_only_on:[102,103],save_parameter_to_tar:84,savetxt:83,saving_period:[97,102,103],saving_period_by_batch:[102,103,105],scalabl:60,scalar:[6,7,36,69,70],scale:[21,23,53,61,66,67,75],scaleop:75,scaleopmak:[60,75],scan:[6,13,40,60],scatter:[6,51],scenario:49,schdule:95,schedul:[13,17,23,95],scheme:[12,62],scienc:40,scope:[5,18,22,26,31,39,52,109],score:[49,81],score_diff:81,sc
orer:67,scp:98,script:[0,51,77,93,95,98,116],sdk:117,search:[9,64,114],second:[4,18,28,30,33,35,48,49,56,57,59,64,66,75],secret:95,section:[6,23,30,58,95],see:[4,6,9,18,20,23,29,30,58,67,81,83,84,95],seed:[83,103,108],seem:[8,20,29,30,67],seen:62,segment:[48,70],sel:20,select:[49,95],selected_generation_scor:49,selected_id:49,selected_row:[65,71],selected_rows_desc:65,selected_scor:49,selectedrow:[37,71],selector:96,self:[5,24,25,33,35,40,41,42,44,50,58,70,74,75],self_addr:18,semant:[4,49,63],semaphor:20,semat:4,send:[9,14,18,21,23,38,51,60,66,95],send_back_parameter_typ:12,sendbackparameterspars:12,sendbackparametertyp:12,sendparameterrequest:12,sendparameterrespons:12,sens:[53,62],sent:[4,14,18,21,60,66,71,96],sentanc:81,sentenc:[30,48,49,70,114],sentence_input:70,separ:[14,21,32,53,61,62],seper:[52,70],seq_len:70,seq_po:89,seq_pool:110,seq_pos_arrai:89,sequenc:[6,7,18,26,30,34,41,50,56,67,70,81,84,89,111,113],sequence_layer_group:111,sequence_nest_layer_group:111,sequence_recurr:83,sequence_start_posit:89,sequencegen:111,sequencetyp:84,sequenti:[7,18,20],seri:[19,111],serial:[7,13,44,52,54,60,71],serializ:[60,71],serv:[21,29,60,70,95],server:[0,4,8,12,15,16,21,31,51,60,79,91,92,93,97,103,119],serverless:9,servic:119,sess:[33,35,50],session:[35,50,57],set:[0,4,6,9,17,19,33,37,40,43,47,48,49,55,57,58,60,61,64,67,68,70,75,76,77,81,83,84,89,95,96,108],set_active_typ:74,set_attr:24,set_drop_r:74,set_float_el:5,set_input:24,set_output:24,set_shap:19,set_siz:74,set_typ:[24,74],setdatatyp:65,setdefault:75,setp:95,setq:0,settotalbyteslimit:83,setup:[21,53,63,74,75,119],seven:67,sever:[5,12,21,23,33,48,49,51,54,55,58,65,68,70,95],sexstant:119,sgd:[4,9,17,23,52,53,54,65,84,92,102,109],sgd_optim:109,shall:[6,8],shape:[5,6,7,19,21,33,36,47,48,56,58,60,65,67,68,84,109],shapes_:19,shard:[9,10,11,12,13,14,16,21,23,92,95],share:[8,19,33,46,54,58,60,62,67,68,70],shared_librari:8,shared_ptr:[43,45,46,64,68,76],shell:95,shoul:5,should:[4,5,6,7,14,17,20,21,24,25,29,31,32,33,37,38,39,
43,47,48,49,50,53,54,55,56,59,60,61,62,65,66,67,69,70,71,75,77,95,113],should_be_fals:4,should_be_tru:4,show:[0,6,7,9,19,20,28,30,36,40,44,48,51,53,56,69,70,95],show_check_sparse_distribution_log:[102,103],show_layer_stat:[102,103],show_parameter_stats_period:[96,102,103,105],shown:[4,21,25,51,52,55,67,95],shrunk:24,shuf:81,shuffl:[19,21,81],shuffleread:19,sid:95,side:[21,25,40,54],sig:95,sigint:93,sigmod:66,sigmod_op:66,sigmod_output:66,sigmoid:[7,66,70],sign:[27,44,95],signatur:95,signific:67,similar:[7,18,20,21,23,26,30,39,49,53,55,59,60,62,67,68,70,95,119],similarli:[30,40],simpl:[18,23,29,31,34,35,40,48,53,56,62,64,66,67,70,97],simple_attent:114,simple_gru:114,simple_lstm:82,simple_rnn:114,simpler:54,simplest:95,simpli:[4,14,21],simplifi:[4,49,58,66,67],simul:[30,117],simultan:95,sinc:[9,13,15,16,20,21,22,23,30,37,40,43,47,53,58,59,61,62,70,95,119],singl:[6,9,19,21,23,25,29,38,51,52,60,64,67],singleton:[18,22],sit:21,site:[8,95,107],situat:[6,39,57],size:[9,11,12,14,20,21,29,33,40,44,49,52,53,58,59,65,66,67,68,70,74,75,76,81,83,84,86,89,109,114],size_in_byt:43,size_t:[12,19,68,70,74],sizeof:[7,89],skip:[6,59,72,83,95],sleep:97,slice:18,slide:9,slight:30,slightli:33,small:[5,18,31,33,42,49],small_messag:[102,103],smaller:[5,9,29,49],smart:64,snap:96,snapdragon:29,snapshot:[10,16,95],snippet:[20,32,50,95],sock:17,sock_recv_buf_s:[102,103],sock_send_buf_s:[102,103],socket:97,softmax:[4,7,21,23,30,35,36,49,52,56,74,81,114],softmax_grad:52,softmax_param:83,softmaxoutput:35,softwar:[29,55,81,84,119],solid:33,solut:[51,119],solv:[4,6,19,40,60],some:[4,6,7,8,12,13,14,16,17,19,20,21,23,24,29,31,32,33,34,38,39,40,43,47,48,49,50,56,57,58,59,60,61,64,68,70,95,119],some_c_api_funct:46,some_inst:46,some_op:[37,48,70],some_python_class:45,somecppclass:45,somegotyp:45,someth:[6,12,58],sometim:[55,59],somewhat:14,somewher:64,soon:9,sort:[70,95,97],sort_by_length:70,sortagrad:67,sourc:[5,8,28,30,33,42,44,46,49,59,60,95],source_dict_dim:[49,114],source_dict_s:49,source_language_w
ord:[49,114],space:[23,29,58,62,67],span:55,spars:[12,20,74,81,89,91,95,97,103,105],sparse_binary_vector:[81,84,89],sparse_binary_vector_sequ:89,sparse_binary_vector_sub_sequ:89,sparse_float_vector:84,sparse_remot:12,sparse_upd:[12,81],sparse_vector:[81,89],sparse_vector_sequ:89,sparse_vector_sub_sequ:89,sparseparam:74,sparseprefetchrowcpumatrix:74,spec:[95,96,97],special:[6,14,21,29,31,37,47,49,50],specialvartypeinfer:37,specif:[6,8,9,19,21,24,28,31,49,60,64,68,81,84],specifi:[4,5,12,13,14,17,18,20,21,22,24,25,26,28,33,44,55,58,64,66,70,77,95],spectrogram:67,speech:67,speed:[29,44,51,53,119],speedup:55,sphinx:[45,77],split:[16,18,19,22,30,36,49,60,70,91,95],split_count:[91,95,97],spread:6,squar:35,square_error_cost:[84,109],squash:72,srand:[89,103],src:[8,43,78,91,93,97],src_backward:114,src_dict:83,src_dict_path:83,src_embed:[49,114],src_forward:114,src_primitive_desc:43,src_word_id:[49,114],src_word_vec:49,ssh:[95,98],ssh_server:93,sstabl:4,stabil:[5,40],stabl:[63,95],stack:[26,60,70,95],stage:[8,15,22,33,40,43,67,71,72],stale:9,stamp:78,standalon:116,standard:[20,30,60,62,67],stanford:[5,96],star:8,start:[6,8,9,12,13,14,16,17,20,21,22,49,51,54,55,72,78,89,96,97,103],start_mpi_train:98,start_op_idx:6,start_paddl:97,start_pass:[102,103],start_program:52,start_pserv:[102,103],startpaddl:97,startup:[9,17,30,95],stat:[103,108],state:[7,9,25,26,48,49,55,64,67,70,82,96,113],statem:40,statement:[20,30,34,40,95],statfulset:97,static_cast:[43,76],staticinput:[113,114],statist:[25,55],statset:108,statu:[17,49,72,95,96,97,108],status:96,std:[8,12,19,35,37,38,43,45,46,55,57,60,61,64,66,68,74,75,76,103],stdbuf:91,stderr:93,stdout:93,step:[5,7,9,14,20,21,23,25,30,33,34,41,49,52,53,54,58,60,66,67,70,95,111,113,114,119],step_gradi:6,step_id:70,step_input:70,step_net:7,step_output:70,step_scop:[60,71],stepnet:[7,48,60,64],still:[6,13,16,21,30,40,61],stirng:58,stmt1482205552000:95,stmt1482205746000:95,stochast:[9,13,16,53],stop:58,stop_gradi:58,storag:[27,29,95],store:[5,7,8,12,26
,35,37,44,47,49,54,56,58,60,61,62,64,70,95],str:[6,17,70,97,105],straight:[56,59,65],straightforward:43,strategi:[9,58,103],stream:[21,43,55,68],stream_:68,strict:59,stride:[43,47,67],string:[6,7,13,28,35,38,44,55,56,57,58,60,61,64,65,66,71,74,75,95,103],strip:83,struct:[13,14,27,29,37,38,39,46,47,55,61,66,83],structur:[6,7,13,30,33,44,49,56,58,60,65,95],sts:95,stuff:72,style:[60,66],sub:[4,6,16,18,23,33,40,48,51,54,58],sub_block:6,sub_sequ:84,subclass:58,subcommand:28,subgraph:[23,33],submiss:21,submit:[43,60,95],subnet0:95,subnet:[4,95],subobjectpath:96,subseq:[110,113],subsequ:51,subsequenceinput:111,subtract:5,succ:40,succeed:[13,96],success:[14,95,96],successfulcr:96,sucess:40,sucessor:40,sudo:[0,95],suffer:5,suffix:[17,91],suggest:8,suit:119,suitabl:[65,68],sum:[6,7,10,22,37,58],sum_op:6,summar:[33,55],summari:55,sumopgradmak:61,sumpool:81,supercomput:40,suppli:65,support:[3,5,7,9,16,17,18,20,21,23,30,32,33,39,40,43,44,47,49,53,54,55,57,59,60,61,62,65,67,71,95,119],support_inplac:40,suppos:[8,18,32,65,89],suppress:28,sure:95,svs:66,swagger:27,swig:[0,15,45,46],switch_ord:82,switchop:7,sychron:51,symbol:[7,35,46],symbols_ready_:7,symbolt:[7,60],symlink:72,sync:[9,53,62],sync_with_cpp:107,syncflag:74,synchron:[9,13,20,43,51,55,95],syntax:[18,26,30,49,59],sysroot:116,system:[7,8,9,14,16,20,21,23,27,32,33,40,42,67,81],tabl:[7,18,30,37,44,65,71],tablelookup:65,tablelookupgrad:65,tablelookupop:65,tag:[1,63,72,78,101],tail:49,tainer_id:97,take:[4,6,7,8,9,16,18,19,20,21,24,26,29,31,33,34,36,37,39,40,43,47,53,56,57,58,59,60,61,68,70,95],taken:[24,35,40,47,70],talk:[14,31],tanh:[33,49,74,114],tar:[78,84,90,95],tarbal:95,target:[6,7,8,24,26,33,35,50,57,60],target_block:[6,24],target_dict_dim:114,target_dict_s:49,target_language_word:114,target_link_librari:8,target_word:49,targetinlink:111,task13:67,task14:67,task:[21,44,49,55,66,90],task_queu:13,taskentri:13,taskqueu:13,tbd:[15,43,67,111],tcp:[95,103],tear:108,technic:[6,9],techniqu:40,technolog:30,tee:96,tell:[9,13,14,
49,66],templat:[32,43,66,68,75,76,96,97,119],tempor:67,temporari:[6,17,26,40,53,58],tempori:40,tensor:[5,8,18,20,22,23,29,30,31,33,35,37,38,43,44,47,48,49,52,65,70,71,75,109],tensor_arrai:18,tensor_array_read:70,tensor_array_s:70,tensor_array_stack:70,tensor_array_unstack:70,tensor_array_writ:70,tensor_data:44,tensor_in:39,tensor_s:5,tensor_test:8,tensor_to_check:5,tensorarrai:22,tensorarraydesc:70,tensordesc:[44,65],tensorflow:[7,18,20,21,23,30,33,36,62,70],term:[9,61,62,67],termin:96,terminolog:40,tessorarrai:70,test1:11,test:[4,5,8,35,46,53,59,63,72,74,75,76,86,89,91,98,103,105,108,109],test_:75,test_all_data_in_one_period:96,test_check_grad_ingore_i:75,test_check_grad_ingore_x:75,test_check_grad_norm:75,test_check_output:75,test_compar:78,test_comparespars:78,test_comparetwonet:78,test_comparetwoopt:78,test_config_pars:78,test_data_dir:91,test_fcgrad:74,test_gpuprofil:108,test_layergrad:74,test_list:83,test_mkldnn:42,test_mklpack:41,test_mul_op:75,test_networkcompar:78,test_pass:[102,103,105],test_period:[102,103,105],test_predict:78,test_pydataprovid:78,test_pydataprovider2:78,test_pydataproviderwrapp:78,test_recurrent_machine_gener:78,test_recurrentgradientmachin:[78,111],test_sum_op:0,test_swig_api:78,test_train:78,test_traineronepass:78,test_wait:[102,103],testa:4,testb:4,testbilinearfwdbwd:108,testconfig:74,testfcgrad:74,testfclay:74,testlayergrad:74,testmulop:75,testq:4,testutil:74,text1:28,text:[4,44,48,55,67,95],tflop:108,tftp:119,tgz:[3,78],than:[6,9,17,18,19,24,30,31,32,33,58,60,62,69,70,83,89,95,119],the_step:30,theano:30,thei:[4,6,8,9,14,16,18,19,20,23,24,28,30,33,34,38,40,49,50,55,58,60,66,70,95,108],them:[4,5,6,8,9,12,17,19,20,23,24,30,31,32,37,38,39,40,49,58,59,60,61,64,65,66,70,71,95,108],themselv:[6,8],theori:30,therefor:[6,40,53],therein:7,theta:33,theta_d:33,theta_g:33,thi:[3,4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,29,30,31,32,33,34,35,38,39,40,43,47,48,49,50,51,52,53,54,55,56,58,59,60,61,62,65,66,67,68,70,81,84,89,95,108,109,
119],thin:37,thing:[21,33,60,68],think:[4,8],third:[9,35,89],third_parti:[42,78,87,116,117,118],those:[7,8,9,32,34,35,36,56],though:[70,119],thought:8,thread:[18,20,22,52,55,108],thread_count:22,thread_id:55,thread_id_:55,thread_local_rand_use_global_se:[102,103],thread_pool:22,threadid:105,threadloc:108,threadpool:18,three:[5,6,9,20,25,29,30,31,34,43,49,50,54,55,56,59,67,68],threshold:[9,13,24,103],through:[6,8,9,13,15,25,40,50,53,77,109],throughout:26,throughput:[52,108],thrust:60,thu:[16,25,35,40,67,95],tier:96,time:[4,5,8,9,13,16,19,20,21,23,24,30,32,37,40,41,47,48,49,51,55,58,59,60,61,65,66,67,70,71,89,96,97,103,107,108,111,119],timelin:[55,60],timeo:95,timeout:[9,13],timer:108,timestamp:10,timestep:64,titan:40,titl:97,tls:27,tmp:58,to_no_sequ:110,to_sequ:[110,111],to_your_paddle_clone_path:77,todo:[7,9,13,16,49,66,67],togeth:[6,70],token:[4,67,114],toler:5,too:[5,18,20,24,39,43,70],took:119,tool:[55,77,95,97,116,118],toolchain:116,toolkit:67,top:[48,49,67],top_k:49,top_level_rnn:48,topic:43,topk_generated_scor:49,topk_id:49,topk_scor:49,toplevel:0,topolog:[4,9,21,35,40,44,54],topoloi:35,torch:[7,30],tostr:83,total:[9,23,25,51,55,59,96,108,119],total_pass:59,tottim:107,touch:78,toward:30,trace:[7,31,33],track:[9,13,35,58,72],tradit:[7,29,67],traffic:21,train:[1,6,7,11,13,14,16,18,19,24,25,26,30,31,33,34,40,41,44,51,53,54,55,56,57,58,60,62,65,67,68,71,79,81,84,91,93,96,97,98,99,100,103,105,114],train_arg:97,train_args_dict:97,train_args_list:97,train_config_dir:[95,97],train_data:91,train_data_dir:91,train_i:84,train_id:95,train_list:[83,91],train_loop:30,train_read:[21,84],train_x:84,trainabl:[44,58],trainer:[4,10,11,12,13,15,21,23,31,41,42,53,54,60,74,84,91,92,97,103,105],trainer_config:[90,95,96,97],trainer_config_help:74,trainer_count:[81,86,91,95,96,97,102,103,105],trainer_cpu:17,trainer_cr:17,trainer_gpu:17,trainer_id:[91,95,97,103],trainer_intern:12,trainer_mem:17,trainer_packag:17,trainer_prog:21,trainerconfighelp:83,trainerid:[16,97],trainingjob:21,trai
nonebatch:12,tran:[43,74],trans_var:39,transact:[9,13],transcript:67,transfer:[40,55],transform:[60,67],translat:[40,81],translation_id:49,translation_scor:49,transpar:49,transpil:18,travers:[6,34,40],travi:72,treat:[7,14,40],treatment:[14,29],tree:[7,18,26,30,58,97,109,118],trg_dic_siz:49,trg_embed:[49,114],trick:49,tricki:45,trigger:[16,19,54],trivial:[49,70],true_block:[7,36,56],true_imag:59,true_label:59,true_neg:25,true_posit:25,true_read:59,tune:[67,102,107],tupl:[6,58,59],turn:[58,59,113],tutori:[95,97,98,99,100],twice:[23,33],twine:63,two:[4,6,14,15,16,17,18,19,20,21,25,28,29,30,31,33,34,37,39,40,44,47,49,53,55,56,59,60,61,62,64,65,66,67,70,71,75,76,95,108],txt:[8,17,28,41,42,74,77,91,95,98],type:[4,6,7,9,12,13,16,17,19,21,27,28,29,31,37,38,39,43,44,45,46,48,49,56,57,58,59,60,61,62,65,66,67,68,71,74,75,76,84,86,89,95,96,105,114],type_nam:66,typedef:[14,29,45,46,47,68],typeerror:24,typeid:66,typenam:[32,66,68,75,76],typic:21,ubuntu:[3,63,86],ubyt:59,uci_h:86,uid:96,uint16_t:29,uint32:[27,44],uint32_t:55,uint64:[44,45],uint64_t:[45,89],unawar:14,unbias:5,unblock:20,unbound:40,unbuff:20,unclear:16,uncreat:6,under:[8,13,23,39,51,81,84,95],underli:[19,49],understand:[30,58,67,119],understand_senti:114,undeterminist:108,uni:67,unidirect:67,unifi:[19,26,35,65],uniform:[11,33,58,59],uniform_random:58,uniniti:6,uninstal:[0,78],uniqu:[4,7,9,16,17,43,47,58,64,95],unique_nam:58,unique_name_gener:58,unique_ptr:[61,64,68,74],unit:[8,53,55,62,68],unittest:[46,75,78],unix:20,unk:[65,71],unless:[81,84],unlik:49,unnecessari:[6,67],unordered_map:64,unpack:70,unrol:48,unseen:62,unsign:[14,29],unstack:70,unstack_from:70,unsupervis:33,unsupport:75,until:[9,14,20,22,23,30,40,64,95,97],untrack:72,unzip:116,updat:[6,9,13,14,21,27,29,33,48,49,50,51,53,54,64,67,70,72,105,107],update_equ:84,update_memori:7,update_op:50,updatecallback:74,updatestack:95,upgrad:[3,51,78],upload:[9,17,20,27,63],upon:9,upstream:[72,78],uri:95,usag:[29,36,40,54,58,69,93,97],use:[4,5,7,8,9,15,19,20,21,22,23,2
6,29,33,35,37,38,39,40,43,47,49,50,51,52,54,55,58,64,65,66,67,70,72,75,81,84,89,91,95,97,108],use_cpu:38,use_cudnn:38,use_eigen_bla:116,use_eigen_for_bla:[116,117],use_gpu:[52,81,84,86,91,96,97,102,103,105],use_mkl_pack:41,use_mkldnn:[38,42],use_old_updat:[12,102,103],use_sparse_remote_updat:12,used:[4,5,7,8,9,15,16,19,20,21,24,26,29,30,33,35,39,40,48,49,52,53,54,55,58,59,60,62,64,66,68,70,95,108],useful:[5,29,39,40,58,64],usegpu:[74,89],user:[4,5,6,7,8,11,13,16,17,18,21,22,23,24,25,26,28,32,33,34,35,37,38,39,43,47,49,50,51,53,55,58,59,60,61,62,64,66,68,70,95,119],user_id:97,user_nam:11,usercert:11,userkei:11,usernam:[11,72,116],uses:[5,9,16,18,20,21,29,39,40,47,48,49,54,55,68,71,95],using:[4,5,6,7,8,9,13,14,16,17,20,21,26,28,29,30,32,33,35,37,40,48,50,53,56,58,59,61,62,64,66,67,68,75,86,95],usr:[0,1,78,91,95,97,103],usual:[6,17,40,47,55,56,62,68,95,108],util:[21,41,42,51,90,97,108,119],uuid:[10,16],v7a:116,v8a:116,val:6,valgrind:107,valid:[59,60,64,95],valu:[5,6,7,9,18,20,24,25,35,36,40,42,44,48,49,50,53,54,56,60,64,65,66,69,70,74,81,89,95,97,105,116],value1:103,value2:103,value_:65,valueerror:[35,81],values_:70,vanilla:[52,114],var_nam:[6,39],var_recurs:24,vardesc:[7,34,56,58,60,65,71],vardescbuild:7,vari:95,variabl:[4,5,7,18,19,20,21,23,24,25,26,31,33,34,35,36,37,39,47,48,49,50,52,53,56,57,61,62,65,66,67,70,95,96,109],variablenamemap:75,varialbl:33,variant:[37,47,68,70],varibal:6,varibl:35,varienc:70,varient:70,variou:[7,20,29,40,62],varproto:66,vars_:[7,64],vartyp:65,vartypeinfer:37,vec2seq:67,vec:83,veclib:117,vector:[4,7,12,14,19,35,36,43,48,49,55,58,60,61,65,67,69,70,87,89],vendor:8,verbos:28,veri:[8,13,18,23,26,30,32,33,40,43,49,54,59,62,64,67,68,70],verifi:7,version:[6,8,17,21,24,28,31,33,35,36,44,49,63,67,72,81,84,95,101,102,103,108,117],versu:4,via:[6,9,47,72,95,119],view:[44,47],vim:1,virtual:[19,24,37,38,61,68],virtualenv:0,visibl:[16,64],visit:6,visual:49,vlog:12,vocabulari:67,volum:[77,96,97],volumemount:[95,96,97],volumn:95,w1_grad:52,w2_grad:52,wai:
[4,6,14,16,20,26,30,38,40,49,53,58,59,62,70],wait:[9,14,20,22,97],wangkuiyi:8,want:[4,17,18,20,25,33,38,39,47,53,55,57,59,62,64,68,69,70],warn:[28,78,83,97],warp_ctc:67,warranti:[81,84],wast:51,watch:9,wbia:95,weight:[41,44,62,74],weightlist:74,weights_:74,weights_primitive_desc:43,weights_t:74,welcom:[8,67],well:[6,17,20,21,23,30,32,33,62,65,67,95],wer:67,were:[8,20,30],west:95,wget:[78,116],wgt:43,what:[8,19,30,33,39,49,58,66,72,119],wheel:3,when:[6,7,8,9,12,13,14,17,18,21,23,24,25,26,28,29,30,31,35,49,51,53,54,55,56,58,60,68,70,95,108,119],whenev:[58,67],where:[4,6,7,9,16,18,21,30,31,34,47,48,49,53,56,60,62,68,70,109],wherea:[7,13,32,36,68,69,71],whether:[5,6,7,19,26,55,59,65,70,71,89],which:[4,5,6,7,8,9,11,13,14,16,17,18,19,20,21,22,24,26,29,30,31,32,33,35,37,39,40,43,44,47,48,49,50,51,54,56,57,58,59,60,61,64,65,66,69,70,71,84,95,107,119],while_grad:40,while_loop:[49,70],while_op:6,whileloop:70,whileop:7,white:67,whl:[0,3],who:[6,32,34,51,58],whoever:14,whole:[6,19,33,36,40,45,46,48,51,57,66,67,87,95,119],wholli:19,whose:[5,6,9,16,24,48,60,61,66,70],why:[5,46,108],wide:[8,24,33,93,98],width:[12,45,59,74,75,83,89],wiki:8,window:[0,53,67],wirt:35,wise:[23,60,67],with_avx:[0,72,101,116,117],with_bia:66,with_c_api:[0,87,116,117,118],with_doc:0,with_doubl:[0,74,101],with_dso:0,with_golang:[0,87,116],with_gpu:[0,72,87,101,116,117],with_mkl:[0,41,42,87,116],with_mkldnn:42,with_mklml:42,with_profil:108,with_python:[0,87,101,116,117],with_rdma:[101,116,117],with_style_check:[0,72],with_swig_pi:[0,87,116,117],with_test:[0,72,75],with_tim:[101,108],within:[13,21,30,67],without:[6,9,14,19,20,55,58,59,60,67,81,84],wloop:70,wmt14:114,word2vec:[17,81,91,93],word:[6,23,34,37,40,48,49,60,66,67,70,81,113],word_dict:[91,98],word_dim:83,word_id:81,word_vector_dim:[49,114],wordcount:67,work:[1,4,7,8,9,21,26,29,30,38,50,53,55,58,72,77,95,96,97,119],worker:[23,71,95],workercount:95,workflow:[60,95],workspac:[91,93],worri:5,worth:109,would:[7,8,9,16,20,21,22,23,30,32,33,34,43,50,53,54,
58,59,65,67,70,95,119],wouldn:[30,34],wrap:[19,30,32,33,51,119],wrapper:[8,19,20,32,51,53,61,70,108],write:[4,9,16,18,19,20,21,23,29,30,31,32,35,37,43,50,52,53,58,59,60,61,68,70,81,83,84,95],write_lock:10,write_output:52,write_to_arrai:40,writer:[4,58],written:[6,7,18,23,26,33,44,53,60,61,65],wrong:59,wrote:35,wsize:95,www:[81,84],x64:118,x86:[116,117],x86_64:[116,117],x_neg:5,x_po:5,xarg:[1,74,78,98],xcode:117,xcodebuild:117,xgbe0:103,xgbe1:103,xpu:30,xrang:[5,30,33,55,59,74,84,86],xx_layer:38,xxx:[4,70],xxxx:10,xxxxxxxxx:95,xxxxxxxxxx:95,xxxxxxxxxxxxx:95,xxxxxxxxxxxxxxxxxxx:95,y_dim:33,y_neg:5,y_po:5,y_predict:[84,86,109],yaml:[8,93,95,96,97,98,119],yancey1989:17,yapf:72,year:30,yep:[55,107],yet:[30,67,119],yield:[4,11,19,59,81,84],you:[5,17,21,29,64,81,84,95,119],your:[4,8,12,17,28,60,78,95,116,117,118,119],your_access_key_id:95,your_param_nam:83,your_repo:97,your_secrete_access_kei:95,your_source_root:46,yuang:30,yuyang:107,z_dim:33,z_size:33,zaist:0,zero:[5,6,9,20,33,49,54,58,65,95,103],zhihu:0,zhuanlan:0,zip:[58,97,116],zlib:87,zone:95,zxf:78,zxvf:95},titles:["\u4ece\u6e90\u7801\u7f16\u8bd1","\u4f7f\u7528Docker\u5b89\u88c5\u8fd0\u884c","\u5b89\u88c5\u4e0e\u7f16\u8bd1","\u4f7f\u7528pip\u5b89\u88c5","PaddlePaddle Design Doc","Auto Gradient Check Design","Backward Building","Design Doc: Block and Scope","Required CMake Function","Design Doc: Distributed Training","\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9\uff08Checkpointing\uff09","\u8bad\u7ec3\u6570\u636e\u7684\u5b58\u50a8\u548c\u5206\u53d1","Alalysis of large model distributed training in Paddle","Design Doc: Master Server","Design Doc: The Client Library of Parameter Server","Design Doc: Remote Parameter Updater for Cluster Train","Design Doc: Save Model","Submit a Distributed Training Job","Design Doc: Concurrent Programming with Fluid","C++ Data Feeding","Design Doc: CSP in PaddlePaddle Fluid","Design Doc: Distributed Training Architecture","Design Doc: Execute the Program with Multi CPU","Design Doc: 
Parameter Server","Error Clip","Evaluator Design","Executor Design Doc","FileManager\u8bbe\u8ba1\u6587\u6863","PFSClient","Design Doc: float16","Design Doc: PaddlePaddle Fluid","PaddlePaddle Fluid: Towards a Compiled Programming Language","Design Doc: Functions, Operators, and Layers","Design for GAN","Design Doc: Computations as a Graph","Survey on Graph","The IfElse Operator","Design Doc: InferVarType","Problem","Background","Memory Optimization","Intel\u00ae MKL Packed on PaddlePaddle: Design Doc","Intel\u00ae MKL-DNN on PaddlePaddle: Design Doc","Design Doc: Add MKLDNN Kernel in Fluid Operator","Design Doc: Model Format","Paddle\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0","C-API \u6a21\u578b\u63a8\u65ad\u5b9e\u73b0\u6587\u6863","Design Doc: The Keys of Operator Kernel Type","RNNOp design","Design: Sequence Decoder Generating LoDTensors","Optimizer Design","Design Doc: NCCL support in Paddle Fluid","Design Doc: Parallel_Do in PaddlePaddle","Averaging Parameter in PaddlePaddle","Design Doc: The C++ Class Parameters","Introduction","Design Doc: PaddlePaddle Programs","Prune","Design Doc: Python API","Python Data Reader Design Doc","Design Doc: Refactorization Overview","Design Doc: Gradient Operators Registration","Regularization in PaddlePaddle","PaddlePaddle\u53d1\u884c\u89c4\u8303","Design of Scope in Paddle","Design Doc: Selected Rows","Interaction between C++ and Python","DeepSpeech2 on PaddlePaddle: Design Doc","Design Doc: Supporting new Device/Library","Design Doc: Switch","Design for 
TensorArray","Background","\u5982\u4f55\u8d21\u732e\u4ee3\u7801","\u5f00\u53d1\u6807\u51c6","\u5b9e\u73b0\u65b0\u7684\u7f51\u7edc\u5c42","\u5982\u4f55\u5199\u65b0\u7684Operator","\u5728Paddle\u4e2d\u5982\u4f55\u4f7f\u7528Eigen","\u5982\u4f55\u8d21\u732e\u6587\u6863","\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5","\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b","FAQ","\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b","\u6a21\u578b\u914d\u7f6e","\u53c2\u6570\u8bbe\u7f6e","\u57fa\u672c\u4f7f\u7528\u6982\u5ff5","\u65b0\u624b\u5165\u95e8","\u5feb\u901f\u5f00\u59cb","\u5b89\u88c5\u4e0e\u7f16\u8bd1C-API\u9884\u6d4b\u5e93","C-API\u9884\u6d4b\u5e93","\u8f93\u5165/\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7","C-API\u4f7f\u7528\u6d41\u7a0b","\u542f\u52a8\u53c2\u6570\u8bf4\u660e","\u5206\u5e03\u5f0f\u8bad\u7ec3","\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3","\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c","Kubernetes on AWS","Kubernetes\u5355\u673a\u8bad\u7ec3","Kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3","\u5728OpenMPI\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3","<no title>","<no title>","\u73af\u5883\u51c6\u5907","\u53c2\u6570\u6982\u8ff0","\u7ec6\u8282\u63cf\u8ff0","\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e","\u4f7f\u7528\u6848\u4f8b","\u8fdb\u9636\u4f7f\u7528","Python\u4ee3\u7801\u7684\u6027\u80fd\u5206\u6790","GPU\u6027\u80fd\u8c03\u4f18","PaddlePaddle Fluid Source Code Overview","\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684Layer","\u5355\u53cc\u5c42RNN API\u5bf9\u6bd4\u4ecb\u7ecd","RNN\u6a21\u578b","Recurrent Group\u6559\u7a0b","RNN\u914d\u7f6e","PaddlePaddle \u6587\u6863","Android\u5e73\u53f0\u7f16\u8bd1\u6307\u5357","iOS\u5e73\u53f0\u7f16\u8bd1\u6307\u5357","Raspberry Pi\u5e73\u53f0\u7f16\u8bd1\u6307\u5357","Cluster bootstrapping tool 
survey"],titleterms:{"\u4e00\u4e9b\u7ec6\u8282\u7684\u8865\u5145":97,"\u4e0a\u4f20\u8bad\u7ec3\u6587\u4ef6":11,"\u4e0b\u8f7d\u6570\u636e":96,"\u4e0b\u8f7dmklml\u5e93\u5931\u8d25":78,"\u4e0d\u4f7f\u7528":45,"\u4e0d\u4f7f\u7528swig\u8fd9\u79cd\u4ee3\u7801\u751f\u6210\u5668":45,"\u4e0d\u540c\u7684":82,"\u4e0d\u5bfc\u51fapaddle\u5185\u90e8\u7684\u7ed3\u6784\u4f53":45,"\u4e0d\u5f15\u7528\u5176\u4ed6\u52a8\u6001\u5e93":45,"\u4e13\u6ce8\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5f00\u53d1":2,"\u4e24\u79cd\u4f7f\u7528":82,"\u4e3a\u4ec0\u4e48\u9700\u8981\u6027\u80fd\u5206\u6790":108,"\u4ec0\u4e48\u662f\u6027\u80fd\u5206\u6790":108,"\u4ec5\u4ec5\u4f7f\u7528void":45,"\u4ece\u5feb\u7167\u6062\u590d":10,"\u4ece\u6e90\u7801\u7f16\u8bd1":0,"\u4ee3\u7801\u8981\u6c42":72,"\u4f7f\u7528":[72,96],"\u4f7f\u7528\u52a8\u6001\u5e93\u6765\u5206\u53d1paddl":45,"\u4f7f\u7528\u6848\u4f8b":105,"\u4f7f\u7528\u6a21\u578b\u521d\u59cb\u5316\u7f51\u7edc":105,"\u4f7f\u7528\u6d41\u7a0b":90,"\u4f7f\u7528\u73af\u5883\u53d8\u91cf":97,"\u4f7f\u7528\u8f6c\u6362\u5e93":11,"\u4f7f\u7528docker\u542f\u52a8paddlepaddl":1,"\u4f7f\u7528docker\u5b89\u88c5\u8fd0\u884c":1,"\u4f7f\u7528docker\u6267\u884cgpu\u8bad\u7ec3":1,"\u4f7f\u7528docker\u6784\u5efa":77,"\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3":93,"\u4f7f\u7528paddlepaddl":77,"\u4f7f\u7528pip\u5b89\u88c5":3,"\u4fdd\u6301\u672c\u5730\u4ed3\u5e93\u6700\u65b0":72,"\u4fee\u6539\u542f\u52a8\u811a\u672c":96,"\u514b\u9686":72,"\u5173\u6ce8\u5e95\u5c42\u6846\u67b6":2,"\u5177\u4f53\u67d0\u79cd\u7c7b\u578b\u7684\u5934\u6587\u4ef6":46,"\u5177\u4f53\u67d0\u79cd\u7c7b\u578b\u7684\u5b9e\u73b0\u6587\u4ef6":46,"\u5185\u7f6e\u5b9a\u65f6\u5668":108,"\u5199\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5":74,"\u51c6\u5907\u4e00\u4e2alinux\u96c6\u7fa4":93,"\u51c6\u5907\u4ea4\u53c9\u7f16\u8bd1\u73af\u5883":[116,117],"\u51c6\u5907\u6570\u636e\u96c6":91,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e":97,"\u51c6\u5907\u8bad\u7ec3\u7a0b\u5e8f":91,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b":9
0,"\u51c6\u5907openmpi\u96c6\u7fa4":98,"\u51cf\u5c11\u6570\u636e\u8f7d\u5165\u7684\u8017\u65f6":81,"\u51cf\u5c11dataprovider\u7f13\u51b2\u6c60\u5185\u5b58":81,"\u51fa\u73b0":82,"\u5206\u5757\u6587\u4ef6\u4f20\u8f93":27,"\u5206\u5e03\u5f0f\u8bad\u7ec3":92,"\u5206\u652f\u89c4\u8303":63,"\u521b\u5efa\u672c\u5730\u5206\u652f":72,"\u521b\u5efa\u795e\u7ecf\u7f51\u7edc\u8f93\u5165":90,"\u521b\u5efajob":97,"\u521b\u5efapaddlepaddl":96,"\u521d\u59cb\u5316paddlepaddle\u8fd0\u884c\u73af\u5883":90,"\u5220\u9664\u672c\u5730\u5206\u652f":72,"\u5220\u9664\u8fdc\u7a0b\u5206\u652f":72,"\u5229\u7528\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90":81,"\u5230\u8fdc\u7a0b\u4ed3\u5e93":72,"\u5236\u4f5c\u955c\u50cf":97,"\u5236\u4f5cdocker\u955c\u50cf":96,"\u524d\u5411\u8ba1\u7b97":90,"\u524d\u5411operator\u5355\u6d4b":75,"\u52a0\u8f7d\u6a21\u578b":90,"\u52a0\u8f7dpaddlepaddl":84,"\u52a0\u901f\u6267\u884c":10,"\u52a0\u901f\u8bad\u7ec3\u901f\u5ea6":81,"\u52a8\u6001\u5e93\u4e2d\u4e0d\u5d4c\u5165\u4efb\u4f55\u5176\u4ed6\u8bed\u8a00\u7684\u89e3\u91ca\u5668":45,"\u52a8\u6001\u6269\u5bb9":10,"\u5355\u5143\u6d4b\u8bd5":103,"\u5355\u53cc\u5c42rnn":111,"\u539f\u56e0":45,"\u539f\u56e0\u5217\u8868":45,"\u53c2\u6570\u5185\u5b58":81,"\u53c2\u6570\u670d\u52a1\u5668\u548c\u5206\u5e03\u5f0f\u901a\u4fe1":103,"\u53c2\u6570\u6982\u8ff0":102,"\u53c2\u6570\u8bbe\u7f6e":83,"\u53c2\u8003\u6587\u6863":27,"\u53c2\u8003\u8d44\u6599":108,"\u53cc\u5c42rnn":111,"\u53cc\u5c42rnn\u4ecb\u7ecd":113,"\u53cc\u5c42rnn\u7684\u4f7f\u7528":113,"\u53cd\u5411operator\u5355\u6d4b":75,"\u53d1\u5e03docker\u955c\u50cf":63,"\u53d1\u5e03wheel\u5305\u5230pypi":63,"\u5404\u4e2a\u7248\u672c\u6700\u65b0\u7684whl\u5305":3,"\u540d\u8bcd\u89e3\u91ca":27,"\u5411\u91cf":103,"\u542f\u52a8\u4efb\u52a1":97,"\u542f\u52a8\u53c2\u6570\u670d\u52a1\u5668":91,"\u542f\u52a8\u53c2\u6570\u8bf4\u660e":91,"\u542f\u52a8\u8ba1\u7b97\u8282\u70b9":91,"\u542f\u52a8\u96c6\u7fa4\u4f5c\u4e1a":[93,98],"\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e":104,"\u548c":110,"\u5728\u
4e0d\u540c\u8bbe\u5907\u4e0a\u6307\u5b9a\u5c42":105,"\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c":94,"\u5728docker\u4e2d\u6267\u884cpaddlepaddle\u8bad\u7ec3\u7a0b\u5e8f":1,"\u5728openmpi\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3":98,"\u5728paddle\u4e2d\u5982\u4f55\u4f7f\u7528eigen":76,"\u57fa\u4e8edocker\u5bb9\u5668\u7684\u7f16\u8bd1\u65b9\u5f0f":116,"\u57fa\u4e8elinux\u4ea4\u53c9\u7f16\u8bd1\u73af\u5883\u7684\u7f16\u8bd1\u65b9\u5f0f":116,"\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":[84,89],"\u57fa\u672c\u539f\u7406":113,"\u57fa\u672c\u8981\u6c42":45,"\u5982\u4f55\u4e66\u5199\u6587\u6863":77,"\u5982\u4f55\u4f7f\u7528":82,"\u5982\u4f55\u5171\u4eab\u53c2\u6570":83,"\u5982\u4f55\u5199\u65b0\u7684oper":75,"\u5982\u4f55\u51cf\u5c11\u5185\u5b58\u5360\u7528":81,"\u5982\u4f55\u521d\u59cb\u5316\u53c2\u6570":83,"\u5982\u4f55\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570":83,"\u5982\u4f55\u52a0\u901f\u8bad\u7ec3\u901f\u5ea6":81,"\u5982\u4f55\u548c\u660e\u6587\u8fdb\u884c\u76f8\u4e92\u8f6c\u5316":83,"\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u53c2\u6570\u7684\u6743\u91cd\u548c\u68af\u5ea6":81,"\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u67d0\u4e00\u4e2alayer\u7684output":81,"\u5982\u4f55\u6307\u5b9agpu\u8bbe\u5907":81,"\u5982\u4f55\u66f4\u65b0www":77,"\u5982\u4f55\u6784\u5efa\u6587\u6863":77,"\u5982\u4f55\u8bbe\u7f6e\u5b66\u4e60\u7387\u9000\u706b":83,"\u5982\u4f55\u8c03\u7528":81,"\u5982\u4f55\u8d21\u732e\u4ee3\u7801":72,"\u5982\u4f55\u8d21\u732e\u6587\u6863":77,"\u5982\u4f55\u8fdb\u884c\u6027\u80fd\u5206\u6790":108,"\u5982\u4f55\u9009\u62e9sgd\u7b97\u6cd5\u7684\u5b66\u4e60\u7387":83,"\u5b50\u5e8f\u5217\u95f4\u65e0memori":111,"\u5b50\u5e8f\u5217\u95f4\u6709memori":111,"\u5b58\u50a8\u7684\u53c2\u6570\u683c\u5f0f\u662f\u4ec0\u4e48":83,"\u5b89\u88c5":3,"\u5b89\u88c5\u4e0e\u7f16\u8bd1":2,"\u5b89\u88c5\u4e0e\u7f16\u8bd1c":87,"\u5b89\u88c5\u4ea4\u53c9\u7f16\u8bd1\u5668":118,"\u5b9a\u4e49operator\u7c7b":75,"\u5b9a\u4e49opkernel\u7c7b":75,"\u5b9a\u4e49pr
otomaker\u7c7b":75,"\u5b9e\u73b0":45,"\u5b9e\u73b0\u5355\u5143\u6d4b\u8bd5":75,"\u5b9e\u73b0\u65b0\u7684\u7f51\u7edc\u5c42":74,"\u5b9e\u73b0\u65b9\u5f0f":46,"\u5b9e\u73b0\u8ba1\u7b97":76,"\u5b9e\u73b0c":[74,75],"\u5b9e\u73b0python\u5c01\u88c5":74,"\u5bfb\u627e\u6027\u80fd\u74f6\u9888":107,"\u5bfc\u51fac":45,"\u5c06\u547d\u4ee4\u53c2\u6570\u4f20\u7ed9\u7f51\u7edc\u914d\u7f6e":105,"\u5de5\u5177":108,"\u5e38\u89c1\u95ee\u9898":0,"\u5e38\u89c1\u95ee\u9898\u548c\u89e3\u51b3\u65b9\u6cd5":3,"\u5e38\u89c1\u95ee\u9898\u6c47\u603b":2,"\u5e76\u5b8c\u6210":72,"\u5efa\u7acb":72,"\u5f00\u53d1\u6807\u51c6":73,"\u5f00\u59cb\u5f00\u53d1":72,"\u5f02\u6b65":91,"\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":103,"\u5feb\u7167\u4fdd\u5b58\u7684\u8bbe\u8ba1\u5982\u4e0b":10,"\u5feb\u901f\u4f7f\u7528":86,"\u5feb\u901f\u5b89\u88c5":86,"\u5feb\u901f\u5f00\u59cb":86,"\u6027\u80fd\u5206\u6790\u5c0f\u6280\u5de7":108,"\u6027\u80fd\u5206\u6790\u5de5\u5177\u4ecb\u7ecd":108,"\u6027\u80fd\u8c03\u4f18":103,"\u603b\u7ed3":89,"\u6216\u8005\u662f":78,"\u6267\u884c\u5355\u5143\u6d4b\u8bd5":0,"\u627e\u5230\u7684pythonlibs\u548cpythoninterp\u7248\u672c\u4e0d\u4e00\u81f4":78,"\u62a5importerror":78,"\u6307\u9488\u4f5c\u4e3a\u7c7b\u578b\u7684\u53e5\u67c4":45,"\u63a5\u53e3\u8f93\u51fa\u591a\u4e2alayer\u7684\u9884\u6d4b\u7ed3\u679c":81,"\u63a8\u5bfc\u65b9\u7a0b":74,"\u63a8\u6d4b\u6267\u884c":10,"\u63d0\u4ea4":72,"\u63d0\u4ea4\u4ee3\u7801\u7684\u4e00\u4e9b\u7ea6\u5b9a":72,"\u63d0\u4ea4\u955c\u50cf":96,"\u642d\u5efa\u795e\u7ecf\u7f51\u7edc":84,"\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684layer":110,"\u652f\u6301\u7528\u6237\u81ea\u5b9a\u4e49\u7684\u6570\u636e\u9884\u5904\u7406job":11,"\u6570\u636e\u652f\u6301":103,"\u6574\u4f53\u65b9\u6848":97,"\u6587\u4ef6\u4f20\u8f93\u4f18\u5316":27,"\u6587\u4ef6\u8bbf\u95ee\u65b9\u5f0f":11,"\u6587\u4ef6\u8bbf\u95ee\u7684\u6743\u9650":11,"\u6587\u4ef6\u9884\u5904\u7406":11,"\u6587\u6863":115,"\u65b0\u624b\u5165\u95e8":85,"\u65e5\u5fd7\u4e2d\u4fdd\u5b58\
u5747\u4e3a\u7f51\u7edc\u901a\u4fe1\u7c7b\u9519\u8bef":79,"\u65f6\u95f4\u5e8f\u5217":111,"\u65f6\u95f4\u6b65":111,"\u66b4\u9732\u63a5\u53e3\u539f\u5219":46,"\u66f4\u65b0":91,"\u672c\u5730\u6d4b\u8bd5":105,"\u672c\u5730\u8bad\u7ec3":105,"\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b":81,"\u672f\u8bed":10,"\u6784\u5efa\u548c\u6d4b\u8bd5":72,"\u6784\u5efapaddlepaddle\u7684android\u5f00\u53d1\u955c\u50cf":116,"\u67b6\u6784\u56fe":27,"\u67e5\u770b\u6027\u80fd\u5206\u6790\u6587\u4ef6":107,"\u67e5\u770b\u8bad\u7ec3\u7ed3\u679c":96,"\u67e5\u770b\u8f93\u51fa":97,"\u6846\u67b6\u751f\u6210":27,"\u6848\u4f8b\u4e00":105,"\u6848\u4f8b\u4e8c":105,"\u68c0\u67e5\u6a21\u578b\u8f93\u51fa":93,"\u68c0\u67e5\u96c6\u7fa4\u8bad\u7ec3\u7ed3\u679c":93,"\u6982\u5ff5\u7b80\u4ecb":75,"\u6982\u5ff5\u89e3\u91ca":11,"\u6982\u8ff0":[87,110,113],"\u6a21\u5757":27,"\u6a21\u578b\u53c2\u6570\u68c0\u67e5\u70b9":10,"\u6a21\u578b\u63a8\u65ad\u5b9e\u73b0\u6587\u6863":46,"\u6a21\u578b\u914d\u7f6e":[82,111],"\u6a21\u578b\u914d\u7f6e\u7684\u6a21\u578b\u914d\u7f6e":111,"\u6ce8\u518coper":75,"\u6ce8\u610f\u4e8b\u9879":[75,90],"\u6d41\u7a0b\u4ecb\u7ecd":11,"\u6d4b\u8bd5":103,"\u6df7\u5408\u4ee3\u7801\u7684\u6027\u80fd\u5206\u6790":107,"\u6e05\u7406":90,"\u73af\u5883\u51c6\u5907":101,"\u751f\u6210\u5e8f\u5217":114,"\u751f\u6210\u6027\u80fd\u5206\u6790\u6587\u4ef6":107,"\u751f\u6210\u6d41\u7a0b\u7684\u4f7f\u7528\u65b9\u6cd5":113,"\u751f\u6210sparse\u6587\u4ef6":27,"\u7528\u6237\u4f7f\u7528\u6d41\u7a0b":27,"\u7684\u533a\u522b":82,"\u7684\u53c2\u6570":82,"\u7684\u65b9\u6cd5\u6709\u4f55\u533a\u522b":82,"\u76ee\u5f55\u7ed3\u6784":46,"\u76ee\u6807":27,"\u76f4\u63a5\u6784\u5efa":77,"\u76f8\u5173\u6982\u5ff5":113,"\u77e9\u9635":103,"\u793a\u4f8b1":111,"\u793a\u4f8b2":111,"\u793a\u4f8b3":111,"\u793a\u4f8b4":111,"\u793a\u4f8b\u7a0b\u5e8f":11,"\u795e\u7ecf\u5143\u6fc0\u6d3b\u5185\u5b58":81,"\u7a00\u758f\u8bad\u7ec3":105,"\u7aef\u6570\u636e\u7c7b\u578b\u8bf4\u660e":89,"\u7b26\u53f7":45,"\u7b80\u5355\u95e8\u63a7\u5faa\u73af\u795
e\u7ecf\u7f51\u7edc":114,"\u7c7b":[45,74,75],"\u7ebf\u6027\u56de\u5f52\u5b8c\u6574\u793a\u4f8b":84,"\u7ec4\u7ec7\u5e8f\u5217\u4fe1\u606f":89,"\u7ec4\u7ec7\u8f93\u5165\u6570\u636e":[89,90],"\u7ec6\u8282\u63cf\u8ff0":103,"\u7ec8\u6b62\u96c6\u7fa4\u4f5c\u4e1a":93,"\u7ed1\u5b9apython":75,"\u7f16\u5199\u9884\u6d4b\u4ee3\u7801":90,"\u7f16\u5199yaml\u6587\u4ef6":96,"\u7f16\u8bd1":75,"\u7f16\u8bd1\u4f9d\u8d56":0,"\u7f16\u8bd1\u548c\u5b89\u88c5":[116,117,118],"\u7f16\u8bd1\u548c\u6267\u884c":75,"\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5":78,"\u7f16\u8bd1\u5b89\u88c5\u540e\u6267\u884c":78,"\u7f16\u8bd1\u65b9\u6cd5":0,"\u7f16\u8bd1\u9009\u9879":[0,46],"\u7f16\u8bd1\u9009\u9879\u7684\u8bbe\u7f6e":0,"\u7f16\u8bd1\u9009\u9879\u8bf4\u660e":0,"\u7f16\u8bd1paddlepaddl":116,"\u7f29\u5bb9":10,"\u800c\u662f\u624b\u5199\u591a\u8bed\u8a00\u7ed1\u5b9a":45,"\u80cc\u666f":45,"\u81ea\u7136\u8bed\u8a00\u5904\u7406":103,"\u83b7\u53d6paddlepaddle\u7684docker\u955c\u50cf":1,"\u8986\u76d6\u4e0d\u4e00\u81f4\u7684\u90e8\u5206":27,"\u8bad\u7ec3":103,"\u8bad\u7ec3\u56e0\u6b64\u9000\u51fa\u600e\u4e48\u529e":81,"\u8bad\u7ec3\u6570\u636e\u5b58\u50a8":11,"\u8bad\u7ec3\u6570\u636e\u7684\u5b58\u50a8\u548c\u5206\u53d1":11,"\u8bad\u7ec3\u6a21\u578b":84,"\u8bad\u7ec3\u6d41\u7a0b\u7684\u4f7f\u7528\u65b9\u6cd5":113,"\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u51fa\u73b0":81,"\u8bcd\u6c47\u8868":111,"\u8be6\u7ec6\u6559\u7a0b":108,"\u8bfb\u53d6\u53cc\u5c42\u5e8f\u5217\u6570\u636e":111,"\u8f6c\u6362\u5e93":11,"\u8f93\u5165":[89,113],"\u8f93\u5165\u4e0d\u7b49\u957f":111,"\u8f93\u5165\u793a\u4f8b":113,"\u8f93\u51fa":113,"\u8f93\u51fa\u6570\u636e":89,"\u8f93\u51fa\u6570\u636e\u7c7b\u578b":89,"\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7":89,"\u8fd0\u884c\u5bb9\u5668":96,"\u8fd0\u884c\u73af\u5883\u4f9d\u8d56":3,"\u8fd0\u884cdocker":78,"\u8fd9\u4e2a\u52a8\u6001\u5e93\u4f7f\u7528c99\u6807\u51c6\u7684\u5934\u6587\u4ef6\u5bfc\u51fa\u4e00\u4e9b\u51fd\u6570":45,"\u8fdb\u884c\u8bad\u7ec3":[11,96],"\u8fdb\u9636\u4f7f\u7528":106,"\u9
01a\u7528":103,"\u9047\u5230":78,"\u914d\u7f6e\u4ea4\u53c9\u7f16\u8bd1\u53c2\u6570":[116,117,118],"\u914d\u7f6e\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":114,"\u914d\u7f6e\u7f51\u7edc":84,"\u94a9\u5b50":72,"\u94fe\u63a5\u8bf4\u660e":87,"\u9519\u8bef\u600e\u4e48\u529e":82,"\u9644\u5f55":0,"\u968f\u673a\u6570":103,"\u96c6\u7fa4\u591a\u8282\u70b9\u8bad\u7ec3":79,"\u96c6\u7fa4\u8bad\u7ec3":105,"\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b":79,"\u9700\u8981\u7684\u8f6f\u786c\u4ef6":0,"\u975e\u6cd5\u6307\u4ee4":78,"abstract":[21,22,23,51,119],"android\u5e73\u53f0\u7f16\u8bd1\u6307\u5357":116,"api\u4f7f\u7528\u6d41\u7a0b":90,"api\u5bf9\u6bd4\u4ecb\u7ecd":111,"api\u5e93":116,"api\u9884\u6d4b\u5e93":[87,88],"beam_search\u7684\u751f\u6210":111,"book\u4e2d\u6240\u6709\u7ae0\u8282":63,"book\u6559\u7a0b":1,"case":6,"class":[33,54,58],"cmake\u6e90\u7801\u7f16\u8bd1":78,"filemanager\u8bbe\u8ba1\u6587\u6863":27,"final":38,"float":81,"function":[8,32,33,58],"gpu\u548ccpu\u6df7\u5408\u4f7f\u7528":105,"gpu\u6027\u80fd\u8c03\u4f18":108,"gpu\u955c\u50cf\u51fa\u73b0":78,"group\u6559\u7a0b":113,"import":78,"ios\u5e73\u53f0\u7f16\u8bd1\u6307\u5357":117,"kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3":97,"kubernetes\u5355\u673a\u8bad\u7ec3":96,"new":68,"org\u5de5\u5177":77,"paddle\u52a8\u6001\u5e93\u4e2d":45,"paddle\u591a\u8bed\u8a00\u63a5\u53e3\u5b9e\u73b0":45,"paddle\u7248\u672c\u53f7\u4e3a0":78,"paddlepaddle\u53d1\u884c\u89c4\u8303":63,"paddlepaddle\u56de\u5f52\u6d4b\u8bd5\u5217\u8868":63,"paddlepaddle\u662f\u5426\u652f\u6301\u7ef4\u6570\u53ef\u53d8\u7684\u6570\u636e\u8f93\u5165":82,"paddlepaddle\u73af\u5883\u4f9d\u8d56":3,"paddlepaddle\u7684softmax\u80fd\u5426\u6307\u5b9a\u8ba1\u7b97\u7684\u7ef4\u5ea6":82,"paddlepaddle\u7f16\u8bd1\u4f9d\u8d56":0,"pi\u5e73\u53f0\u7f16\u8bd1\u6307\u5357":118,"pod\u95f4\u901a\u4fe1":97,"python\u4e0ec":107,"python\u4ee3\u7801\u7684\u6027\u80fd\u5206\u6790":107,"python\u76f8\u5173\u7684\u5355\u5143\u6d4b\u8bd5\u90fd\u8fc7\u4e0d\u4e86":78,"return":[58,59],"rnn
\u6a21\u578b":112,"rnn\u914d\u7f6e":114,"switch":[43,68,69],"tensor\u4f7f\u7528\u6837\u4f8b":76,"tensor\u5230eigentensor\u7684\u8f6c\u6362":76,"tensor\u6a21\u5757":76,AWS:95,DNS:95,EFS:95,For:8,KMS:95,The:[7,14,18,26,30,33,34,36,37,47,50,54,60,61,69],Use:[7,56],Using:[8,14],With:17,about:33,absolut:49,access:95,account:95,action:[41,42],activ:42,actor:20,add:[40,43,95],address:95,advanc:68,alalysi:12,algorithm:[5,9,21,48,57],all:[64,70],analog:18,analysi:[21,40],anneal:83,api:[21,41,42,46,50,53,58,62,66],appendix:119,arbitrari:30,architectur:[21,55],argument:[28,59],arrai:5,asset:95,associ:[64,95],assumpt:119,async:103,attent:114,attribut:[40,62],auto:5,averag:53,aws:95,background:[5,23,39,41,68,69,70,71],backward:[6,30,34,60],base:[17,49],basic:[40,68,119],batch:59,batch_siz:59,beam:[49,67],becaus:83,benchmark:[41,42],benefit:[23,60],between:[4,20,58,60,66,68],big:83,binari:7,bla:0,block:[7,31,33,34,56,58,60],blockdesc:56,bootstrap:119,bring:119,bucket:95,build:[6,33,60],can:64,capi:46,capi_priv:46,challeng:[6,23,57],chang:49,channel:20,check:5,checkpoint:[9,10,16],choic:38,choos:[8,95],client:14,clip:24,clone:72,close:5,cloudform:95,cluster:[15,95,119],cmake:[8,41,42],code:[17,31,58,109],commit:72,compar:119,comparis:58,compat:30,compil:[7,29,31,56,60,109],complet:30,compos:59,comput:[7,34,43,60,62],con:119,concept:[58,60,95],concern:42,conclus:[16,35,119],concurr:[18,20],condit:33,configur:95,construct:34,content:[41,42,46,67,78,79,81,82,83,95,108,110],control:[40,60],contruct:40,convert:16,convolut:67,copi:52,core:[5,58,95],corner:6,cpu:22,creat:[6,20,59,60,64,95],createreaderop:19,creation:[13,53,62],creator:59,credenti:95,csp:20,ctc:67,cuda:[0,29,78],cudnn:0,current:[29,61],custom:59,data:[9,19,21,39,59,95],dataflow:40,dataprovid:103,dataset:[9,13],datatyp:47,decod:49,decor:59,decoratedread:19,deep:[7,30],deepspeech2:67,defin:95,definit:71,delet:95,demo:[33,95],dens:16,depend:[33,67],deploi:17,describ:[30,50],descript:[28,60],design:[4,5,7,9,13,14,15,16,18,20,
21,22,23,25,26,29,30,32,33,34,37,41,42,43,44,47,48,49,50,51,52,54,56,58,59,60,61,64,65,67,68,69,70],destroi:[64,95],detail:[12,67],develop:60,devic:[52,68],devicecontext:68,dictionari:59,differ:[52,60,68],directori:95,discrimin:33,discuss:[23,33],dispatch:[9,13],distribut:[4,9,12,17,21,23,95],dnn:42,doc:[4,7,9,13,14,15,16,18,20,21,22,23,26,29,30,32,34,37,41,42,43,44,47,51,52,54,56,58,59,60,61,65,67,68,69],docker:17,doe:59,down:95,download:95,driver:78,drop_out:82,duplic:82,dure:[49,59],dylib:46,dynam:[9,70],dynet:35,ec2:95,eigen:76,elast:95,elect:16,els:7,engin:33,enough:5,entri:59,environ:17,error:24,evalu:25,event:[4,55],evolut:30,exampl:[4,8,18,20,36,46],except:81,execut:[7,22,30,56,60],executor:26,expand:110,explan:5,extern:95,faq:80,fault:9,feed:19,file:[7,95],fileread:19,find:95,first_seq:110,float16:29,flow:40,fluid:[18,20,30,31,43,51,109],fork:72,format:[7,9,44],forward:[34,52],frame:7,framework:[5,76],from:[4,16,66],functor:68,futur:[30,67],gan:33,gate:114,gener:[31,33,49,119],give:59,global:[56,58],gpu:103,grad_op:6,gradient:[5,6,14,42,61],graph:[34,35,40,60,62],group:95,gru:103,handler:[4,45],happen:16,hardwar:29,helper:58,hierarchi:7,high:[50,53,62,66],how:[5,12,53,59,60,68],iam:95,ifels:36,ifelseop:7,illeg:78,imag:17,implement:[5,6,8,12,22,24,25,29,44,48,51,53,58,59,60,61,62],imporv:52,infer:81,infershap:[56,65],infervartyp:37,ingredi:4,ingress:27,initi:[14,33,95],input:52,insid:64,inspect:95,instal:[95,119],instanc:95,instead:59,instruct:78,insuffici:78,integr:[68,95],intel:[41,42],interact:66,interfac:[5,9,14,15,26,50,59,64],intermedi:60,introduc:[49,70],introduct:[55,62],isn:59,issu:[29,72],job:[9,17,95,96],kei:[41,47,95],kernel:[43,47,60],kube:95,kubectl:95,kubernet:[17,95,96],languag:[7,31],larg:12,last_seq:110,layer:[4,32,41,42,58,82],layout:47,learn:[7,30,83],leval:66,level:[50,53,62,66],libpaddle_capi_shar:46,libpaddle_capi_whol:46,librari:[14,29,47,60,68],limit:21,list:[10,59],live:40,load:20,local:[21,64,95],lod:49,lodtensor:[48,49,70],lodtens
ordesc:71,logic:13,low:[53,62,66],lstm:103,machin:49,macro:60,main:33,make:40,manag:8,map:[59,60],master:[9,13,17,18],math:68,mathemat:5,matrix:42,member:33,memori:[40,48,68,82,111,113],messag:[66,83],method:49,might:33,migrat:60,mileston:60,mini:59,minibatch:20,mkl:[41,42],mkldnn:43,mkldnn_helper:43,mkldnndevicecontext:43,model:[4,12,14,16,20,30,33,44,49,114],modul:[60,68,78],more:33,motiv:[6,20,26,44,51,57],multi:[22,31],multipl:59,mxnet:35,name:[64,78,82,95],nativ:31,nccl:51,necess:58,necessari:60,need:59,nest:48,network:[60,114],neural:114,nlp:103,norm:62,note:5,numer:5,numpi:5,nvprof:108,nvvp:108,object:9,offset:49,onli:[59,64],onto:52,op_mak:60,oper:[19,32,36,40,43,47,53,56,58,60,61,65,70],opinfomap:60,opkernel:[60,68],opproto:66,ops:62,optim:[9,14,34,40,50,58],option:28,opwithkernel:60,order:28,org:77,origin:60,orthogon:64,other:42,output:95,overview:[16,24,26,41,42,52,60,64,67,109],pack:[41,49],packag:8,paddl:[12,51,59,64,76,78,82],paddlejob:17,paddlepaddl:[4,7,20,30,31,41,42,52,53,56,62,63,67,77,78,95,109,115],pair:95,paradigm:30,parallel_do:52,parallel_nn:105,paramet:[4,9,14,15,17,20,23,42,52,53,54,58,62,95],parameteraverageoptim:53,parent:64,part:34,partit:14,path:[16,28],penalti:62,perform:[52,53,103],persist:13,pfsclient:[27,28],pfsserver:27,place:[40,47,68],placement:21,platform:78,point:[41,81,95],polici:40,pool:110,pose:[37,61],potenti:38,pre:72,prefetch:59,prepar:95,principl:43,privat:95,pro:119,problem:[25,37,38,39,40,47,50,61],procedur:119,process:[9,14,17,50,60],program:[7,18,20,22,30,31,56,58],programdesc:[31,56],project:8,propos:[37,61,62],protobuf:65,protocol:83,provid:59,prune:57,pserver:16,pull:72,push:72,python:[5,17,21,41,42,48,50,53,58,59,62,66,71,89],qualiti:60,queue:[9,13],raspberri:118,rate:83,reader:[4,19,59],readerbas:19,readerhold:19,readop:19,realiz:60,recoveri:9,recurr:[82,113,114],recv:20,refactor:60,refer:[5,21,23,40,41,42,67],region:95,regist:[37,60,66],registr:[60,61],registri:60,regular:[14,62],reject:83,rel:49,relat:[19,60,7
0],remot:15,remoteexecutor:21,render:95,represent:[7,60],request:72,requir:[8,33],retri:13,reus:58,rnn:[48,70,103,111],rnnop:[7,48,60],route53:95,row:[65,67],rpc:20,run:[26,109],runtim:17,save:16,scale:9,scope:[7,48,60,64],search:[49,67],secur:95,select:[14,20,65],selectedrow:65,semant:69,send:20,separ:60,sequenc:[49,114],server:[9,13,14,17,20,23,95],servic:95,setup:95,sextant:119,sgd:[91,103],shape:49,share:[4,6,40,64],should:64,shuffl:59,simpl:49,singl:59,solut:[37,38,39,40,41,47,57,61],sourc:109,spars:[14,15,16,65],split:52,stack:7,start:[4,95],statement:25,step2:90,step:[48,90],storag:62,store:9,strategi:40,subcommond:28,submit:17,suffici:59,suitabl:8,sulut:43,summar:[4,18],summari:44,support:[29,51,68,70,78],survei:[29,35,62,119],synopsi:28,syntax:20,system:[30,95],tabl:[46,67],task:[9,13,67],tear:95,tecton:119,templat:95,tensor:[60,68,76],tensorarrai:[49,70],tensordesc:71,tensorflow:35,test:[41,42,43],theori:5,thi:[64,78],think:33,three:70,time:109,timelin:16,todo:[10,11,22],togeth:64,toler:9,too:83,tool:[8,119],topic:68,toward:31,train:[4,9,12,15,17,21,50,59,95],trainer:[9,14,16,17,20,95],transform:39,translat:49,transpil:[21,22,23,31,40,51],tune:103,ture:30,two:5,type:[20,47],uniform:70,unit:[41,42,43],unpack:49,updat:[4,15,16,95],usag:[6,24,48,49,59],use:[12,59],user:9,valu:58,variabl:[6,40,58,60,64,71],vartyp:71,verifi:95,version:[18,29,78],volum:95,vpc:95,what:[12,16],wheel:78,when:[16,64],whl:78,why:[29,30,53,59,60,70],work:67,worker:18}}) \ No newline at end of file 
+Search.setIndex({docnames:["build_and_install/build_from_source_cn","build_and_install/docker_install_cn","build_and_install/index_cn","build_and_install/pip_install_cn","dev/contribute_to_paddle_cn","dev/index_cn","dev/new_layer_cn","dev/write_docs_cn","faq/build_and_install/index_cn","faq/cluster/index_cn","faq/index_cn","faq/local/index_cn","faq/model/index_cn","faq/parameter/index_cn","getstarted/concepts/use_concepts_cn","getstarted/index_cn","getstarted/quickstart_cn","howto/capi/compile_paddle_lib_cn","howto/capi/index_cn","howto/capi/organization_of_the_inputs_cn","howto/capi/workflow_of_capi_cn","howto/cluster/cmd_argument_cn","howto/cluster/index_cn","howto/cluster/multi_cluster/fabric_cn","howto/cluster/multi_cluster/index_cn","howto/cluster/multi_cluster/k8s_aws_cn","howto/cluster/multi_cluster/k8s_cn","howto/cluster/multi_cluster/k8s_distributed_cn","howto/cluster/multi_cluster/openmpi_cn","howto/cluster/multi_cluster/src/k8s_data/README","howto/cluster/multi_cluster/src/k8s_train/README","howto/cluster/preparations_cn","howto/cmd_parameter/arguments_cn","howto/cmd_parameter/detail_introduction_cn","howto/cmd_parameter/index_cn","howto/cmd_parameter/use_case_cn","howto/index_cn","howto/optimization/gpu_profiling_cn","howto/rnn/hierarchical_layer_cn","howto/rnn/hrnn_rnn_api_compare_cn","howto/rnn/index_cn","howto/rnn/recurrent_group_cn","howto/rnn/rnn_config_cn","index_cn"],envversion:50,filenames:["build_and_install/build_from_source_cn.rst","build_and_install/docker_install_cn.rst","build_and_install/index_cn.rst","build_and_install/pip_install_cn.rst","dev/contribute_to_paddle_cn.md","dev/index_cn.rst","dev/new_layer_cn.rst","dev/write_docs_cn.rst","faq/build_and_install/index_cn.rst","faq/cluster/index_cn.rst","faq/index_cn.rst","faq/local/index_cn.rst","faq/model/index_cn.rst","faq/parameter/index_cn.rst","getstarted/concepts/use_concepts_cn.rst","getstarted/index_cn.rst","getstarted/quickstart_cn.rst","howto/capi/compile_paddle_lib_cn.md","howto/c
api/index_cn.rst","howto/capi/organization_of_the_inputs_cn.md","howto/capi/workflow_of_capi_cn.md","howto/cluster/cmd_argument_cn.md","howto/cluster/index_cn.rst","howto/cluster/multi_cluster/fabric_cn.md","howto/cluster/multi_cluster/index_cn.rst","howto/cluster/multi_cluster/k8s_aws_cn.md","howto/cluster/multi_cluster/k8s_cn.md","howto/cluster/multi_cluster/k8s_distributed_cn.md","howto/cluster/multi_cluster/openmpi_cn.md","howto/cluster/multi_cluster/src/k8s_data/README.md","howto/cluster/multi_cluster/src/k8s_train/README.md","howto/cluster/preparations_cn.md","howto/cmd_parameter/arguments_cn.md","howto/cmd_parameter/detail_introduction_cn.md","howto/cmd_parameter/index_cn.rst","howto/cmd_parameter/use_case_cn.md","howto/index_cn.rst","howto/optimization/gpu_profiling_cn.rst","howto/rnn/hierarchical_layer_cn.rst","howto/rnn/hrnn_rnn_api_compare_cn.rst","howto/rnn/index_cn.rst","howto/rnn/recurrent_group_cn.md","howto/rnn/rnn_config_cn.rst","index_cn.rst"],objects:{},objnames:{},objtypes:{},terms:{"00m":37,"01org":8,"03m":37,"0424m":37,"04\u4ee5\u4e0a":3,"04\u4ee5\u53camaco":16,"055ee37d":25,"0630u":37,"06u":37,"0810u":37,"0957m":37,"0\u53f7\u8bad\u7ec3\u8282\u70b9\u662f\u4e3b\u8bad\u7ec3\u8282\u70b9":33,"0\u5c42\u5e8f\u5217":38,"0_cudnn5":0,"0_cudnn5_avx_mkl":[1,3],"0_cudnn7_avx_mkl":3,"100gi":25,"100m":11,"1150u":37,"11e6":26,"124n":37,"12\u4ee5\u4e0a":3,"12\u64cd\u4f5c\u7cfb\u7edf":8,"13m":26,"1490u":37,"14\u8fd9\u79cd\u5199\u6cd5\u5c06\u4f1a\u6d4b\u8bd5\u6a21\u578b":35,"1550u":37,"15\u884c":39,"16\u5b57\u8282\u8868\u793a\u4fdd\u5b58\u7684\u53c2\u6570\u603b\u4e2a\u6570":13,"16u":37,"173n":37,"1770u":37,"18ad":25,"18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750":26,"197u":37,"1\u4e4b\u540e\u7684\u4efb\u4f55\u4e00\u4e2a\u7248\u672c\u6765\u7f16\u8bd1\u8fd0\u884c":0,"1\u7684\u5c42\u4e4b\u5916":35,"1\u7a00\u758f\u6570\u636e":6,"1\u8f6e\u5b58\u50a8\u7684\u6240\u6709\u6a21\u578b":35,"210u":37,"215n":37,"228u":37,"2520u":37,"2680u":37,"26\u884c":39,
"279n":37,"27m":37,"285m":37,"2863m":37,"28m":37,"2977m":37,"2\u4e09\u7c7b\u7684\u6bd4\u4f8b\u4e3a":13,"2\u4e2a\u5b50\u5e8f\u5217":19,"2\u5206\u522b\u4ee3\u88683\u4e2a\u8282\u70b9\u7684trainer":27,"2\u610f\u5473\u77400\u53f7\u548c1\u53f7gpu\u5c06\u4f1a\u4f7f\u7528\u6570\u636e\u5e76\u884c\u6765\u8ba1\u7b97fc1\u548cfc2\u5c42":35,"2\u8fd9\u51e0\u4e2a\u76ee\u5f55\u8868\u793apaddlepaddle\u8282\u70b9\u4e0etrain":27,"2cbf7385":25,"302n":37,"30u":37,"328n":37,"32u":37,"331n":37,"3320u":37,"365e":25,"36u":37,"3710m":37,"3768m":37,"387u":37,"38u":37,"3920u":37,"39u":37,"3\u4ee5\u4e0a\u7684\u7b26\u53f7":3,"3\u53f7gpu":11,"4035m":37,"4090u":37,"4096mb":33,"4279m":37,"43u":37,"448a5b355b84":26,"4560u":37,"4563m":37,"45u":37,"4650u":37,"4726m":37,"473m":26,"4\u4e2a\u5e8f\u5217\u7684\u957f\u5ea6\u5206\u522b\u4e3a":19,"4\u5b57\u8282\u8868\u793apaddlepaddle\u7248\u672c\u4fe1\u606f":13,"4gb":33,"500m":11,"50bd":25,"50gi":25,"514u":37,"525n":37,"526u":37,"536u":37,"5460u":37,"5470u":37,"54u":37,"5690m":37,"573u":37,"578n":37,"5798m":37,"586u":37,"58s":26,"5969m":37,"5\u4f5c\u4e3a\u7f16\u8bd1\u73af\u5883":3,"5\u5373\u5c06\u505c\u6b62\u7ef4\u62a4":3,"5_cudnn5_avx_mkl":3,"5_cudnn5_avx_openbla":[3,16],"6080u":37,"6140u":37,"6305m":37,"639u":37,"655u":37,"6780u":37,"6810u":37,"682u":37,"6970u":37,"6\u4e07\u4ebf\u6b21\u6d6e\u70b9\u8fd0\u7b97\u6bcf\u79d2":37,"6\u4ee5\u4e0a":[3,16],"6\u4f5c\u4e3a\u6807\u51c6\u7f16\u8bd1\u73af\u5883":3,"6ce9":25,"704u":37,"7090u":37,"72u":37,"73u":37,"75u":37,"760u":37,"767u":37,"783n":37,"784u":37,"78m":37,"7\u4ee5\u4e0a\u7684\u7b26\u53f7":3,"7\u548cpip":8,"7\u7cfb\u5217":3,"7kb":26,"8000\u5c31\u53ef\u4ee5\u5728\u7f51\u9875\u4e0a\u751f\u6210\u9700\u8981\u7684\u6587\u6863":7,"8250u":37,"8300u":37,"830n":37,"849m":37,"861u":37,"8661m":37,"892m":37,"8\u5b57\u8282\u8868\u793a\u6bcf\u4e2a\u53c2\u6570\u5360\u7528\u7684\u5b57\u8282\u6570":13,"901n":37,"90u":37,"918u":37,"9247m":37,"924n":37,"9261m":37,"9330m":37,"94u":37,"9530m":37,"983m":37,"988u":37,"997u":37,"99u
":37,"9f18":26,"\u4e00":39,"\u4e00\u4e2a":19,"\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a0\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u6269\u5c55\u6210\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u5206\u5e03\u5f0fpaddlepaddle\u8bad\u7ec3\u4efb\u52a1\u4e2d":26,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u6216\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u6269\u5c55\u6210\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u8fdb\u5165":41,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u8fdb\u5165":41,"\u4e00\u4e2a\u53cc\u5c42rnn\u7531\u591a\u4e2a\u5355\u5c42rnn\u7ec4\u6210":41,"\u4e00\u4e2a\u53ef\u8c03\u7528\u7684\u51fd\u6570":41,"\u4e00\u4e2a\u6570\u636e\u96c6\u5927\u90e8\u5206\u5e8f\u5217\u957f\u5ea6\u662f100":11,"\u4e00\u4e2a\u662f\u6d6e\u70b9\u8ba1\u7b97\u91cf":37,"\u4e00\u4e2a\u72ec\u7acb\u7684\u5143\u7d20":38,"\u4e00\u4e2a\u72ec\u7acb\u7684\u8bcd\u8bed":38,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684\u524d\u5411\u4f20\u64ad\u90e8\u5206\u628a\u8f93\u5165\u8f6c\u5316\u4e3a\u76f8\u5e94\u7684\u8f93\u51fa":6,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684\u53c2\u6570\u662f\u5728":6,"\u4e00\u4e2a\u7f51\u7edc\u5c42\u7684c":6,"\u4e00\u4e2a\u8f93\u51fa\u6570\u636e\u540c\u6837\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":19,"\u4e00\u4e2a\u91cd\u8981\u7684\u95ee\u9898\u662f\u9009\u62e9\u6b63\u786e\u7684learning_r":13,"\u4e00\u4e2agpu\u8bbe\u5907\u4e0a\u4e0d\u5141\u8bb8\u914d\u7f6e\u591a\u4e2a\u6a21\u578b":33,"\u4e00\u4e2agradientmachine\u7c7b\u7684\u5bf9\u8c61\u7ba1\u7406\u7740\u4e00\u7ec4\u8ba1\u7b97\u5c42":20,"\u4e00\u4e2alabel":3
9,"\u4e00\u4e2amemory\u5305\u542b":42,"\u4e00\u4e9b\u60c5\u51b5\u4e3a\u4e86\u4fbf\u4e8e\u53d1\u5e03":20,"\u4e00\u4eba":39,"\u4e00\u53e5\u8bdd\u662f\u7531\u8bcd\u8bed\u6784\u6210\u7684\u5e8f\u5217":41,"\u4e00\u53f0\u7535\u8111":0,"\u4e00\u65e9":39,"\u4e00\u662fbatch":11,"\u4e00\u6837\u7684\u65b9\u5f0f":0,"\u4e00\u6b21\u6027\u676f\u5b50":39,"\u4e00\u7ef4\u6570\u7ec4":[19,20],"\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":19,"\u4e00\u81f4":[38,39],"\u4e00\u822c\u4ece":4,"\u4e00\u822c\u5728paddlepaddle\u4e2d":39,"\u4e00\u822c\u662f\u7531\u4e8e\u76f4\u63a5\u4f20\u9012\u5927\u5b57\u5178\u5bfc\u81f4\u7684":13,"\u4e00\u822c\u6765\u8bf4":42,"\u4e00\u822c\u8868\u793a":39,"\u4e00\u822c\u8bbe\u7f6e":13,"\u4e00\u8282":20,"\u4e09\u79cd\u5e8f\u5217\u6a21\u5f0f":14,"\u4e0a":4,"\u4e0a\u4f20\u8ba1\u7b97\u5f97\u51fa\u7684\u68af\u5ea6":22,"\u4e0a\u56fe\u4e2d\u7684":19,"\u4e0a\u56fe\u4e2d\u865a\u7ebf\u7684\u8fde\u63a5":39,"\u4e0a\u56fe\u63cf\u8ff0\u4e86\u4e00\u4e2a3\u8282\u70b9\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3\u573a\u666f":27,"\u4e0a\u7f16\u8bd1\u5f88\u6162":0,"\u4e0a\u7f51":39,"\u4e0a\u8ff0\u4ee3\u7801\u5c06bias\u5168\u90e8\u521d\u59cb\u5316\u4e3a1":13,"\u4e0a\u8ff0\u547d\u4ee4\u4e2d":1,"\u4e0a\u8ff0\u547d\u4ee4\u628a\u5f53\u524d\u76ee\u5f55":0,"\u4e0a\u8ff0\u7684":12,"\u4e0a\u8ff0\u7684\u4ee3\u7801\u7247\u6bb5\u5305\u542b\u4e86\u4e24\u79cd\u65b9\u6cd5":37,"\u4e0a\u8ff0\u7b2c4\u6b65":0,"\u4e0b":7,"\u4e0b\u4e00\u6b65\u5c31\u662f\u7528\u6a21\u578b\u6765\u505a\u9884\u6d4b":18,"\u4e0b\u4f1a\u770b\u5230\u5982\u4e0b\u76ee\u5f55\u7ed3\u6784":17,"\u4e0b\u540c":13,"\u4e0b\u56fe\u4e2d\u5c31\u5c55\u793a\u4e86\u4e00\u4e9b\u5173\u4e8e\u5185\u5b58\u6570\u636e\u8fc1\u5f99\u548c\u8ba1\u7b97\u8d44\u6e90\u5229\u7528\u7387\u7684\u5efa\u8bae":37,"\u4e0b\u56fe\u662f\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u7684\u793a\u610f\u56fe":6,"\u4e0b\u56fe\u662fcsr\u5b58\u50a8\u7a00\u758f\u77e9\u9635\u7684\u793a\u610f\u56fe":19,"\u4e0b\u627e\u5230":17,"\u4e0b\u6587\u4ee5nlp\u4efb\u52a1\u4e3a\u4f8b":41,"\u4e0b\u6587\u4f1a\u8be
6\u7ec6\u8fdb\u884c\u4ecb\u7ecd":19,"\u4e0b\u6587\u4f7f\u7528":27,"\u4e0b\u6587\u5c31\u662f\u7528job\u7c7b\u578b\u7684\u8d44\u6e90\u6765\u8fdb\u884c\u8bad\u7ec3":26,"\u4e0b\u6587\u8be6\u7ec6\u89e3\u91ca":19,"\u4e0b\u6b21":39,"\u4e0b\u7684":[20,27],"\u4e0b\u8868\u5217\u51fa\u4e86python\u7aef\u8bad\u7ec3\u63a5\u53e3\u66b4\u9732\u7684\u6570\u636e\u7c7b\u578b":19,"\u4e0b\u8f7d\u5b8c\u6570\u636e\u540e":26,"\u4e0b\u8f7d\u6307\u5b9a\u7248\u672c\u7684docker\u955c\u50cf":1,"\u4e0b\u8f7dgpu\u7248\u672c":1,"\u4e0b\u9762":20,"\u4e0b\u9762\u4e3e\u4e2a\u7b80\u5355\u7684\u4f8b\u5b50":37,"\u4e0b\u9762\u4ee5":21,"\u4e0b\u9762\u5217\u51fa\u4e86":42,"\u4e0b\u9762\u5217\u51fa\u4e86\u5168\u8fde\u63a5\u5c42\u7684\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5":6,"\u4e0b\u9762\u5c31\u6839\u636e\u8fd9\u51e0\u4e2a\u6b65\u9aa4\u5206\u522b\u4ecb\u7ecd":27,"\u4e0b\u9762\u6211\u4eec\u4f7f\u7528\u8fd9\u4e2a\u955c\u50cf\u6765\u4e0b\u8f7d\u6570\u636e\u5230docker":26,"\u4e0b\u9762\u662fc":20,"\u4e0b\u9762\u7684\u4ee3\u7801\u5c06\u968f\u673a\u751f\u6210\u7684\u77e9\u9635\u8f6c\u5316\u4e3a\u53ef\u4ee5\u88abpaddlepaddle\u52a0\u8f7d\u7684\u6a21\u578b\u53c2\u6570":13,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u4ecegithub\u62c9\u53d6\u6700\u65b0\u4ee3\u7801":17,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5efa\u4e86\u4e00\u4e2a\u9ad8\u5ea6\u4e3a1":19,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5efa\u4e86\u4e00\u4e2acpu\u4e0a\u7684\u4e8c\u503c\u7a00\u758f\u77e9\u9635":19,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u521b\u5efa\u4e86\u542b\u6709\u4e09\u4e2a\u5143\u7d20":19,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u5728\u521b\u5efa\u4e86\u4e00\u4e2acpu\u4e0a\u7684\u5e26\u5143\u7d20\u503c\u7684\u7a00\u758f\u77e9\u9635":19,"\u4e0b\u9762\u7684\u4ee3\u7801\u7247\u6bb5\u5b9e\u73b0\u4e86":6,"\u4e0b\u9762\u7ed9\u51fa\u4e86\u4e00\u4e2a\u4f8b\u5b50":6,"\u4e0b\u9762\u7ed9\u51fa\u5728\u4e09\u7ef4\u7a7a\u95f4\u4e2d\u4f7f\u7528\u7ebf\u6027\u56de\u5f52\u62df\u5408\u4e00\u6761\u76f4\u7ebf\u7684\u4f8b\u5b5
0":14,"\u4e0b\u9762\u8be6\u7ec6\u89e3\u91ca\u4ec0\u4e48\u662f":19,"\u4e0b\u9762\u8fd9\u4e9blayer\u80fd\u591f\u63a5\u53d7\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165":38,"\u4e0d":39,"\u4e0d\u4ec5\u8981\u63d0\u4f9b\u6bcf\u4e00\u4e2a\u5916\u5c42\u5e8f\u5217\u5728\u6574\u4e2a":19,"\u4e0d\u4f1a\u4fdd\u7559\u5728\u78c1\u76d8\u4e0a":0,"\u4e0d\u4f1a\u518d\u4ece":11,"\u4e0d\u4f1a\u865a\u62df\u4efb\u4f55\u786c\u4ef6":0,"\u4e0d\u4f7f\u7528\u989d\u5916\u7a7a\u95f4":6,"\u4e0d\u53ef\u518d\u8fdb\u884c\u62c6\u5206":19,"\u4e0d\u540c\u4e8e\u4e0a\u8ff0\u4ecb\u7ecd\u7684recurr":12,"\u4e0d\u540c\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u6570\u636e\u5927\u5c0f\u7684\u6700\u5927\u503c\u4e0e\u6700\u5c0f\u503c\u7684\u6bd4\u7387":33,"\u4e0d\u540c\u5e8f\u5217\u53ef\u80fd\u4f1a\u542b\u6709\u4e0d\u540c\u6570\u76ee\u4e2a\u5143\u7d20":19,"\u4e0d\u540c\u65f6\u95f4\u6b65\u7684\u8f93\u5165\u662f\u4e0d\u540c\u7684":42,"\u4e0d\u540c\u7684\u4f18\u5316\u7b97\u6cd5\u9700\u8981\u4f7f\u7528\u4e0d\u540c\u5927\u5c0f\u7684\u5185\u5b58":11,"\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf":27,"\u4e0d\u540c\u7684\u6570\u636e\u7c7b\u578b\u548c\u5e8f\u5217\u6a21\u5f0f\u8fd4\u56de\u7684\u683c\u5f0f\u4e0d\u540c":14,"\u4e0d\u540c\u8ba1\u7b97\u5c42\u5bf9\u7a7a\u8f93\u5165\u7684\u5904\u7406\u7b56\u7565\u6709\u53ef\u80fd\u4e0d\u540c":19,"\u4e0d\u540c\u8f93\u5165\u542b\u6709\u7684\u5b50\u53e5":41,"\u4e0d\u540c\u8f93\u5165\u5e8f\u5217\u542b\u6709\u7684\u8bcd\u8bed\u6570\u5fc5\u987b\u4e25\u683c\u76f8\u7b49":41,"\u4e0d\u540cdataprovider\u5bf9\u6bd4\u5982\u4e0b":39,"\u4e0d\u5c11":39,"\u4e0d\u5e94\u8be5\u88ab\u62c6\u89e3":41,"\u4e0d\u6307\u5b9a\u65f6":41,"\u4e0d\u652f\u6301":19,"\u4e0d\u652f\u6301\u5e8f\u5217\u957f\u5ea6\u4e3a":19,"\u4e0d\u662f\u4e00\u6761\u5e8f\u5217":14,"\u4e0d\u662f\u771f\u6b63\u7684layer":12,"\u4e0d\u662f\u901a\u8fc7\u4e00\u822c\u7684\u65b9\u5f0f\u6765\u5b9e\u73b0\u5bf9\u8f93\u51fa\u7684\u6fc0\u6d3b":12,"\u4e0d\u6ee1\u8db3\u94a9\u5b50\u7684":4,"\u4e0d\u80fd\u592a\u968f\u610f":4,"\u4e0d\u80fd\u
88ab\u63d0\u4ea4\u5230":4,"\u4e0d\u8981\u5728\u6ce8\u91cd\u6027\u80fd\u7684\u8bad\u7ec3\u573a\u666f\u4e0b\u4f7f\u7528":11,"\u4e0d\u8bba\u5e8f\u5217\u4e2d\u7684\u5143\u7d20\u5728\u5185\u5b58\u4e2d\u5360\u7528\u591a\u5c11\u5b9e\u9645\u5b58\u50a8\u7a7a\u95f4":19,"\u4e0d\u8bba\u6570\u636e\u57df\u662f":19,"\u4e0d\u8bba\u662f\u4e00\u7ef4\u6574\u578b\u6570\u7ec4\u8fd8\u662f\u4e8c\u7ef4\u6d6e\u70b9\u6570\u77e9\u9635":19,"\u4e0d\u8bba\u662f\u5355\u5c42\u5e8f\u5217\u8fd8\u662f\u53cc\u5c42\u5e8f\u5217\u7684\u5e8f\u5217\u4fe1\u606f":19,"\u4e0d\u8fc7":39,"\u4e0d\u8fc7\u5b9e\u9645\u4e0a\u662f\u8fd0\u884c\u5728\u4e00\u4e2a":0,"\u4e0d\u8fdc":39,"\u4e0d\u9519":39,"\u4e0d\u9700\u5728\u4f7f\u7528c":20,"\u4e0d\u9700\u8981\u4f9d\u8d56\u5176\u4ed6\u4efb\u4f55\u8f6f\u4ef6\u4e86":0,"\u4e0d\u9700\u8981\u63d0\u4f9b\u5143\u7d20\u503c":19,"\u4e0e":27,"\u4e0e\u5176\u5b83":20,"\u4e0e\u5355\u5c42rnn\u7684\u914d\u7f6e\u7c7b\u4f3c":39,"\u4e0e\u540c\u6b65sgd\u76f8\u6bd4":22,"\u4e0e\u5f53\u524d\u7684\u8870\u51cf\u56e0\u5b50\u7684\u4e58\u79ef":13,"\u4e0e\u672c\u5730\u8bad\u7ec3\u76f8\u540c":23,"\u4e0e\u6b64\u4e0d\u540c\u7684\u662f":27,"\u4e0e\u8f93\u5165\u4e0d\u540c\u7684\u662f":20,"\u4e0e\u8fd9\u4e2a\u8bad\u7ec3\u6570\u636e\u4ea4\u4e92\u7684layer":11,"\u4e0ejob":27,"\u4e14":39,"\u4e14\u4e0d\u6392\u9664commit\u4e4b\u95f4\u7684\u4fee\u6539\u5b58\u5728\u76f8\u4e92\u8986\u76d6\u7684\u60c5\u51b5":4,"\u4e14\u4f7f\u7528":17,"\u4e14\u5e8f\u5217\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20\u8fd8\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":14,"\u4e14\u6bcf\u4e2a\u53e5\u5b50\u8868\u793a\u4e3a\u5bf9\u5e94\u7684\u8bcd\u8868\u7d22\u5f15\u6570\u7ec4":39,"\u4e24":39,"\u4e24\u4e2a\u5b50\u76ee\u5f55\u4e0b":7,"\u4e24\u4e2a\u5d4c\u5957\u7684":41,"\u4e24\u4e2a\u64cd\u4f5c":37,"\u4e24\u4e2a\u8f93\u5165\u7684\u5b50\u5e8f\u5217\u957f\u5ea6\u4e5f\u5e76\u4e0d\u76f8\u540c":39,"\u4e24\u4e2a\u90e8\u5206":7,"\u4e24\u4e2a\u9690\u5c42\u7684\u7b80\u5355\u5168\u8fde\u63a5\u7f51\u7edc":20,"\u4e24\u6b21":19,"\u4e24\u79cd\u5e38\u7528\u7684\u6a21\
u578b\u52a0\u8f7d\u65b9\u5f0f":20,"\u4e24\u79cd\u65b9\u6cd5\u7684\u533a\u522b":11,"\u4e24\u79cdblas\u5e93":0,"\u4e24\u8005\u90fd\u662f\u5bf9\u68af\u5ea6\u7684\u622a\u65ad":11,"\u4e2a\u5185\u5b58\u6c60\u5b9e\u9645\u4e0a\u51b3\u5b9a\u4e86shuffle\u7684\u7c92\u5ea6":11,"\u4e2a\u6279\u6b21\u7684\u53c2\u6570\u5e73\u5747\u503c\u8fdb\u884c\u6d4b\u8bd5":33,"\u4e2a\u6a21\u578b\u6d4b\u8bd5\u6570\u636e":33,"\u4e2d":[6,11,19,27],"\u4e2d\u4e0d\u8981\u6dfb\u52a0\u5927\u6587\u4ef6\u7b49":4,"\u4e2d\u4f1a\u4f7f\u7528\u5230\u7684\u5b57\u5178\u6570\u636e\u6587\u4ef6":21,"\u4e2d\u4f20\u5165\u53c2\u6570":21,"\u4e2d\u4f20\u5165\u7684\u53c2\u6570":21,"\u4e2d\u5143\u7d20\u4e2a\u6570\u603b\u662f\u7b49\u4e8e\u884c\u6570":19,"\u4e2d\u5143\u7d20\u7684\u4e2a\u6570\u7b49\u4e8e\u7f51\u7edc\u4e2d\u8f93\u51fa\u5c42\u7684\u4e2a\u6570":11,"\u4e2d\u5173\u4e8e\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc\u7684\u4ecb\u7ecd":39,"\u4e2d\u5355\u5143\u6d4b\u8bd5\u7684\u4e00\u90e8\u5206":4,"\u4e2d\u5355\u5143\u6d4b\u8bd5\u80fd\u987a\u5229\u901a\u8fc7":4,"\u4e2d\u542b\u6709\u591a\u4e2a\u5e8f\u5217":19,"\u4e2d\u5b9a\u4e49":42,"\u4e2d\u5b9a\u4e49\u548c\u4f7f\u7528":41,"\u4e2d\u6253\u5370\u5176\u503c":11,"\u4e2d\u6307\u5b9a":33,"\u4e2d\u6307\u5b9a\u7684\u540d\u5b57":35,"\u4e2d\u641c\u7d22\u8fd9\u51e0\u4e2a\u5e93":0,"\u4e2d\u64cd\u4f5c":19,"\u4e2d\u6587\u6587\u6863":7,"\u4e2d\u6587\u6587\u6863\u76ee\u5f55":7,"\u4e2d\u6587\u7ef4\u57fa\u767e\u79d1\u9875\u9762":39,"\u4e2d\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2alayer\u7684\u8f93\u51fa\u7ed3\u679c\u77e9\u9635":11,"\u4e2d\u6bcf\u4e2apod\u7684ip\u5730\u5740":27,"\u4e2d\u6bcf\u5c42\u7684\u6570\u503c\u7edf\u8ba1":33,"\u4e2d\u7528\u4e8e\u5b58\u50a8\u6570\u636e\u7684":20,"\u4e2d\u7684":20,"\u4e2d\u7684\u4e00\u884c":4,"\u4e2d\u7684\u4ee3\u7801\u4f5c\u4e3a\u5b9e\u4f8b":21,"\u4e2d\u7684\u504f\u79fb":19,"\u4e2d\u7684\u5bf9\u5e94\u5206\u652f\u5373\u53ef":4,"\u4e2d\u7684\u76f8\u5173\u811a\u672c":20,"\u4e2d\u7684\u8d77\u59cb\u504f\u79fb":19,"\u4e2d\u83b7\u53d6":27,"\u4e2d\u8bb
e\u7f6e\u7684\u6240\u6709\u8282\u70b9":23,"\u4e2d\u8be6\u7ec6\u4ecb\u7ecd":6,"\u4e2d\u914d\u7f6e\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u4e34\u65f6\u53d8\u91cf\u7b49\u7b49":11,"\u4e3a":[19,42],"\u4e3a\u4e86\u4f7f\u8bc4\u5ba1\u4eba\u5728\u8bc4\u5ba1\u4ee3\u7801\u65f6\u66f4\u597d\u5730\u4e13\u6ce8\u4e8e\u4ee3\u7801\u672c\u8eab":4,"\u4e3a\u4e86\u4fdd\u8bc1\u6548\u7387":6,"\u4e3a\u4e86\u4fdd\u8bc1gpu\u9a71\u52a8\u80fd\u591f\u5728\u955c\u50cf\u91cc\u9762\u6b63\u5e38\u8fd0\u884c":1,"\u4e3a\u4e86\u51cf\u5c11\u751f\u6210\u94fe\u63a5\u5e93\u7684\u5927\u5c0f\u628a":17,"\u4e3a\u4e86\u548c\u7528\u6237\u7cfb\u7edf\u517c\u5bb9":18,"\u4e3a\u4e86\u5c01\u88c5\u80fd\u591f\u6b63\u786e\u5de5\u4f5c":6,"\u4e3a\u4e86\u63cf\u8ff0\u65b9\u4fbf":41,"\u4e3a\u4e86\u65b9\u4fbf\u5927\u5bb6":4,"\u4e3a\u4e86\u65b9\u4fbf\u5927\u5bb6\u7684\u90e8\u7f72":24,"\u4e3a\u4e86\u7f16\u8bd1paddlepaddl":0,"\u4e3a\u4e86\u8fbe\u5230\u6027\u80fd\u6700\u4f18":37,"\u4e3a\u4ec0\u4e48\u7528":0,"\u4e3a\u4f7f\u7528c":20,"\u4e3a\u4f8b":12,"\u4e3a\u53c2\u6570\u77e9\u9635\u7684\u5bbd\u5ea6":13,"\u4e3a\u5b83\u4eec\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f\u5c06\u53d8\u6210\u5e8f\u5217\u8f93\u5165":19,"\u4e3a\u5bb9\u5668\u5185\u6267\u884c\u7684\u547d\u4ee4":1,"\u4e3a\u60a8\u505a\u6027\u80fd\u8c03\u4f18\u63d0\u4f9b\u4e86\u65b9\u5411":37,"\u4e3a\u65b9\u4fbf\u4f5c\u4e1a\u542f\u52a8\u63d0\u4f9b\u4e86\u4e24\u4e2a\u72ec\u7279\u7684\u547d\u4ee4\u9009\u9879":23,"\u4e3a\u6b64":26,"\u4e3a\u6bcf\u4e00\u4e2a":[19,20],"\u4e3a\u6bcf\u4e00\u4e2a\u8f93\u5165":[19,20],"\u4e3a\u8f93\u51fa\u5206\u914d\u5185\u5b58":6,"\u4e3aoutput_\u7533\u8bf7\u5185\u5b58":6,"\u4e3b\u8981\u4e3a\u5f00\u53d1\u8005\u4f7f\u7528":33,"\u4e3b\u8981\u5305\u62ec\u56db\u79cd\u7c7b\u578b":14,"\u4e3b\u8981\u539f\u56e0":39,"\u4e3b\u8981\u539f\u56e0\u5305\u62ec\u4e24\u4e2a\u65b9\u9762":11,"\u4e3e\u4e00\u4e2a\u4f8b\u5b50":13,"\u4e3e\u4f8b":11,"\u4e3e\u4f8b\u8bf4\u660e":39,"\u4e4b\u524d":4,"\u4e4b\u540e":[6,14,21],"\u4e4b\u540e\u4f7f\u7528":6,"\u4e4b\u540e\u4f7f\u7528\u77e9
\u9635\u8fd0\u7b97\u51fd\u6570\u6765\u8ba1\u7b97":6,"\u4e4b\u540e\u518d\u7528\u7f51\u9875\u8fde\u5230http":7,"\u4e4b\u540e\u521d\u59cb\u5316\u6240\u6709\u7684\u6743\u91cd\u77e9\u9635":6,"\u4e4b\u540e\u624d\u80fd\u5f00\u59cb\u7f16\u8bd1\u7684\u6b65\u9aa4":0,"\u4e4b\u7c7b\u7684\u7a0b\u5e8f\u6765\u7f16\u8bd1\u6e90\u7801":0,"\u4e4b\u95f4\u7684\u8fd0\u7b97\u662f\u72ec\u7acb\u7684":41,"\u4e58\u4e0a\u8f93\u51fa\u7684\u68af\u5ea6":6,"\u4e58\u9664\u7b49\u65f6\u5019":11,"\u4e5f":39,"\u4e5f\u4e0d\u5b58\u5728\u4e00\u4e2asubseq\u76f4\u63a5\u751f\u6210\u4e0b\u4e00\u4e2asubseq\u7684\u60c5\u51b5":41,"\u4e5f\u4e0d\u80fd\u63a5\u6536\u5e8f\u5217\u6570\u636e\u4f5c\u4e3a\u8f93\u5165":12,"\u4e5f\u4f1a\u5360\u7528\u78c1\u76d8":0,"\u4e5f\u53ef\u4ee5\u4f7f\u7528":4,"\u4e5f\u53ef\u4ee5\u5229\u7528paddlepaddl":7,"\u4e5f\u53ef\u4ee5\u662f\u4e00\u4e2a\u8bcd\u8bed":41,"\u4e5f\u53ef\u4ee5\u662f\u5728\u4efb\u52a1\u542f\u52a8\u524d\u4e0b\u8f7d\u5230\u672c\u5730\u7684":21,"\u4e5f\u53ef\u4ee5\u76f4\u63a5\u5728\u7f51\u9875\u9884\u89c8\u6587\u6863":7,"\u4e5f\u53ef\u4ee5\u8bf4\u662f\u67d0\u4e9b\u7279\u5b9a\u6307\u4ee4\u7684\u4f7f\u7528\u60c5\u51b5":37,"\u4e5f\u53ef\u4ee5\u901a\u8fc7\u4fee\u6539":27,"\u4e5f\u5c31\u662f":4,"\u4e5f\u5c31\u662f\u7a7a\u8f93\u5165":19,"\u4e5f\u5c31\u662f\u81ea\u5df1\u7528\u6237\u540d\u4e0b\u7684":4,"\u4e5f\u5c31\u662f\u8bf4":[19,33,35],"\u4e5f\u5c31\u662fpaddlepaddle\u4e2d\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":19,"\u4e5f\u63cf\u8ff0\u4e86\u5bb9\u5668\u9700\u8981\u4f7f\u7528\u7684\u5b58\u50a8\u5377\u6302\u8f7d\u7684\u60c5\u51b5":27,"\u4e5f\u652f\u6301cpu\u7684\u6027\u80fd\u5206\u6790":37,"\u4e5f\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":39,"\u4e5f\u662fdecoder\u5faa\u73af\u5c55\u5f00\u7684\u4f9d\u636e":41,"\u4e5f\u6ca1\u7528":8,"\u4e7e":39,"\u4e86":[0,39],"\u4e86\u89e3\u60a8\u7684\u786c\u4ef6":37,"\u4e86\u89e3\u66f4\u591a\u7ec6\u8282":42,"\u4e86\u89e3\u66f4\u591a\u8be6\u7ec6\u4fe1\u606f":42,"\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":19,"\u4e8c\u7ef4\u6d6e\u70b9\u6570\u
77e9\u9635":19,"\u4e8c\u7ef4\u77e9\u9635":20,"\u4e8c\u7ef4\u77e9\u9635\u53ef\u4ee5\u8868\u793a\u884c\u5411\u91cf\u548c\u5217\u5411\u91cf":19,"\u4e8c\u8005\u8bed\u610f\u4e0a\u5b8c\u5168\u4e00\u81f4":39,"\u4e8e\u662f":19,"\u4e94\u661f\u7ea7":39,"\u4ea4\u901a":39,"\u4ea4\u901a\u4fbf\u5229":39,"\u4eab\u53d7\u60a8\u7684\u65c5\u7a0b":1,"\u4ec0\u4e48\u662f":0,"\u4ec5\u5728\u8fdc\u7a0b\u7a00\u758f\u8bad\u7ec3\u65f6\u6709\u6548":6,"\u4ec5\u5bf9\u7a00\u758f\u6570\u636e\u6709\u6548":6,"\u4ec5\u652f\u6301\u6574\u578b\u503c":19,"\u4ec5\u7528\u4e8e\u5b58\u50a8\u6574\u578b\u503c":20,"\u4ecb\u7ecd\u4e86\u4e00\u79cd\u901a\u8fc7ssh\u8fdc\u7a0b\u5206\u53d1\u4efb\u52a1":27,"\u4ecb\u7ecd\u4f7f\u7528paddlepaddl":21,"\u4ece":[9,37],"\u4ece0\u5230num":33,"\u4ece0\u5f00\u59cb\u7684\u6574\u6570":21,"\u4ece\u4e00\u4e2aword\u751f\u6210\u4e0b\u4e00\u4e2aword":41,"\u4ece\u5185\u6838\u51fd\u6570\u7684\u89d2\u5ea6":37,"\u4ece\u6a21\u578b\u6587\u4ef6\u5c06\u9884\u8bad\u7ec3\u53c2\u6570\u8f7d\u5165":13,"\u4ece\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u6765\u770b":39,"\u4ece\u6e90\u7801\u4e2d\u6784\u5efa\u7528\u4e8e\u7f16\u8bd1paddlepaddle\u7684docker\u955c\u50cf":0,"\u4ece\u6e90\u7801\u7f16\u8bd1":2,"\u4ece\u78c1\u76d8\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":20,"\u4ece\u800c\u53ef\u4ee5\u505a\u4e00\u4e9b\u4e0e\u8ba1\u7b97\u91cd\u53e0\u7684\u5de5\u4f5c":6,"\u4ece\u800c\u5f15\u53d1\u5176\u4ed6\u8282\u70b9\u65e0\u6cd5\u8fde\u63a5\u5bfc\u81f4":9,"\u4ece\u8bed\u4e49\u4e0a\u770b":41,"\u4ece\u8d77\u59cb\u7aef\u53e3\u76d1\u542c\u591a\u4e2a\u7aef\u53e3\u7528\u4e8e\u901a\u4fe1":21,"\u4ece\u8f93\u5165\u6570\u636e\u4e0a\u770b":39,"\u4ecestart":33,"\u4ed3\u5e93\u7684\u8fdc\u7a0b\u4e3b\u673a":4,"\u4ed6\u4eec\u5206\u522b\u662f":39,"\u4ed6\u4eec\u5728\u81ea\u5df1\u7684":0,"\u4ed6\u4eec\u5728paddle\u7684\u6587\u6863\u548capi\u4e2d\u662f\u4e00\u4e2a\u6982\u5ff5":39,"\u4ee3\u66ff":27,"\u4ee3\u7801\u4e2d9":39,"\u4ee3\u7801\u53c2\u8003":21,"\u4ee3\u7801\u5982\u4e0b":[11,12,13,42],"\u4ee3\u7801\u6ce8\u91ca\u8bf7\u9075\u5b88":4,"
\u4ee3\u7801\u7247\u6bb5\u5982\u4e0b":19,"\u4ee3\u7801\u793a\u4f8b\u5982\u4e0b":20,"\u4ee3\u8868\u5bbf\u4e3b\u673a\u76ee\u5f55":27,"\u4ee5":12,"\u4ee5\u4e0a":4,"\u4ee5\u4e0a\u4e24\u79cd\u65b9\u5f0f\u53ea\u9700\u9009\u62e9\u5176\u4e00\u5373\u53ef":20,"\u4ee5\u4e0b\u4ee3\u7801\u7247\u6bb5\u5b9a\u4e49":42,"\u4ee5\u4e0b\u5c06\u4e00\u4e00\u4ecb\u7ecd":24,"\u4ee5\u4e0b\u6307\u4ee4\u80fd\u68c0\u67e5linux\u7535\u8111\u662f\u5426\u652f\u6301avx":1,"\u4ee5\u4e0b\u6307\u5357\u4ecb\u7ecd\u4e86\u5982\u4f55\u4f7f\u7528openmpi\u6765\u642d\u5efapaddlepaddle\u7684\u96c6\u7fa4\u8bad\u7ec3\u4efb\u52a1":24,"\u4ee5\u4e0b\u6307\u5357\u5c55\u793a\u4e86paddlepaddle\u5bf9kubernetes\u7684\u652f\u6301":24,"\u4ee5\u4e0b\u64cd\u4f5c\u5747\u5728head\u8282\u70b9\u4e2d\u6267\u884c":28,"\u4ee5\u4e0b\u6559\u7a0b\u5c06\u6307\u5bfc\u60a8\u63d0\u4ea4\u4ee3\u7801":4,"\u4ee5\u4ea4\u4e92\u5f0f\u7684\u65b9\u5f0f\u6267\u884c\u6216\u8c03\u8bd5\u60a8\u7684\u4ee3\u7801":1,"\u4ee5\u4f7f\u7528adam\u7b97\u6cd5\u4e3a\u4f8b":13,"\u4ee5\u4fbf\u83b7\u5f97\u8bad\u7ec3\u6570\u636e\u7684\u4f4d\u7f6e\u548c\u83b7\u53d6\u73af\u5883\u53d8\u91cf\u914d\u7f6e":21,"\u4ee5\u4fdd\u8bc1\u68af\u5ea6\u7684\u6b63\u786e\u8ba1\u7b97":6,"\u4ee5\u4fdd\u8bc1\u68af\u5ea6\u8ba1\u7b97\u7684\u6b63\u786e\u6027":6,"\u4ee5\u4fdd\u8bc1\u7f16\u8bd1\u9ad8\u6548":0,"\u4ee5\u51cf\u5c0fsdk\u7684\u4f53\u79ef":18,"\u4ee5\u53ca":[6,19],"\u4ee5\u53ca\u4f7f\u7528\u5b50\u5e8f\u5217\u6765\u5b9a\u4e49\u5206\u7ea7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":42,"\u4ee5\u53ca\u5207\u6362\u673a\u5668\u65f6\u9700\u8981\u65b0\u5b89\u88c5\u7684\u8f9b\u82e6":0,"\u4ee5\u53ca\u53cc\u5c42\u5e8f\u5217":38,"\u4ee5\u53ca\u5982\u4f55\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u524d\u5411\u8ba1\u7b97\u7684\u8f93\u51fa\u7ed3\u679c":19,"\u4ee5\u53ca\u7b2c\u4e09\u65b9\u4f9d\u8d56\u94fe\u63a5\u5e93\u548c\u5934\u6587\u4ef6":17,"\u4ee5\u53ca\u8ba1\u7b97\u903b\u8f91\u5728\u5e8f\u5217\u4e0a\u7684\u5faa\u73af\u5c55\u5f00":41,"\u4ee5\u53ca\u8f93\u5165\u7684\u68af\u5ea6":6,"\u4ee5\u53car
elu":6,"\u4ee5\u5b9e\u73b0\u5bf9\u6a21\u578b\u8bad\u7ec3\u6216\u9884\u6d4b\u6d41\u7a0b\u7684\u63a7\u5236":34,"\u4ee5\u8f93\u51fa":11,"\u4ee5\u9017\u53f7\u95f4\u9694":33,"\u4ee5\u907f\u514d\u94fe\u63a5\u4e0d\u5fc5\u8981\u7684\u5e93":17,"\u4ee5embedding\u5c42\u4e3a\u4f8b":13,"\u4ee5lstm\u4e3a\u4f8b":12,"\u4ef7\u683c":39,"\u4efb\u4f55\u65f6\u5019\u5982\u679c\u9700\u8981\u6d6e\u70b9\u578b\u6570\u7ec4":19,"\u4efb\u52a1\u6765\u7ec8\u6b62\u96c6\u7fa4\u4f5c\u4e1a":23,"\u4efb\u610f\u5c06\u4e00\u4e9b\u6570\u636e\u7ec4\u5408\u6210\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":39,"\u4f18\u5316\u5668\u5219\u7528\u94fe\u5f0f\u6cd5\u5219\u6765\u5bf9\u6bcf\u4e2a\u53c2\u6570\u8ba1\u7b97\u635f\u5931\u51fd\u6570\u7684\u68af\u5ea6":6,"\u4f1a\u4f7f\u7528":20,"\u4f1a\u5148\u8fdb\u884c\u53c2\u6570\u7684\u521d\u59cb\u5316\u4e0e\u89e3\u6790":27,"\u4f1a\u5171\u4eab\u53c2\u6570":13,"\u4f1a\u5173\u8054\u53c2\u6570":12,"\u4f1a\u52a0\u8f7d\u4e0a\u4e00\u8f6e\u7684\u53c2\u6570":33,"\u4f1a\u53d8\u6210\u8bcd\u8868\u4e2d\u7684\u4f4d\u7f6e":39,"\u4f1a\u542f\u52a8pserver\u4e0etrainer\u8fdb\u7a0b":27,"\u4f1a\u5728":7,"\u4f1a\u5728\u5f53\u524d\u76ee\u5f55\u751f\u6210\u4e24\u4e2a\u5b50\u76ee\u5f55":7,"\u4f1a\u5927\u4e0d\u76f8\u540c":21,"\u4f1a\u5bf9\u6bcf\u4e00\u4e2a\u6fc0\u6d3b\u6682\u5b58\u4e00\u4e9b\u6570\u636e":11,"\u4f1a\u5bf9\u8bad\u7ec3\u6027\u80fd\u9020\u6210\u5f71\u54cd":11,"\u4f1a\u5bf9\u8fd9\u7c7b\u8f93\u5165\u8fdb\u884c\u62c6\u89e3":41,"\u4f1a\u5c06\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u62fc\u63a5":41,"\u4f1a\u5c06\u7b2c\u4e00\u4e2a":11,"\u4f1a\u6210\u4e3astep\u51fd\u6570\u7684\u8f93\u5165":41,"\u4f1a\u6267\u884c":0,"\u4f1a\u628a\u8bad\u7ec3\u96c6\u548c\u6d4b\u8bd5\u96c6\u5206\u522b\u5206\u5272\u6210\u591a\u4e2a\u6587\u4ef6":21,"\u4f1a\u62a5\u5982\u4e0b\u7684\u9519\u8bef":11,"\u4f1a\u62a5\u9519":41,"\u4f1a\u72ec\u7acb\u62e5\u6709\u4e00\u4efd\u8bad\u7ec3\u597d\u7684\u6a21\u578b":20,"\u4f1a\u76f8\u5e94\u5730\u6539\u53d8\u8f93\u51fa\u7684\u5c3a\u5bf8":6,"\u4f1a\u81ea\u52a8\u5173\u95ed\u5bf9\u5e
94\u7684issu":4,"\u4f1a\u81ea\u52a8\u5728\u7f16\u8bd1\u65f6\u4e0b\u8f7d":0,"\u4f1a\u83b7\u53d6\u5f53\u524dnamespace\u4e0b\u7684\u6240\u6709pod":27,"\u4f1a\u88ab":21,"\u4f1a\u88ab\u62c6\u89e3\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":41,"\u4f1a\u88ab\u62c6\u89e3\u4e3a\u975e\u5e8f\u5217":41,"\u4f1a\u901a\u8fc7\u5224\u6570\u636e\u662f\u5426\u9644\u5e26\u6709\u5e8f\u5217\u4fe1\u606f\u6765\u5224\u65ad\u4e00\u4e2a\u5411\u91cf":19,"\u4f1a\u9020\u6210\u90ae\u4ef6\u707e\u96be":4,"\u4f20\u7ed9dataprovider\u7684\u67d0\u4e00\u4e2aargs\u8fc7\u5927":13,"\u4f20\u9012\u7ed9\u914d\u7f6e\u6587\u4ef6\u7684\u53c2\u6570":33,"\u4f46\u4e0d\u7528\u4e8e\u8ba1\u7b97\u68af\u5ea6":6,"\u4f46\u4e0d\u9700\u8981\u63d0\u524d\u521b\u5efa":33,"\u4f46\u4e8e\u53cc\u5c42\u5e8f\u5217\u7684lstm\u6765\u8bf4":39,"\u4f46\u53ef\u4ee5\u83b7\u53d6":11,"\u4f46\u548c\u5355\u5c42rnn\u4e0d\u540c":39,"\u4f46\u5b50\u53e5\u542b\u6709\u7684\u8bcd\u8bed\u6570\u53ef\u4ee5\u4e0d\u76f8\u7b49":41,"\u4f46\u5c3d\u91cf\u8bf7\u4fdd\u6301\u7f16\u8bd1\u548c\u8fd0\u884c\u4f7f\u7528\u7684cudnn\u662f\u540c\u4e00\u4e2a\u7248\u672c":0,"\u4f46\u5e8f\u5217\u8f93\u51fa\u65f6":39,"\u4f46\u622a\u65ad\u65f6\u673a\u4e0d\u540c":11,"\u4f46\u662f":[11,39],"\u4f46\u662f\u5927\u90e8\u5206\u53c2\u6570\u662f\u4e3a\u5f00\u53d1\u8005\u63d0\u4f9b\u7684":32,"\u4f46\u662f\u5b50\u5e8f\u5217\u7684\u6570\u76ee\u5fc5\u987b\u4e00\u6837":39,"\u4f46\u662f\u5e76\u4e0d\u80fd\u4fdd\u8bc1\u53c2\u6570\u540c\u6b65\u66f4\u65b0":22,"\u4f46\u662f\u652f\u6301avx\u6307\u4ee4\u96c6":4,"\u4f46\u662f\u6bcf\u4e2a\u6837\u672c\u4ec5\u5305\u542b\u51e0\u4e2a\u8bcd":35,"\u4f46\u662f\u7a81\u7136\u6709\u4e00\u4e2a10000\u957f\u7684\u5e8f\u5217":11,"\u4f46\u662f\u865a\u62df\u7684\u4e0d\u4ec5\u4ec5\u662f":0,"\u4f46\u662fbatch":11,"\u4f46\u6709\u503c\u7684\u5730\u65b9\u5fc5\u987b\u4e3a1":14,"\u4f46\u6709\u503c\u7684\u90e8\u5206\u53ef\u4ee5\u662f\u4efb\u4f55\u6d6e\u70b9\u6570":14,"\u4f46\u7531\u4e8ecuda\u5e93\u901a\u5e38\u9700\u8981cento":3,"\u4f4d\u7f6e":39,"\u4f4f":39,"\u4f5c\u4e3a\u
4e0b\u4e00\u4e2a\u5b50\u53e5memory\u7684\u521d\u59cb\u72b6\u6001":39,"\u4f5c\u4e3a\u4f8b\u5b50\u6f14\u793a\u5982\u4f55\u914d\u7f6e\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b":42,"\u4f5c\u4e3a\u53c2\u6570\u7684id":13,"\u4f5c\u4e3a\u5f53\u524d\u65f6\u523b\u8f93\u5165":41,"\u4f5c\u4e3a\u7edf\u8ba1\u7684\u57fa\u672c\u5355\u4f4d":19,"\u4f5c\u4e3a\u8c03\u7528":20,"\u4f5c\u4e3a\u8f93\u5165":19,"\u4f5c\u4e3a\u8f93\u51fa":42,"\u4f5c\u4e3aboot_layer\u4f20\u7ed9\u4e0b\u4e00\u4e2a\u5b50\u53e5\u7684memori":39,"\u4f5c\u7528":38,"\u4f60\u53ef\u4ee5\u5c06\u7f51\u7edc\u914d\u7f6e\u6210\u67d0\u4e9b\u5c42\u4f7f\u7528gpu\u8ba1\u7b97":35,"\u4f60\u8fd8\u53ef\u4ee5\u901a\u8fc7\u8fd0\u884cdjango\u6846\u67b6\u76f4\u63a5\u6fc0\u6d3b\u5de5\u5177\u7684\u670d\u52a1\u5668":7,"\u4f60\u9700\u8981\u4e00\u4e9b\u66f4\u590d\u6742\u7684\u5355\u5143\u6d4b\u8bd5\u6765\u4fdd\u8bc1\u4f60\u5b9e\u73b0\u7684\u7f51\u7edc\u5c42\u662f\u6b63\u786e\u7684":6,"\u4f60\u9700\u8981\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u6307\u5b9a\u8bbe\u5907\u7684id\u53f7":35,"\u4f60\u9700\u8981\u5728\u914d\u7f6ecmake\u65f6\u5c06":6,"\u4f60\u9700\u8981\u628a\u8be5\u6587\u4ef6\u52a0\u5165":6,"\u4f7f\u4e4b\u53d8\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u8f93\u5165":19,"\u4f7f\u4e4b\u53d8\u4e3a\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217\u8f93\u5165":19,"\u4f7f\u5f97\u5355\u5143\u6d4b\u8bd5\u6709\u4e00\u4e2a\u5e72\u51c0\u7684\u73af\u5883":8,"\u4f7f\u5f97\u642d\u6a21\u578b\u65f6\u66f4\u65b9\u4fbf":6,"\u4f7f\u68af\u5ea6\u7684\u63d0\u4ea4\u548c\u53c2\u6570\u7684\u66f4\u65b0\u6309\u7167\u987a\u5e8f\u65b9\u5f0f\u6267\u884c":22,"\u4f7f\u7528":[6,11,12,13,17,19,20,33,37,39,41,42],"\u4f7f\u75280\u53f7\u548c1\u53f7gpu\u8ba1\u7b97fc2\u5c42":35,"\u4f7f\u75280\u53f7gpu\u8ba1\u7b97fc2\u5c42":35,"\u4f7f\u75281\u53f7gpu\u8ba1\u7b97fc3\u5c42":35,"\u4f7f\u75282\u53f7\u548c3\u53f7gpu\u8ba1\u7b97fc3\u5c42":35,"\u4f7f\u7528\u4e00\u4e2a\u5c3a\u5ea6\u4e3a":6,"\u4f7f\u7528\u4e00\u4e2a\u8bcd\u524d\u4e24\u4e2a\u8bcd\u548c\u540e\u4e24\u4e2a\u8bcd":1
1,"\u4f7f\u7528\u4e0a\u6587\u521b\u5efa\u7684yaml\u6587\u4ef6\u521b\u5efakubernet":26,"\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u6765\u8fd0\u884c\u5b83":7,"\u4f7f\u7528\u4e86\u540c\u6837\u7684parameter\u548cbia":13,"\u4f7f\u7528\u4ee5\u4e0a\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u8fdb\u884c\u9884\u6d4b":14,"\u4f7f\u7528\u53c2\u6570":[0,21],"\u4f7f\u7528\u591a\u5757\u663e\u5361\u8bad\u7ec3":11,"\u4f7f\u7528\u591a\u7ebf\u7a0b\u8bad\u7ec3":11,"\u4f7f\u7528\u5b66\u4e60\u5b8c\u6210\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u751f\u6210\u5e8f\u5217":42,"\u4f7f\u7528\u5bb9\u5668\u65b9\u5f0f\u8fd0\u884c\u8bad\u7ec3\u4efb\u52a1\u7684kubernet":27,"\u4f7f\u7528\u6211\u4eec\u4e4b\u524d\u6784\u9020\u7684\u955c\u50cf":26,"\u4f7f\u7528\u663e\u5361\u8bad\u7ec3":11,"\u4f7f\u7528\u6848\u4f8b":34,"\u4f7f\u7528\u73af\u5883\u53d8\u91cf":21,"\u4f7f\u7528\u8005\u4e0d\u9700\u8981\u5173\u5fc3":33,"\u4f7f\u7528\u8005\u53ea\u9700\u8981\u5173\u6ce8\u4e8e\u8bbe\u8ba1rnn\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u5b8c\u6210\u7684\u8ba1\u7b97":41,"\u4f7f\u7528\u8005\u65e0\u9700\u5173\u5fc3\u8fd9\u4e2a\u53c2\u6570":33,"\u4f7f\u7528\u8005\u901a\u5e38\u65e0\u9700\u5173\u5fc3":33,"\u4f7f\u7528\u8be5learning_rate_schedule\u65f6":13,"\u4f7f\u7528\u8fd9\u79cd\u65b9\u5f0f":[20,39],"\u4f7f\u7528\u8fdc\u7a0b\u7a00\u758f\u65b9\u5f0f\u8bad\u7ec3\u65f6":6,"\u4f7f\u7528c":17,"\u4f7f\u7528checkgrad\u6a21\u5f0f\u65f6\u7684\u53c2\u6570\u53d8\u5316\u5927\u5c0f":33,"\u4f7f\u7528cpu\u4e24\u7ebf\u7a0b\u8ba1\u7b97fc4\u5c42":35,"\u4f7f\u7528cpu\u8ba1\u7b97fc4\u5c42":35,"\u4f7f\u7528docker":1,"\u4f7f\u7528docker\u5b89\u88c5\u548c\u8fd0\u884cpaddlepaddle\u53ef\u4ee5\u65e0\u9700\u8003\u8651":1,"\u4f7f\u7528docker\u5b89\u88c5\u8fd0\u884c":2,"\u4f7f\u7528docker\u5c31\u4e0d\u7528\u914d\u7f6e\u4ea4\u53c9\u7f16\u8bd1\u73af\u5883\u4e86":0,"\u4f7f\u7528docker\u6784\u5efapaddlepaddle\u7684\u6587\u6863":7,"\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3":24,"\u4f7f\u7528init":35,"\u4f7f\u7528lstm\u4f5c\u4e3a
encod":39,"\u4f7f\u7528memory\u7684rnn\u5b9e\u73b0\u4fbf\u5982\u4e0b\u56fe\u6240\u793a":39,"\u4f7f\u7528model":35,"\u4f7f\u7528openblas\u7684\u955c\u50cf":1,"\u4f7f\u7528paddlepaddl":[17,20],"\u4f7f\u7528pip\u5b89\u88c5":2,"\u4f7f\u7528rdma\u8fd8\u662ftcp\u4f20\u8f93\u534f\u8bae":33,"\u4f7f\u8be5\u5c42\u7684\u53c2\u6570\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4fdd\u6301\u4e0d\u53d8":13,"\u4f86":39,"\u4f8b\u5982":[0,6,11,12,14,19,21,27,32,33,35,37,39,42],"\u4f8b\u5982\u4e0b\u9762\u4ee3\u7801":11,"\u4f8b\u5982\u4e5f\u53ef\u5728\u7a0b\u5e8f\u8fd0\u884c\u8fc7\u7a0b\u4e2d\u518d\u52a0\u8f7d\u53e6\u5916\u4e00\u4e2a\u6a21\u578b":20,"\u4f8b\u5982\u4f7f\u7528":11,"\u4f8b\u5982\u542b\u6709\u591a\u4e2a\u901a\u9053\u7684\u56fe\u7247":19,"\u4f8b\u5982\u5c06\u7b2c\u4e00\u6761\u6570\u636e\u8f6c\u5316\u4e3a":39,"\u4f8b\u5982\u6587\u672c\u5206\u7c7b\u4e2d":39,"\u4f8b\u5982\u672c\u4f8b\u4e2d\u7684\u4e24\u4e2a\u7279\u5f81":39,"\u4f8b\u5982\u673a\u5668\u4e0a\u67094\u5757gpu":11,"\u4f8b\u5982output\u76ee\u5f55\u4e0b\u5c31\u5b58\u653e\u4e86\u8f93\u51fa\u7ed3\u679c":27,"\u4f8b\u5982sigmoid":6,"\u4f8b\u5b50\u4e2d\u4e3a3\u4e2a":21,"\u4f8b\u5b50\u4e2d\u662f":6,"\u4f8b\u5b50\u4e2d\u662f0":6,"\u4f8b\u5b50\u4e2d\u662f100":6,"\u4f8b\u5b50\u4e2d\u662f4096":6,"\u4f8b\u5b50\u4e2d\u662f8192":6,"\u4f8b\u5b50\u4e2d\u662ffc":6,"\u4f8b\u5b50\u4e2d\u662fsoftmax":6,"\u4f9bpaddlepaddle\u52a0\u8f7d":33,"\u4f9d\u8d56":[0,3],"\u4f9d\u8d56\u73af\u5883\u5373\u53ef\u8fd0\u884c":1,"\u4f9d\u8d56libpython2":0,"\u4fbf\u5229":39,"\u4fbf\u548c\u5355\u5c42rnn\u914d\u7f6e\u4e2d\u7684":39,"\u4fbf\u5b9c":39,"\u4fdd\u5b58\u6a21\u578b\u53c2\u6570\u7684\u76ee\u5f55":33,"\u4fdd\u5b58\u7684\u53c2\u6570\u4e5f\u662ffloat\u7c7b\u578b":13,"\u4fdd\u5b58\u7f51\u7edc\u5c42\u8f93\u51fa\u7ed3\u679c\u7684\u76ee\u5f55":33,"\u4fdd\u5b58\u9884\u6d4b\u7ed3\u679c\u7684\u6587\u4ef6\u540d":33,"\u4fdd\u6301\u5c3d\u91cf\u5c11\u7684commit":4,"\u4fdd\u8bc1\u4f7f\u7528gpu\u8bad\u7ec3\u65f6\u4e5f\u53ef\u4ee5\u83b7\u5f97":11,"\u4fe1\u53f7\u6765\u81ea\u52
a8\u7ec8\u6b62\u5b83\u542f\u52a8\u7684\u6240\u6709\u8fdb\u7a0b":23,"\u4fe1\u606f":19,"\u4fee\u6539":26,"\u4fee\u6539\u542f\u52a8\u811a\u672c\u540e":26,"\u4fee\u6539\u6210\u66f4\u5feb\u7684\u7248\u672c":37,"\u503c\u5f97\u6ce8\u610f\u7684\u662f":[4,39],"\u503c\u5f97\u6df1\u5165\u5206\u6790":37,"\u503c\u7c7b\u578b":35,"\u5047\u5982\u6211\u4eec\u662f\u4e09\u5206\u7c7b\u95ee\u9898":13,"\u5047\u8bbe":6,"\u5047\u8bbe\u60a8\u5df2\u7ecf\u5728\u5f53\u524d\u76ee\u5f55":1,"\u5047\u8bbe\u635f\u5931\u51fd\u6570\u662f":6,"\u5047\u8bbe\u7b2c\u4e00\u4e2alayer\u7684\u8f93\u51faa\u662f\u4e00\u4e2a":11,"\u504f\u7f6e\u53c2\u6570\u7684\u5927\u5c0f":6,"\u505c\u6b62\u52a0\u8f7d\u6570\u636e":33,"\u505c\u7535":39,"\u5143\u7d20":38,"\u5143\u7d20\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u5e8f\u5217\u6240\u643a\u5e26\u7684\u91cd\u8981\u4fe1\u606f":19,"\u5143\u7d20\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u91cd\u8981\u7684\u8f93\u5165\u4fe1\u606f":38,"\u5148\u5378\u8f7d\u4e4b\u524d\u7684\u7248\u672c":0,"\u5148\u627e\u51fa\u53c2\u6570":12,"\u5148\u67e5\u770b\u4e00\u4e0b\u662f\u5426\u66fe\u7ecf\u5b89\u88c5\u8fc7paddl":8,"\u5168\u5bb6":39,"\u5168\u8fde\u63a5\u5c42\u4ee5\u4e00\u4e2a\u7ef4\u5ea6\u4e3a":6,"\u5168\u8fde\u63a5\u5c42\u6ca1\u6709\u7f51\u7edc\u5c42\u914d\u7f6e\u7684\u8d85\u53c2\u6570":6,"\u5168\u8fde\u63a5\u5c42\u7684\u5b9e\u73b0\u4f4d\u4e8e":6,"\u5168\u8fde\u63a5\u5c42\u7684\u6bcf\u4e2a\u8f93\u51fa\u90fd\u8fde\u63a5\u5230\u4e0a\u4e00\u5c42\u7684\u6240\u6709\u7684\u795e\u7ecf\u5143\u4e0a":6,"\u5168\u8fde\u63a5\u5c42python\u5c01\u88c5\u7684\u4f8b\u5b50\u4e2d\u5305\u542b\u4e0b\u9762\u51e0\u6b65":6,"\u516c\u5f0f":1,"\u5171\u4eab\u5b58\u50a8\u6302\u5728\u7684\u8def\u5f84":27,"\u5173\u4e8e\u4ec0\u4e48\u662f":19,"\u5173\u4e8e\u65f6\u95f4\u5e8f\u5217":39,"\u5173\u4e8e\u6784\u5efa\u548c\u6d4b\u8bd5\u7684\u66f4\u591a\u4fe1\u606f":4,"\u5173\u4e8eavx":1,"\u5173\u4e8ec":18,"\u5173\u4e8elstm":12,"\u5173\u4e8epaddlepaddle\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"\u5173\u4e8epaddlepaddle\u7684\u66f4\u591a\u4f7f\u7
528\u65b9\u6cd5\u8bf7\u53c2\u8003":14,"\u5173\u4e8eunbound":41,"\u5173\u952e\u8bcd\u5305\u62ec":4,"\u5176\u4e2d":[6,11,13,14,42],"\u5176\u4e2d\u5305\u542b\u4e86\u7528\u6237\u7684\u8bad\u7ec3\u7a0b\u5e8f":21,"\u5176\u4e2d\u6bcf\u4e2a\u5143\u7d20\u662f\u53cc\u5c42\u5e8f\u5217\u4e2d\u6bcf\u4e2asubseq\u6700\u540e\u4e00\u4e2a":38,"\u5176\u4e2dcheckgrad\u4e3b\u8981\u4e3a\u5f00\u53d1\u8005\u4f7f\u7528":33,"\u5176\u4e2dmean\u548cstd\u662f\u8bad\u7ec3\u914d\u7f6e\u4e2d\u7684\u53c2\u6570":33,"\u5176\u4e2dx\u8868\u793a\u8f93\u5165\u6570\u636e\u662f\u4e00\u4e2a\u7ef4\u5ea6\u4e3a2\u7684\u7a20\u5bc6\u5411\u91cf":14,"\u5176\u4ed6\u4eba\u53ef\u4ee5\u590d\u73b0\u95ee\u9898\u4ee5\u4fbf\u5e2e\u52a9":0,"\u5176\u4ed6\u5185\u5b58\u6742\u9879":11,"\u5176\u4ed6\u5185\u5b58\u6742\u9879\u662f\u6307paddlepaddle\u672c\u8eab\u6240\u7528\u7684\u4e00\u4e9b\u5185\u5b58":11,"\u5176\u4ed6\u6240\u6709\u5c42\u90fd\u4f1a\u4f7f\u7528gpu\u8ba1\u7b97":35,"\u5176\u4ed6\u7684\u4f9d\u8d56\u8f6f\u4ef6":0,"\u5176\u4ed6\u9ad8\u7ea7\u529f\u80fd\u5305\u62ec\u5b9a\u4e49\u591a\u4e2amemori":42,"\u5176\u4f1a\u81ea\u52a8\u88ab\u52a0\u5165\u7f16\u8bd1\u5217\u8868":6,"\u5176\u5b83\u53ef\u9009\u7f16\u8bd1\u9009\u9879\u6309\u9700\u8fdb\u884c\u8bbe\u5b9a":17,"\u5176\u5b83layer\u7684\u8f93\u51fa":41,"\u5176\u5b9e\u4e5f\u662f\u548c\u6bcf\u4e2amini":11,"\u5176\u6b21":39,"\u5176\u8bf4\u660e\u5982\u4e0b":39,"\u5176\u8f93\u51fa\u88ab\u7528\u4f5cmemory\u7684\u521d\u59cb\u503c":42,"\u5176name\u7531\u53c2\u6570":12,"\u5177\u4f53\u4f7f\u7528\u65b9\u6cd5\u4e3a":11,"\u5177\u4f53\u505a\u6cd5\u8bf7\u53c2\u8003":0,"\u5177\u4f53\u53ef\u4ee5\u53c2\u8003":[6,11],"\u5177\u4f53\u53ef\u53c2\u8003\u6587\u6863":41,"\u5177\u4f53\u60c5\u51b5\u56e0\u4eba\u800c\u5f02":37,"\u5177\u4f53\u64cd\u4f5c\u5982\u4e0b":8,"\u5177\u4f53\u6b65\u9aa4\u5982\u4e0b":8,"\u5177\u4f53\u7684\u89e3\u51b3\u65b9\u6cd5\u662f":8,"\u5177\u4f53\u8bf7\u53c2\u8003":4,"\u5177\u4f53\u8bf7\u89c1":4,"\u5177\u6709\u76f8\u540c\u7684\u7ed3\u679c\u4e86":39,"\u5185":42,"\u5185\u5b58":37,
"\u5185\u5b58\u4e0d\u8db3":9,"\u5185\u5b58\u5bb9\u9650\u9608\u503c":33,"\u5185\u5bb9\u5982\u4e0b":26,"\u5185\u5c42\u5e8f\u5217\u5728":19,"\u5185\u5c42inner_step\u7684recurrent_group\u548c\u5355\u5c42\u5e8f\u5217\u7684\u51e0\u4e4e\u4e00\u6837":39,"\u5185\u5df2\u7ecf\u5305\u542bpaddlepaddle\u7684\u6267\u884c\u7a0b\u5e8f\u4f46\u662f\u8fd8\u6ca1\u4e0a\u8ff0\u529f\u80fd":27,"\u5185\u7f6e\u7684":20,"\u5185\u90e8":[20,27],"\u5185\u90e8\u7531":[19,20],"\u518d\u505a\u4e00\u5b9a\u7684reshap":12,"\u518d\u5199\u5165\u7f51\u7edc\u53c2\u6570":13,"\u518d\u5b89\u88c5":[3,8],"\u518d\u5bf9\u6bcf\u4e00\u4e2a\u5355\u5c42\u65f6\u95f4\u5e8f\u5217\u8fdb\u884c\u5904\u7406":39,"\u518d\u5bf9\u6bcf\u4e00\u53e5\u8bdd\u7684\u7f16\u7801\u5411\u91cf\u7528lstm\u7f16\u7801\u6210\u4e00\u4e2a\u6bb5\u843d\u7684\u5411\u91cf":39,"\u518d\u5bf9\u8fd9\u4e2a\u6bb5\u843d\u5411\u91cf\u8fdb\u884c\u5206\u7c7b":39,"\u518d\u5c06\u66f4\u65b0\u540e\u7684\u53c2\u6570\u4e0b\u53d1\u5230\u6bcf\u4e2a\u8ba1\u7b97\u8282\u70b9":22,"\u518d\u6307\u5b9a":0,"\u518d\u6b21\u5bf9\u4ee3\u7801\u8fdb\u884c\u6027\u80fd\u5206\u6790":37,"\u518d\u7528\u8fd9\u4e2a\u68af\u5ea6\u53bb\u548c":6,"\u518d\u901a\u8fc7\u51fd\u6570":27,"\u518d\u91cd\u65b0\u5b89\u88c5":0,"\u5199\u5165\u6587\u4ef6\u4e2d":20,"\u5199\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5\u662f\u4e00\u4e2a\u9a8c\u8bc1\u65b0\u5b9e\u73b0\u7684\u5c42\u662f\u5426\u6b63\u786e\u7684\u76f8\u5bf9\u7b80\u5355\u7684\u529e\u6cd5":6,"\u51c6\u5907":39,"\u51c6\u5907\u60a8\u7684\u8ba1\u7b97\u96c6\u7fa4":31,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e":28,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e\u548c\u9a8c\u8bc1\u6570\u636e\u96c6":21,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b\u548c":20,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b\u90e8\u5206":20,"\u51cf\u5c0f\u5e8f\u5217\u7684\u957f\u5ea6":11,"\u51cf\u5c0f\u8fd9\u4e2a\u5185\u5b58\u6c60\u5373\u53ef\u51cf\u5c0f\u5185\u5b58\u5360\u7528":11,"\u51cf\u5c0fbatch":11,"\u51e0\u53f0\u5230\u51e0\u5343\u53f0\u89c4\u6a21":31,"\u51fa\u53bb\u73a9":39,"\u51fa\u5dee":39,"\u51fa\
u6765":39,"\u51fa\u73b0":8,"\u51fa\u73b0\u4ee5\u4e0b\u9519\u8bef":13,"\u51fa\u73b0\u8be5\u9519\u8bef\u7684\u539f\u56e0\u4e00\u822c\u662f\u7528\u6237\u5bf9\u4e0d\u540clayer\u7684\u53c2\u6570":12,"\u51fa\u73b0\u8fd9\u4e2a\u95ee\u9898\u7684\u4e3b\u8981\u539f\u56e0\u662f":[3,8],"\u51fd\u6570":[6,19,37,42],"\u51fd\u6570\u5047\u8bbe":42,"\u51fd\u6570\u52a0\u5230\u4ee3\u7801\u4e2d":37,"\u51fd\u6570\u53ea\u5173\u6ce8\u4e8ernn\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u7684\u8ba1\u7b97":41,"\u51fd\u6570\u5f97\u5230\u7684\u68af\u5ea6\u53bb\u5bf9\u6bd4":6,"\u51fd\u6570\u5fc5\u987b\u5148\u8c03\u7528\u57fa\u7c7b\u4e2d\u7684\u51fd\u6570":6,"\u51fd\u6570\u5fc5\u987b\u8fd4\u56de\u4e00\u4e2a\u6216\u591a\u4e2alayer\u7684\u8f93\u51fa":41,"\u51fd\u6570\u6307\u51fa\u4e86\u5728\u8bad\u7ec3\u65f6\u9700\u8981\u4ece\u53c2\u6570\u670d\u52a1\u5668\u53d6\u51fa\u7684\u884c":6,"\u51fd\u6570\u6765\u5c06\u4fe1\u606f\u8f93\u51fa\u5230\u754c\u9762\u4e2d":37,"\u51fd\u6570\u7684\u5b9e\u73b0\u662f\u6b63\u786e\u7684":6,"\u51fd\u6570\u7684\u5f00\u5934\u5fc5\u987b\u8c03\u7528":6,"\u51fd\u6570\u80fd\u591f\u5c06\u4f7f\u7528":20,"\u5206\u4e3a":20,"\u5206\u522b\u4e3a\u6570\u636e\u8f93\u5165\u6dfb\u52a0\u5916\u5c42\u5e8f\u5217\u548c\u5185\u5c42\u5e8f\u5217\u7684\u5e8f\u5217\u4fe1\u606f":19,"\u5206\u522b\u4ece\u8bcd\u8bed\u548c\u53e5\u5b50\u7ea7\u522b\u7f16\u7801\u8f93\u5165\u6570\u636e":41,"\u5206\u522b\u4f7f\u7528\u5355\u53cc\u5c42rnn\u4f5c\u4e3a\u7f51\u7edc\u914d\u7f6e\u7684\u6a21\u578b":39,"\u5206\u522b\u5b9a\u4e49\u5b50\u53e5\u7ea7\u522b\u548c\u8bcd\u8bed\u7ea7\u522b\u4e0a\u9700\u8981\u5b8c\u6210\u7684\u8fd0\u7b97":41,"\u5206\u522b\u662f":38,"\u5206\u522b\u662frnn\u72b6\u6001\u548c\u8f93\u5165\u7684\u53d8\u6362\u77e9\u9635":42,"\u5206\u522b\u662fsentences\u548clabel":39,"\u5206\u522b\u662fwords\u548clabel":39,"\u5206\u522b\u8ba1\u7b97\u6bcf\u4e2a\u53c2\u6570\u7684\u68af\u5ea6":6,"\u5206\u522b\u8fdb\u884c\u5e8f\u5217\u64cd\u4f5c":39,"\u5206\u5e03\u5f0f\u8bad\u7ec3":36,"\u5206\u5e03\u5f0f\u8bad\u7ec3\u67b6\u
6784\u5982\u4e0b\u56fe\u6240\u793a":22,"\u5206\u652f":4,"\u5206\u652f\u4e0a":4,"\u5206\u652f\u4e0a\u521b\u5efa\u65b0\u5206\u652f":4,"\u5206\u652f\u540d":4,"\u5206\u652f\u89c4\u8303":4,"\u5206\u6790\u5f97\u5230\u7684\u4fe1\u606f\u7528\u4e8e\u534f\u52a9\u8fdb\u884c\u7a0b\u5e8f\u7684\u4f18\u5316":37,"\u5206\u7c7b\u4efb\u52a1\u4e2d\u7c7b\u522b\u6807\u7b7e":19,"\u5206\u914d\u5230\u5f53\u524d\u6570\u636e\u5757\u6837\u672c\u6570\u7684\u56db\u5206\u4e4b\u4e00":33,"\u5207\u6362\u5230":4,"\u5207\u6362\u5230\u6240\u5efa\u5206\u652f":4,"\u5217\u5143\u7d20\u6392\u5217\u6210\u7684\u77e9\u5f62\u9635\u5217":19,"\u5217\u8868\u5982\u4e0b":14,"\u5219\u4e0d\u9700\u8981\u91cd\u5199\u8be5\u51fd\u6570":6,"\u5219\u4f1a\u4f7f\u7528openblas\u4f5c\u4e3ablas\u5e93":0,"\u5219\u4f7f\u7528\u540c\u6b65\u8bad\u7ec3":33,"\u5219\u4f7f\u7528\u8be5\u53c2\u6570\u4f5c\u4e3a\u9ed8\u8ba4\u503c":33,"\u5219\u5e76\u4e0d\u4f1a\u7b49\u5f85\u6240\u6709trainer\u63d0\u4ea4\u68af\u5ea6\u624d\u66f4\u65b0\u53c2\u6570":22,"\u5219\u603b\u4f1a\u663e\u793a\u963b\u9694\u6458\u8981\u4fe1\u606f":33,"\u5219\u662f\u5e26gui\u7684nvidia\u53ef\u89c6\u5316\u6027\u80fd\u5206\u6790\u5de5\u5177":37,"\u5219\u663e\u793a\u963b\u9694\u6027\u80fd\u7684\u6458\u8981\u4fe1\u606f":33,"\u5219\u9700\u8981\u4f7f\u7528\u7b49\u4e8e\u6743\u91cd\u53c2\u6570\u89c4\u6a21\u5927\u7ea65\u500d\u7684\u5185\u5b58":11,"\u5219\u9700\u8981\u5728\u672c\u673a\u5b89\u88c5\u4e0b\u9762\u7ae0\u8282\u5217\u51fa\u7684":0,"\u5219\u9700\u8981\u624b\u52a8\u62f7\u8d1d\u5c5e\u4e8e\u6bcf\u4e2atrainer\u8282\u70b9\u7684\u8bad\u7ec3\u6570\u636e\u5230\u5bf9\u5e94\u7684\u8282\u70b9\u4e0a":21,"\u521b\u5efa":[19,20],"\u521b\u5efa\u4e00\u4e2a":16,"\u521b\u5efa\u4e00\u4e2akubernet":27,"\u521b\u5efa\u5e76\u5207\u6362\u5230\u65b0\u5206\u652f":4,"\u521b\u5efa\u6210\u529f\u540e":27,"\u521b\u5efa\u65e5\u5fd7\u76ee\u5f55":28,"\u521b\u5efa\u7a00\u758f\u77e9\u9635\u65f6\u9700\u8981\u663e\u793a\u5730\u6307\u5b9a\u77e9\u9635\u7684":19,"\u521b\u5efaissu":2,"\u521d\u59cb\u5316\u504f\u7f6e\u541
1\u91cf":6,"\u521d\u59cb\u5316\u6743\u91cd\u8868":6,"\u521d\u59cb\u5316\u6a21\u578b\u7684\u8def\u5f84":33,"\u521d\u59cb\u5316\u7236\u7c7b":6,"\u521d\u59cb\u5316biases_":6,"\u521d\u59cb\u72b6\u6001":41,"\u5220\u9664":4,"\u5229\u7528\u5206\u5e03\u5f0f\u8bad\u7ec3\u9a7e\u9a6d\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90":11,"\u5229\u7528\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90\u53ef\u4ee5\u5206\u4e3a\u4ee5\u4e0b\u51e0\u4e2a\u65b9\u5f0f\u6765\u8fdb\u884c":11,"\u5229\u7528\u8fd9\u79cd\u7279\u6027":41,"\u5229\u843d":39,"\u522b\u4eba\u5e2e\u4e86\u5fd9":4,"\u522b\u5fd8\u4e86":0,"\u5230":[8,42],"\u5230\u6307\u5b9a\u6587\u4ef6\u4e2d":20,"\u5230\u672c\u5730":4,"\u5236\u4f5c\u65b0\u955c\u50cf\u6765\u5b8c\u6210\u4ee5\u4e0a\u7684\u5de5\u4f5c":27,"\u5236\u4f5cpaddlepaddle\u955c\u50cf":27,"\u5237\u7259":39,"\u524d\u4e00\u7bc7\u6587\u7ae0\u4ecb\u7ecd\u4e86\u5982\u4f55\u5728kubernetes\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2a\u5355\u673apaddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":27,"\u524d\u53f0":39,"\u524d\u5411\u4f20\u64ad":6,"\u524d\u5411\u4f20\u64ad\u7ed9\u5b9a\u8f93\u5165":6,"\u524d\u5411\u548c\u540e\u5411":6,"\u524d\u5411\u8ba1\u7b97\u4e4b\u540epaddlepaddle\u5185\u90e8\u5df2\u7ecf\u5206\u914d":20,"\u524d\u8005\u5728":11,"\u524d\u8005\u622a\u65ad\u53ef\u5b66\u4e60\u53c2\u6570\u7684\u68af\u5ea6":11,"\u524d\u81ea\u52a8\u68c0\u67e5\u4e00\u4e9b\u57fa\u672c\u4e8b\u5b9c":4,"\u524d\u9988":22,"\u52a0\u4e0a\u504f\u7f6e\u5411\u91cf":6,"\u52a0\u5165":37,"\u52a0\u6743\u548c\u7528\u6765\u751f\u6210":42,"\u52a0\u6743\u7f16\u7801\u5411\u91cf":42,"\u52a0\u8f7d\u5177\u4f53\u7f51\u7edc\u53c2\u6570":13,"\u52a0\u8f7d\u6a21\u578b\u53ef\u5176\u5b83\u591a\u79cd\u65b9\u5f0f":20,"\u52a0\u8f7d\u6a21\u578b\u9700\u540c\u65f6\u6307\u5b9a":20,"\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":20,"\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570":13,"\u52a0\u8f7dtest":33,"\u52a0\u901f\u7f16\u8bd1":0,"\u52a0\u901fpaddlepaddle\u8bad\u7ec3\u53ef\u4ee5\u8003\u8651\u4ece\u4ee5\u4e0b\u51e0\u4e2a\u65b9\u9762":11,"\u52a8\u6001\u5e93":17,"\u
52a9\u624b":6,"\u5305\u542b\u4f46\u4e0d\u9650\u4e8e":0,"\u5305\u542b\u6d4b\u8bd5\u6570\u636e\u96c6\u7684\u76ee\u5f55":21,"\u5305\u542b\u8bad\u7ec3\u6570\u636e\u7684\u76ee\u5f55":21,"\u5305\u62ec":[17,33],"\u5305\u62ec\u4e86\u7f16\u8bd1\u51fa\u7684paddlepaddle\u5934\u6587\u4ef6\u548c\u94fe\u63a5\u5e93":17,"\u5305\u62ec\u5b57\u7b26\u4e32\u5206\u914d":11,"\u5305\u62ec\u751f\u6210cpu":0,"\u5305\u62ec\u795e\u7ecf\u7f51\u7edc\u62d3\u6251\u7ed3\u6784":14,"\u5305\u62ecbool":35,"\u5305\u7684\u65b9\u6cd5\u662f":8,"\u533a\u522b\u662f\u540c\u65f6\u5904\u7406\u4e86\u4e24\u4e2a\u8f93\u5165":39,"\u533a\u522b\u662frnn\u4f7f\u7528\u4e24\u5c42\u5e8f\u5217\u6a21\u578b":39,"\u5341\u4e00":39,"\u534e\u6da6\u4e07\u5bb6":39,"\u5355\u4f4d\u662fmb":33,"\u5355\u5143\u6d4b\u8bd5\u4f1a\u5f15\u7528site":8,"\u5355\u5143\u6d4b\u8bd5checkgrad_ep":32,"\u5355\u53cc\u5c42\u5e8f\u5217\u7684\u53e5\u5b50\u662f\u4e00\u6837\u7684":39,"\u5355\u53cc\u5c42rnn":40,"\u5355\u5c42":41,"\u5355\u5c42\u4e0d\u7b49\u957frnn":39,"\u5355\u5c42\u548c\u53cc\u5c42\u5e8f\u5217\u7684\u4f7f\u7528\u548c\u793a\u4f8b2\u4e2d\u7684\u793a\u4f8b\u7c7b\u4f3c":39,"\u5355\u5c42\u5e8f\u5217":[19,38],"\u5355\u5c42\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20":38,"\u5355\u5c42\u5e8f\u5217\u7b2ci\u4e2a\u5143\u7d20":38,"\u5355\u5c42\u6216\u53cc\u5c42":38,"\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":39,"\u5355\u5c42rnn":[39,41],"\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u7684\u7f51\u7edc\u914d\u7f6e":39,"\u5355\u673acpu\u8bad\u7ec3":11,"\u5355\u673agpu\u8bad\u7ec3":11,"\u5355\u6b65\u51fd\u6570":42,"\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570\u5728":42,"\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570\u90fd\u975e\u5e38\u7b80\u5355":42,"\u5355\u6b65\u51fd\u6570\u7684\u5b9e\u73b0\u5982\u4e0b\u6240\u793a":42,"\u5355\u8fdb\u5355\u51fa":41,"\u536b\u751f":39,"\u5373":[7,11,12,27],"\u5373\u4e0a\u8ff0\u4ee3\u7801\u4e2d\u7684\u7b2c19\u884c":39,"\u5373\u4e0b\u8f7d\u5931\u8d25":8,"\u5373\u4e0d\u9700\u8981\u4f7f\u7528memori":39,"\u5373\u4e3a\u4e00\u4e
2a\u65f6\u95f4\u6b65":39,"\u5373\u4e3a\u5355\u5c42rnn\u5e8f\u5217\u7684\u4f7f\u7528\u4ee3\u7801":39,"\u5373\u4e3a\u65f6\u95f4\u5e8f\u5217\u7684\u8f93\u5165":39,"\u5373\u4e3a\u8fd9\u4e2a\u53cc\u5c42rnn\u7684\u7f51\u7edc\u7ed3\u6784":39,"\u5373\u4e8c\u7ef4\u6570\u7ec4":39,"\u5373\u4f7f\u7528":12,"\u5373\u4f7f\u95f4\u9694\u5f88\u5c0f":33,"\u5373\u4fbf\u662f":0,"\u5373\u4fbf\u8bbe\u7f6e":8,"\u5373\u521d\u59cb\u72b6\u6001\u4e3a0":41,"\u5373\u5355\u65f6\u95f4\u6b65\u6267\u884c\u7684\u51fd\u6570":42,"\u5373\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":39,"\u5373\u53cc\u5c42rnn\u7684\u6bcf\u4e2a\u72b6\u6001":41,"\u5373\u53ef":4,"\u5373\u53ef\u4ee5\u6781\u5927\u7684\u52a0\u901f\u6570\u636e\u8f7d\u5165\u6d41\u7a0b":11,"\u5373\u53ef\u5f00\u59cb\u4e0b\u8f7d":3,"\u5373\u53ef\u5f00\u59cb\u4e0b\u9762\u7684\u6b65\u9aa4":1,"\u5373\u5728\u53cc\u5c42\u5e8f\u5217\u7684\u539f\u59cb\u6570\u636e\u4e2d":39,"\u5373\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d":11,"\u5373\u5927\u90e8\u5206\u503c\u4e3a0":14,"\u5373\u5c06\u4e00\u6bb5\u8bdd\u8fdb\u884c\u5206\u7c7b":39,"\u5373\u5c06nchw\u8f6c\u6362\u6210nhwc":12,"\u5373\u5f53\u524d\u65f6\u95f4\u6b65\u4e0b\u7684\u795e\u7ecf\u7f51\u7edc\u4f9d\u8d56\u524d\u4e00\u4e2a\u65f6\u95f4\u6b65\u795e\u7ecf\u7f51\u7edc\u4e2d\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u8f93\u51fa":39,"\u5373\u628a\u5355\u5c42rnn\u751f\u6210\u540e\u7684subseq\u7ed9\u62fc\u63a5\u6210\u4e00\u4e2a\u65b0\u7684\u53cc\u5c42seq":41,"\u5373\u6574\u4e2a\u53cc\u5c42group\u662f\u5c06\u524d\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":39,"\u5373\u6574\u4e2a\u8f93\u5165\u5e8f\u5217":38,"\u5373\u6574\u6570\u6570\u7ec4":39,"\u5373\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":39,"\u5373\u662f\u8de8\u8d8a\u65f6\u95f4\u6b65\u7684\u7f51\u7edc\u8fde\u63a5":39,"\u5373\u7279\u5f81\u7684\u6570\u7ec4":39,"\u5373\u7f51\u5361\u540d":27,"\u5373\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u51fa\u73b0nan\u6216\u8005inf":11,"\u5373\u8bbe\u7f6e":11,"\u5373\u8fd0\u884c\u8bad\u7ec3\u7a0b\u5e8f":1,"\u5373define_py_data_sou
rces2\u5e94\u6539\u4e3a":13,"\u5373input":41,"\u5373rnn\u4e4b\u95f4\u6709\u4e00\u6b21\u5d4c\u5957\u5173\u7cfb":39,"\u5378\u8f7dpaddlepaddle\u5305":8,"\u538b\u7f29\u6210\u4e00\u4e2a\u5411\u91cf":39,"\u539f\u56e0":[4,8],"\u539f\u56e0\u5728\u4e8e\u6ca1\u6709\u628a\u673a\u5668\u4e0acuda\u76f8\u5173\u7684\u9a71\u52a8\u548c\u5e93\u6620\u5c04\u5230\u5bb9\u5668\u5185\u90e8":8,"\u539f\u56e0\u662f\u6bcf\u4e2a\u56de\u590d\u90fd\u4f1a\u53d1\u9001\u4e00\u5c01\u90ae\u4ef6":4,"\u53bb\u8fc7":39,"\u53c2\u6570":[0,6,11,20,21,27,32],"\u53c2\u6570\u5171\u4eab\u7684\u914d\u7f6e\u793a\u4f8b\u4e3a":13,"\u53c2\u6570\u548c\u73af\u5883\u53d8\u91cf":21,"\u53c2\u6570\u670d\u52a1\u5668":[22,32],"\u53c2\u6570\u670d\u52a1\u5668\u4e4b\u95f4\u4e0d\u76f8\u4e92\u4f9d\u8d56":22,"\u53c2\u6570\u670d\u52a1\u5668\u4e5f\u4e0d\u4f1a\u7b49\u5f85\u8ba1\u7b97\u8282\u70b9\u5168\u90e8\u90fd\u63d0\u4ea4\u68af\u5ea6\u4e4b\u540e\u624d\u5f00\u59cb\u4e0b\u4e00\u6b65":22,"\u53c2\u6570\u670d\u52a1\u5668\u63a5\u6536\u4ece\u8ba1\u7b97\u8282\u70b9\u4e0a\u4f20\u7684\u68af\u5ea6":22,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u53c2\u6570\u5206\u5757\u5927\u5c0f":33,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u76d1\u542c\u7aef\u53e3":33,"\u53c2\u6570\u670d\u52a1\u5668\u7684\u7f51\u7edc\u8bbe\u5907\u540d\u79f0":33,"\u53c2\u6570\u670d\u52a1\u5668\u7684ip\u5730\u5740":33,"\u53c2\u6570\u670d\u52a1\u5668\u7a00\u758f\u66f4\u65b0\u7684\u53c2\u6570\u5206\u5757\u5927\u5c0f":33,"\u53c2\u6570\u6765\u63a7\u5236\u7f13\u5b58\u65b9\u6cd5":11,"\u53c2\u6570\u6982\u8ff0":34,"\u53c2\u6570\u7684\u4e2a\u6570\u548c\u53c2\u6570\u5217\u8868":20,"\u53c2\u6570\u7684\u89e3\u6790":27,"\u53c2\u6570\u8bbe\u7f6e":10,"\u53c2\u6570\u8bbe\u7f6e\u4e86\u5916\u5c42":39,"\u53c2\u6570\u8bf4\u660e":21,"\u53c2\u6570\u8bf4\u660e\u5bb9\u5668\u5df2\u4ea4\u4e92\u5f0f\u8fd0\u884c":1,"\u53c2\u6570\u8f93\u5165":11,"\u53c2\u6570\u9700\u8981\u5b9e\u73b0":42,"\u53c2\u7167\u4e0a\u8ff0\u6b65\u9aa4\u66f4\u65b0":4,"\u53c2\u8003":1,"\u53c2\u8003\u4e0b\u8ff0\u53ef\u9009\u6b65\u9aa4":0,"\u53c2
\u8003\u5f3a\u8c03\u90e8\u5206":37,"\u53c2\u8003\u65f6\u95f4\u5e8f\u5217":39,"\u53c2\u8003\u6837\u4f8b\u6570\u636e\u51c6\u5907\u811a\u672c":21,"\u53c2\u8003\u955c\u50cf\u7684":27,"\u53c8":39,"\u53c8\u662f\u4e00\u4e2a\u5355\u5c42\u7684\u5e8f\u5217":38,"\u53c8\u8981\u4fdd\u8bc1\u6570\u636e\u662f\u968f\u673a\u7684":11,"\u53ca":6,"\u53cc\u5411\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u9690\u85cf\u72b6\u6001":42,"\u53cc\u5c42":41,"\u53cc\u5c42\u4e0d\u7b49\u957frnn":39,"\u53cc\u5c42\u5e8f\u5217":[19,38],"\u53cc\u5c42\u5e8f\u5217\u5728\u5904\u7406\u957f\u5e8f\u5217\u7684\u4efb\u52a1\u6216\u662f\u6784\u5efa\u5c42\u7ea7\u6a21\u578b\u65f6\u4f1a\u53d1\u6325\u4f5c\u7528":19,"\u53cc\u5c42\u5e8f\u5217\u6216\u5355\u5c42\u5e8f\u5217":38,"\u53cc\u5c42\u5e8f\u5217\u6570\u636e\u4e00\u5171\u67094\u4e2a\u6837\u672c":39,"\u53cc\u5c42\u5e8f\u5217\u662f\u4e00\u4e2a\u5d4c\u5957\u7684\u5e8f\u5217":38,"\u53cc\u5c42\u5e8f\u5217\u662fpaddlepaddle\u652f\u6301\u7684\u4e00\u79cd\u975e\u5e38\u7075\u6d3b\u7684\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f":41,"\u53cc\u5c42\u5e8f\u5217\u6bcf\u4e2asubseq\u4e2d\u6bcf\u4e2a\u5143\u7d20":38,"\u53cc\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u53cc\u5c42\u5e8f\u5217\u9700\u8981\u8bbe\u7f6e\u5206\u522b\u4e3a\u5916\u5c42\u5e8f\u5217\u548c\u5185\u5c42\u5e8f\u5217\u5206\u522b\u8bbe\u7f6e":19,"\u53cc\u5c42\u6216\u8005\u5355\u5c42":38,"\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u7684dataprovider\u7684\u4ee3\u7801":39,"\u53cc\u5c42rnn":41,"\u53cc\u5c42rnn\u6570\u636e\u968f\u610f\u52a0\u4e86\u4e00\u4e9b\u9694\u65ad":39,"\u53cc\u5c42rnn\u987e\u540d\u601d\u4e49":39,"\u53cc\u8fdb\u5355\u51fa":41,"\u53cc\u8fdb\u53cc\u51fa":41,"\u53cd\u5411\u4f20\u64ad":6,"\u53cd\u5411\u4f20\u64ad\u6839\u636e\u8f93\u51fa\u7684\u68af\u5ea6":6,"\u53d1\u6563\u5230\u4e86\u4e00\u4e2a\u6570\u503c\u7279\u522b\u5927\u7684\u5730\u65b9":11,"\u53d1\u884c\u548c\u7ef4\u62a4":4,"\u53d1\u9001\u53c2\u6570\u7684\u7aef\u53e3\u53f7":33,"\u53d6\u503c\u76f8\u540
c\u7684layer":12,"\u53d6\u5176\u4e2d\u4e00\u4e2a\u6a21\u578bparams_pass_90":14,"\u53d8\u6362\u77e9\u9635":6,"\u53e3\u5934":39,"\u53e5\u5b50\u662f\u7531\u8bcd\u8bed\u6784\u6210\u7684\u5e8f\u5217":19,"\u53e6\u4e00\u4e2a\u662f\u5185\u5b58\u64cd\u4f5c\u91cf":37,"\u53e6\u4e00\u4e2a\u662f\u6bcf\u6761\u5e8f\u5217":11,"\u53e6\u4e00\u79cd\u65b9\u5f0f\u662f\u5c06\u7f51\u7edc\u5c42\u5212\u5206\u5230\u4e0d\u540c\u7684gpu\u4e0a\u53bb\u8ba1\u7b97":35,"\u53e6\u5916":[0,39],"\u53e6\u5916\u6700\u65b0\u7684pip\u5b98\u65b9\u6e90\u4e2d\u7684\u5b89\u88c5\u5305\u9ed8\u8ba4\u662fmanylinux1\u6807\u51c6":3,"\u53ea\u4f5c\u4e3aread":41,"\u53ea\u4fdd\u5b58\u6700\u540e\u4e00\u8f6e\u7684\u53c2\u6570":33,"\u53ea\u5728\u7b2c\u4e00\u6b21cmake\u7684\u65f6\u5019\u6709\u6548":0,"\u53ea\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u8fdb\u884c\u5e8f\u5217\u5316":20,"\u53ea\u5c06\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u8fdb\u884c\u5e8f\u5217\u5316":20,"\u53ea\u662f\u53cc\u5c42\u5e8f\u5217\u5c06\u5176\u53c8\u505a\u4e86\u5b50\u5e8f\u5217\u5212\u5206":39,"\u53ea\u6709":39,"\u53ea\u6709\u5f53\u8bbe\u7f6e\u4e86spars":33,"\u53ea\u7528\u4e8e\u5728\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u4e2d\u6307\u5b9a\u8f93\u5165\u6570\u636e":41,"\u53ea\u7559\u4e0b\u6838\u5fc3\u8ba1\u7b97\u5c42":20,"\u53ea\u80fd\u5728recurrent_group\u4e2d\u4f5c\u4e3astep":12,"\u53ea\u80fd\u6309\u884c\u8ba1\u7b97":12,"\u53ea\u80fd\u6d4b\u8bd5\u5355\u4e2a\u6a21\u578b":35,"\u53ea\u80fd\u8bbf\u95ee\u5b83\u4eec\u7684\u8f93\u51fa\u503c":12,"\u53ea\u8981\u4e00\u7cfb\u5217\u7279\u5f81\u6570\u636e\u4e2d\u7684":39,"\u53ea\u8981\u51fa\u73b0\u6d6e\u70b9\u6570\u5f02\u5e38":11,"\u53ea\u8bfbmemory\u8f93\u5165":41,"\u53ea\u9488\u5bf9\u5185\u5b58":11,"\u53ea\u9700\u4e2d\u65ad":23,"\u53ea\u9700\u5728\u7f16\u8bd1\u65f6\u9700\u914d\u5236\u4e0b\u9762\u8fd9\u4e9b\u7f16\u8bd1\u9009\u9879":17,"\u53ea\u9700\u7528\u60a8\u5b9a\u4e49\u7684\u76ee\u5f55\u4fee\u6539":23,"\u53ea\u9700\u8981":42,"\u53ea\u9700\u8981\u8bbe\u7f6e\u884c\u504f\u79fb":19,"\u53ea\u9700\u8981\u94fe\u63a5":17
,"\u53ea\u9700\u8fdb\u884c\u524d\u5411\u8ba1\u7b97\u800c\u65e0\u9700\u8c03\u7528\u53cd\u5411\u8ba1\u7b97":20,"\u53ef\u4ee5":[1,4,7,39],"\u53ef\u4ee5\u4ece":1,"\u53ef\u4ee5\u4ece\u6211\u4eec\u7684ci\u7cfb\u7edf\u4e2d\u4e0b\u8f7d\u6700\u65b0\u7684whl\u5b89\u88c5\u5305\u548cc":3,"\u53ef\u4ee5\u4f30\u8ba1\u51fa\u5982\u679c\u6a21\u578b\u91c7\u7528\u4e0d\u53d8\u7684\u8f93\u51fa\u6700\u5c0f\u7684cost0\u662f\u591a\u5c11":13,"\u53ef\u4ee5\u4f7f\u7528":[13,20],"\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u66f4\u65b0\u60a8\u7684pip":3,"\u53ef\u4ee5\u4f7f\u7528\u5982\u4e0b\u4ee3\u7801":13,"\u53ef\u4ee5\u4f7f\u7528\u76f8\u5e94\u6570\u636e\u7c7b\u578b\u7684":13,"\u53ef\u4ee5\u4f7f\u7528\u8be5\u53c2\u6570":33,"\u53ef\u4ee5\u4f7f\u7528kubernetes\u7684\u547d\u4ee4\u884c\u5de5\u5177\u521b\u5efajob":27,"\u53ef\u4ee5\u5148\u4f7f\u7528":12,"\u53ef\u4ee5\u51cf\u5c11\u7f13\u5b58\u6c60\u7684\u5927\u5c0f":11,"\u53ef\u4ee5\u521b\u5efa\u4e00\u4e2a":26,"\u53ef\u4ee5\u52a0\u901fpaddlepaddle\u7684\u8ba1\u7b97":1,"\u53ef\u4ee5\u53c2\u8003":[0,1,4,39,42],"\u53ef\u4ee5\u53c2\u8003\u4e0b\u9762\u7684\u6b65\u9aa4\u6392\u67e5":9,"\u53ef\u4ee5\u53c2\u8003paddlepaddl":14,"\u53ef\u4ee5\u542b\u6709\u4e00\u6761\u6216\u591a\u6761\u6837\u672c":19,"\u53ef\u4ee5\u544a\u8bc9\u60a8\u67d0\u4e2a\u64cd\u4f5c\u5230\u5e95\u82b1\u4e86\u591a\u957f\u65f6\u95f4":37,"\u53ef\u4ee5\u5728":[0,23],"\u53ef\u4ee5\u5728\u5171\u4eab\u5b58\u50a8\u4e0a\u67e5\u770b\u8f93\u51fa\u7684\u65e5\u5fd7\u548c\u6a21\u578b":27,"\u53ef\u4ee5\u5728\u8fd9\u4e2a":4,"\u53ef\u4ee5\u5728event_handler\u4e2d":11,"\u53ef\u4ee5\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684sgd\u65b9\u6cd5\u7684\u8bad\u7ec3":22,"\u53ef\u4ee5\u5c06memory\u7406\u89e3\u4e3a\u4e00\u4e2a\u65f6\u5ef6\u64cd\u4f5c":41,"\u53ef\u4ee5\u5c1d\u8bd5\u4ee5\u4e0b\u7684\u65b9\u6cd5":1,"\u53ef\u4ee5\u5e2e\u60a8\u63d0\u4f9b\u4e00\u4e9b\u5b9a\u4f4d\u6027\u80fd\u74f6\u9888\u7684\u5efa\u8bae":37,"\u53ef\u4ee5\u5e76\u884c\u7f16\u8bd1\u5417":0,"\u53ef\u4ee5\u5feb\u901f\u5728\u672c\u5730\u54
2f\u52a8\u4e00\u4e2a\u5305\u542b\u4e86paddlepaddle\u5b98\u65b9book\u6559\u7a0b\u7684jupyt":1,"\u53ef\u4ee5\u6267\u884c":[3,8],"\u53ef\u4ee5\u6267\u884c\u4ee5\u4e0b\u547d\u4ee4\u7f16\u8bd1\u751f\u6210\u6587\u6863":7,"\u53ef\u4ee5\u628a\u5b83\u60f3\u8c61\u4e3a\u4e00\u4e2a\u7c7b\u4f3c":0,"\u53ef\u4ee5\u6307\u5b9a\u540c\u65f6\u6267\u884cgpu\u4e0a\u7684\u5355\u5143\u6d4b\u8bd5":0,"\u53ef\u4ee5\u6307\u5b9a\u54ea\u4e00\u4e2a\u8f93\u5165\u548c\u8f93\u51fa\u5e8f\u5217\u4fe1\u606f\u4e00\u81f4":39,"\u53ef\u4ee5\u6307\u5b9a\u5f00\u542f\u81ea\u52a8\u68c0\u6d4bsm\u67b6\u6784":0,"\u53ef\u4ee5\u6309\u7167\u4e0b\u9762\u7684\u65b9\u6cd5":0,"\u53ef\u4ee5\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":[38,41],"\u53ef\u4ee5\u662f\u4e00\u4e2a\u975e\u5e8f\u5217":41,"\u53ef\u4ee5\u662f\u4ece\u5206\u5e03\u5f0f\u5b58\u50a8\u6302\u8f7d\u8fc7\u6765\u7684":21,"\u53ef\u4ee5\u662f\u4ee5\u4e0b\u51e0\u79cd":6,"\u53ef\u4ee5\u662f\u6574\u578b":19,"\u53ef\u4ee5\u663e\u793a\u5730\u6307\u5b9a\u4e00\u4e2alayer\u7684\u8f93\u51fa\u7528\u4e8e\u521d\u59cb\u5316memori":41,"\u53ef\u4ee5\u6709\u4ee5\u4e0b\u4e24\u79cd":41,"\u53ef\u4ee5\u6709\u6548\u51cf\u5c0f\u7f51\u7edc\u7684\u963b\u585e":33,"\u53ef\u4ee5\u67e5\u770b":27,"\u53ef\u4ee5\u67e5\u770b\u6b64pod\u8fd0\u884c\u7684\u5bbf\u4e3b\u673a":26,"\u53ef\u4ee5\u6d4b\u8bd5\u591a\u4e2a\u6a21\u578b":35,"\u53ef\u4ee5\u7528":0,"\u53ef\u4ee5\u7528\u5982\u4e0b\u547d\u4ee4":4,"\u53ef\u4ee5\u7528\u6765\u8ba1\u7b97cpu\u51fd\u6570\u6216cuda\u5185\u6838\u7684\u65f6\u95f4\u6d88\u8017":37,"\u53ef\u4ee5\u76f4\u63a5\u8fd0\u884c":20,"\u53ef\u4ee5\u7701\u7565\u6b65\u9aa43\u4e2d":0,"\u53ef\u4ee5\u770b\u4f5c\u662f\u4e00\u4e2a\u975e\u5e8f\u5217\u8f93\u5165":38,"\u53ef\u4ee5\u770b\u51fa":22,"\u53ef\u4ee5\u7cbe\u786e\u8bf4\u660e\u4e00\u4e2a\u957f\u8017\u65f6\u64cd\u4f5c\u7684\u5177\u4f53\u539f\u56e0":37,"\u53ef\u4ee5\u8003\u8651\u4f7f\u7528\u4e00\u4e9b\u4f18\u5316\u7b97\u6cd5":11,"\u53ef\u4ee5\u8054\u7cfbop":9,"\u53ef\u4ee5\u8054\u7cfbop\u662f\u5426\u53ef\u4ee5\u66f4\u6362\u96c6\u7fa4\u621
6\u5347\u7ea7\u5f53\u524d\u96c6\u7fa4":9,"\u53ef\u4ee5\u83b7\u53d6\u7f51\u7edc\u4e2d\u5b9a\u4e49\u7684\u4efb\u610f\u591a\u4e2a":20,"\u53ef\u4ee5\u88c5\u7684\u662f":0,"\u53ef\u4ee5\u8bbf\u95ee\u7531recurr":12,"\u53ef\u4ee5\u8f7b\u677e\u5730\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u914d\u7f6e":14,"\u53ef\u4ee5\u8fd0\u884c":21,"\u53ef\u4ee5\u9009\u5728\u5728\u5f53\u524d\u673a\u5668\u5b89\u88c5\u4e5f\u53ef\u4ee5\u62f7\u8d1d\u5230\u76ee\u6807\u673a\u5668\u5b89\u88c5":0,"\u53ef\u4ee5\u9009\u62e9\u662f\u5426\u4f7f\u7528\u53c2\u6570":35,"\u53ef\u4ee5\u901a\u8fc7":4,"\u53ef\u4ee5\u901a\u8fc7\u4fee\u6539\u8fd9\u4e24\u4e2a\u51fd\u6570\u6765\u5b9e\u73b0\u590d\u6742\u7684\u7f51\u7edc\u914d\u7f6e":42,"\u53ef\u4ee5\u901a\u8fc7\u5728":11,"\u53ef\u4ee5\u901a\u8fc7\u7f51\u9875\u6d4f\u89c8":1,"\u53ef\u4ee5\u901a\u8fc7\u8fd9\u4e2a\u8f93\u51fa\u6765\u5b8c\u6210\u81ea\u5b9a\u4e49\u7684\u8bc4\u4f30\u6307\u6807\u8ba1\u7b97\u7b49\u529f\u80fd":11,"\u53ef\u4ee5\u91cd\u547d\u540d\u8fd9\u4e2awhl\u5305\u4e3a":[3,8],"\u53ef\u53c2\u8003":20,"\u53ef\u5728":17,"\u53ef\u7528\u4e8e\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u89e3\u6790\u8fd9\u4e9b\u53c2\u6570":35,"\u53ef\u76f4\u63a5\u8fd0\u884c":20,"\u53ef\u80fd\u4f1a\u5bfc\u81f4\u51fa\u9519":27,"\u53ef\u80fd\u7684\u4ee3\u7801\u4e3a":11,"\u53ef\u80fd\u7684\u539f\u56e0\u662f":13,"\u53ef\u80fd\u7684\u60c5\u51b5\u4e0b":37,"\u53ef\u80fd\u9700\u8981\u6ce8\u610f\u7ed9\u8fd9\u4e2a\u865a\u62df\u673a\u591a\u5206\u914d\u4e00\u4e9b":0,"\u53ef\u9009":[0,6,20,21],"\u53ef\u9009\u6b65\u9aa4":0,"\u53ef\u9009\u7684\u4e0d\u540c\u7f16\u8bd1\u73af\u5883docker\u955c\u50cf":0,"\u53ef\u9009\u914d\u7f6e\u9009\u9879":17,"\u53ef\u901a\u8fc7pip\u4e00\u952e\u5b89\u88c5":2,"\u53ef\u914d\u7f6e\u4e3a":17,"\u53ef\u91c7\u7528\u7b2c\u4e8c\u79cd\u65b9\u5f0f":12,"\u5403":39,"\u5403\u996d":39,"\u5404\u65b9\u9762":39,"\u5408":39,"\u5408\u5e76\u5165\u4e00\u4e2a\u6587\u4ef6":20,"\u5408\u5e76\u6a21\u578b\u6587\u4ef6":20,"\u5408\u7406":39,"\u540c\u65f6":[8,11,37],"\u540c\u65f6\u4e5f\u4f1a\u8bfb\u53d6\u7
6f8\u5173\u8def\u5f84\u53d8\u91cf\u6765\u8fdb\u884c\u641c\u7d22":0,"\u540c\u65f6\u4e5f\u53ef\u4ee5\u52a0\u901f\u5f00\u59cb\u8bad\u7ec3\u524d\u6570\u636e\u8f7d\u5165\u7684\u8fc7\u7a0b":11,"\u540c\u65f6\u4e5f\u53ef\u4ee5\u901a\u8fc7":4,"\u540c\u65f6\u4e5f\u80fd\u591f\u5f15\u5165\u66f4\u52a0\u590d\u6742\u7684\u8bb0\u5fc6\u673a\u5236":41,"\u540c\u65f6\u5176\u5185\u90e8\u5b9e\u73b0\u53ef\u4ee5\u907f\u514d\u7eafcpu\u7248\u672cpaddlepaddle\u5728\u6267\u884c\u672c\u8bed\u53e5\u65f6\u53d1\u751f\u5d29\u6e83":37,"\u540c\u65f6\u5728\u5185\u5b58\u91cc\u76f4\u63a5\u968f\u5373\u9009\u53d6\u6570\u636e\u6765\u505ashuffl":11,"\u540c\u65f6\u5c06\u53c2\u6570\u521d\u59cb\u5316\u4e3a":13,"\u540c\u65f6\u7528\u6237\u9700\u8981\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u6307\u5b9a":35,"\u540c\u65f6\u8bbe\u7f6e\u5185\u5b58\u7f13\u5b58\u529f\u80fd":11,"\u540c\u65f6\u8f93\u51fa\u5e8f\u5217\u5c42\u548c\u975e\u5e8f\u5217\u5c42":11,"\u540c\u6837":14,"\u540c\u6837\u4e5f\u53ef\u4ee5\u5728\u6d4b\u8bd5\u6a21\u5f0f\u4e2d\u6307\u5b9a\u6a21\u578b\u8def\u5f84":33,"\u540c\u6837\u53ef\u4ee5\u6269\u5c55\u5230\u53cc\u5c42\u5e8f\u5217\u7684\u5904\u7406\u4e0a":41,"\u540c\u6b65\u6267\u884c\u64cd\u4f5c\u7684\u7ebf\u7a0b\u6570":33,"\u540e":[0,4,13,27],"\u540e\u5411\u4f20\u64ad":6,"\u540e\u5411\u4f20\u64ad\u7ed9\u5b9a\u8f93\u51fa\u7684\u68af\u5ea6":6,"\u540e\u7f00\u4e3a":21,"\u540e\u8005\u5728\u6fc0\u6d3b\u51fd\u6570\u53cd\u5411\u8ba1\u7b97\u65f6\u88ab\u8c03\u7528":11,"\u540e\u8005\u622a\u65ad\u56de\u4f20\u7ed9\u524d\u5c42\u7684\u68af\u5ea6":11,"\u540e\u9762\u7684gradient":21,"\u540e\u9988":22,"\u5411\u91cf":19,"\u5411\u91cfenable_parallel_vector":32,"\u5417":0,"\u5426\u5219\u4f1a\u628a":4,"\u5426\u5219\u4f7f\u7528\u591a\u673a\u8bad\u7ec3":33,"\u5426\u5219\u4f7f\u7528cpu\u6a21\u5f0f":33,"\u5426\u5219\u4f7f\u7528gpu":35,"\u5426\u5219\u5b83\u4ee5\u4e00\u4e2a\u5e8f\u5217\u8f93\u5165":42,"\u5426\u5219\u9891\u7e41\u7684\u591a\u8282\u70b9\u5de5\u4f5c\u7a7a\u95f4\u90e8\u7f72\u53ef\u80fd\u4f1a\u5f88\u9ebb\u70e6":23,"\u542b\u670
9\u5e8f\u5217\u4fe1\u606f\u548c\u5b50\u5e8f\u5217\u4fe1\u606f\u7684\u7a20\u5bc6\u5411\u91cf":6,"\u542b\u6709\u5e8f\u5217\u4fe1\u606f\u7684\u6574\u6570":6,"\u542b\u6709\u5e8f\u5217\u4fe1\u606f\u7684\u7a20\u5bc6\u5411\u91cf":6,"\u542f\u52a8\u4e00\u4e2a\u6d4b\u8bd5\u96c6\u7fa4":23,"\u542f\u52a8\u53c2\u6570\u8bf4\u660e":22,"\u542f\u52a8\u5bb9\u5668\u5f00\u59cb\u8bad\u7ec3":27,"\u542f\u52a8\u5e76\u884c\u5411\u91cf\u7684\u9608\u503c":33,"\u542f\u52a8\u5feb\u901f\u5e94\u7b54":33,"\u542f\u52a8\u8bad\u7ec3\u4efb\u52a1":28,"\u542f\u7528\u68af\u5ea6\u53c2\u6570\u7684\u9608\u503c":33,"\u5440":39,"\u5468\u56f4":39,"\u547d\u4ee4\u4e3a":[8,26],"\u547d\u4ee4\u521b\u5efa\u65b0\u955c\u50cf":26,"\u547d\u4ee4\u53ef\u4ee5\u8bbe\u7f6e":0,"\u547d\u4ee4\u6709\u65f6\u5019\u4f1a\u4ea7\u751f\u4e00\u4e9b\u4e2d\u95f4\u7ed3\u679c":0,"\u547d\u4ee4\u770b\u5230\u505c\u6b62\u540e\u4f46\u662f\u6ca1\u6709\u5220\u9664\u7684":0,"\u547d\u4ee4\u7f16\u8bd1\u6e90\u7801\u5373\u53ef":0,"\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e":36,"\u547d\u4ee4\u8bbe\u7f6e\u8be5\u7c7b\u7f16\u8bd1\u9009\u9879":0,"\u547d\u4ee4\u9009\u9879\u5e76\u4e14":23,"\u547d\u4ee4\u91cc\u90fd\u7528\u4e86":0,"\u547d\u540d\u4e3a":4,"\u548c":[0,4,6,7,11,12,13,17,19,21,22,23,35,37,39,42],"\u548c\u4e00\u4e2a\u5df2\u7ecf\u5206\u8bcd\u540e\u7684\u53e5\u5b50":39,"\u548c\u4e09\u79cd\u5e8f\u5217\u6a21\u5f0f":14,"\u548c\u4e2d\u6587\u6587\u6863":7,"\u548c\u4e4b\u524d\u51cf\u5c0f\u901a\u8fc7\u51cf\u5c0f\u7f13\u5b58\u6c60\u6765\u51cf\u5c0f\u5185\u5b58\u5360\u7528\u7684\u539f\u7406\u4e00\u81f4":11,"\u548c\u504f\u7f6e\u5411\u91cf":6,"\u548c\u5185\u5b58":0,"\u548c\u5217\u53f7":19,"\u548c\u53cc\u5c42\u5e8f\u5217\u542b\u6709subseq":38,"\u548c\u5e8f\u5217\u4e2d\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee\u540c":38,"\u548c\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":22,"\u548c\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165":42,"\u548c\u64cd\u4f5c\u7cfb\u7edf\u4e0a\u76f4\u63a5\u8fd0\u884c\u7684":0,"\u548c\u793a\u4f8b2\u4e2d\u7684\u914d\u7f6e\u7c7b\u4
f3c":39,"\u548c\u90e8\u5206layer":41,"\u548cpool":38,"\u548cpserver\u4e4b\u95f4\u7528\u4e8e\u7a00\u758f\u7c7b\u578b\u53c2\u6570\u901a\u4fe1\u7684\u7aef\u53e3\u4e2a\u6570":21,"\u54c1\u8d28":39,"\u54ea\u4e9b\u4e0d\u662f":39,"\u5546\u52a1":39,"\u554a":39,"\u56db\u79cd\u6570\u636e\u7c7b\u578b":14,"\u56e0\u4e3a\u4e0a\u8ff0\u7279\u70b9":18,"\u56e0\u4e3a\u5168\u8fde\u63a5\u5c42\u7684\u6fc0\u6d3b\u53ef\u4ee5\u662fsoftmax":6,"\u56e0\u4e3a\u53c2\u6570":35,"\u56e0\u4e3a\u5b83\u4eec\u7684\u8ba1\u7b97\u6548\u7387\u6bd4":42,"\u56e0\u4e3a\u5b83\u6bd4":42,"\u56e0\u4e3a\u5b98\u65b9\u955c\u50cf":27,"\u56e0\u4e3a\u6211\u4eec\u4f1a\u628a\u6240\u6709\u7f16\u8bd1\u5de5\u5177\u90fd\u5b89\u88c5\u8fdb\u4e00\u4e2a":0,"\u56e0\u4e3a\u6e90\u7801\u5c31\u5728\u672c\u673a\u4e0a":0,"\u56e0\u4e3a\u8f93\u5165\u6570\u636e\u53ef\u80fd\u6709\u591a\u79cd\u7ed3\u6784":18,"\u56e0\u4e3apython\u7684\u641c\u7d22\u8def\u5f84\u662f\u4f18\u5148\u5df2\u7ecf\u5b89\u88c5\u7684python\u5305":8,"\u56e0\u6b64":[6,39,41],"\u56e0\u6b64\u53cc\u5c42\u5e8f\u5217\u7684\u914d\u7f6e\u4e2d":39,"\u56e0\u6b64\u5b83\u662finteger_value_sub_sequ":39,"\u56e0\u6b64\u6211\u4eec\u91c7\u7528\u8f93\u51fa\u7684\u52a0\u6743\u548c":6,"\u56e0\u6b64\u7528\u6237\u5e76\u4e0d\u9700\u8981\u5173\u5fc3\u5b83\u4eec":32,"\u56e0\u6b64\u9519\u8bef\u7684\u4f7f\u7528\u4e8c\u8fdb\u5236\u53d1\u884c\u7248\u53ef\u80fd\u4f1a\u5bfc\u81f4\u8fd9\u79cd\u9519\u8bef":8,"\u56fd\u5185\u7528\u6237\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u7684\u955c\u50cf\u6e90\u6765\u52a0\u901f\u8bbf\u95ee":1,"\u56fe1":[19,20],"\u56fe2":19,"\u56fe\u8868":1,"\u5728":[0,4,19,20,21,38,39,42],"\u5728\u4e00\u4e2a\u529f\u80fd\u9f50\u5168\u7684kubernetes\u673a\u7fa4\u91cc":26,"\u5728\u4e00\u4e2a\u53c2\u6570\u7684\u68af\u5ea6\u88ab\u66f4\u65b0\u540e":6,"\u5728\u4e00\u4e9b\u5206\u5e03\u5f0f\u7cfb\u7edf\u4e2d":21,"\u5728\u4e00\u8f6e\u4e2d\u6bcfsave":33,"\u5728\u4e0a\u9762\u4ee3\u7801\u4e2d":39,"\u5728\u4e0b\u4e00\u7bc7\u4e2d":26,"\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c":22,"\u5728\u4e4b\u54
0e\u7684":11,"\u5728\u4e86\u89e3docker\u7684\u57fa\u672c\u4f7f\u7528\u65b9\u6cd5\u4e4b\u540e":1,"\u5728\u4efb\u610f\u65f6\u95f4\u67d0\u4e00\u53f0\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u4fdd\u5b58\u7684\u53c2\u6570\u53ef\u80fd\u6bd4\u53e6\u4e00\u53f0\u8981\u66f4\u65b0":22,"\u5728\u4f7f\u7528\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u8ba1\u7b97\u5e73\u53f0\u65f6":21,"\u5728\u4f7f\u7528\u540c\u6b65sgd\u8bad\u7ec3\u795e\u7ecf\u7f51\u7edc\u65f6":22,"\u5728\u4f7f\u7528\u65f6":12,"\u5728\u4f7f\u7528\u8be5\u6587\u6863\u4e4b\u524d":14,"\u5728\u4f7f\u7528c":17,"\u5728\u4f7f\u7528paddlepaddl":17,"\u5728\u5168\u8fde\u63a5\u5c42\u4e2d":6,"\u5728\u51c6\u5907\u53d1\u8d77":4,"\u5728\u51fd\u6570":27,"\u5728\u5206\u5e03\u5f0f\u73af\u5883\u4e2d\u6d4b\u8bd5":33,"\u5728\u5206\u5e03\u5f0f\u8bad\u7ec3\u4e2d":33,"\u5728\u521b\u5efaparameters\u540e":13,"\u5728\u5355\u5c42\u6570\u636e\u7684\u57fa\u7840\u4e0a":39,"\u5728\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u52a0\u8f7d\u548c\u4fdd\u5b58\u53c2\u6570":33,"\u5728\u53c2\u6570\u670d\u52a1\u5668\u7ec8\u7aef\u6bcflog":33,"\u5728\u53cc\u5c42rnn\u4e2d\u7684\u7ecf\u5178\u60c5\u51b5\u662f\u5c06\u5185\u5c42\u7684\u6bcf\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217\u6570\u636e":39,"\u5728\u53cd\u5411\u4f20\u9012\u7684\u65f6\u5019":11,"\u5728\u53d8\u6362\u65f6\u9700\u8981\u5c06\u8f93\u5165\u5e8f\u5217\u4f20\u5165":39,"\u5728\u542f\u52a8job\u4e4b\u524d":27,"\u5728\u56de\u590d\u8bc4\u5ba1\u4eba\u610f\u89c1\u65f6":4,"\u5728\u56fe\u50cf\u4efb\u52a1\u4e2d":12,"\u5728\u591acpu\u8bad\u7ec3\u65f6\u5171\u4eab\u8be5\u53c2\u6570":33,"\u5728\u5b8c\u6210\u4e00\u5b9a\u91cf\u6570\u636e\u7684\u8bad\u7ec3\u540e":22,"\u5728\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u642d\u5efa\u4e4b\u540e":14,"\u5728\u5b9a\u4e49\u8f93\u5165layer\u4e4b\u540e":14,"\u5728\u5b9e\u9645\u5e94\u7528\u4e2d":12,"\u5728\u5bb9\u5668\u4e2d\u7f16\u8f91\u4ee3\u7801":1,"\u5728\u5bb9\u5668\u521b\u5efa\u540e":27,"\u5728\u5bf9\u5bb9\u5668\u7684\u63cf\u8ff0":27,"\u5728\u5c42\u4e2d\u6307\u5b9a":35,"\u5728\u5e8f\u5217\u751f\u621
0\u4efb\u52a1\u4e2d":41,"\u5728\u5f02\u6b65sgd\u4e2d":22,"\u5728\u5f53\u524d":11,"\u5728\u5f53\u524d\u7684\u5b9e\u73b0\u65b9\u5f0f\u4e0b":6,"\u5728\u5f97\u5230":27,"\u5728\u6211\u4eec\u7684\u4f8b\u5b50\u4e2d":42,"\u5728\u63d0\u4ea4":4,"\u5728\u642d\u5efa\u795e\u7ecf\u7f51\u7edc\u7684\u8fc7\u7a0b\u4e2d":14,"\u5728\u65e0\u7279\u6b8a\u9700\u6c42\u60c5\u51b5\u4e0b":17,"\u5728\u672c\u4f8b\u4e2d":[4,35,39],"\u5728\u672c\u6559\u7a0b\u4e2d":42,"\u5728\u672c\u793a\u4f8b\u4e2d":39,"\u5728\u672c\u8282\u4e2d":42,"\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b":11,"\u5728\u6811\u7684\u6bcf\u4e00\u5c42\u4e0a":33,"\u5728\u6b64":[32,35],"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e2d":42,"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u7684\u5b50\u5e8f\u5217\u957f\u5ea6\u53ef\u4ee5\u4e0d\u76f8\u7b49":39,"\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u957f":42,"\u5728\u6bcf\u4e2apod\u4e0a\u90fd\u901a\u8fc7volume\u65b9\u5f0f\u6302\u8f7d\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u7684\u4e00\u4e2a\u76ee\u5f55\u7528\u4e8e\u4fdd\u5b58\u8bad\u7ec3\u6570\u636e\u548c\u8f93\u51fa\u7ed3\u679c":27,"\u5728\u6d4b\u8bd5\u9636\u6bb5":33,"\u5728\u6e90\u7801\u76ee\u5f55\u6811\u7684\u6839\u76ee\u5f55\u4e2d\u8fd0\u884c":4,"\u5728\u751f\u6210\u65f6":42,"\u5728\u76f8\u5e94\u7684\u4f18\u5316\u7b97\u6cd5\u91cc\u8bbe\u7f6elearning_rate_schedule\u53ca\u76f8\u5173\u53c2\u6570":13,"\u5728\u76f8\u5e94layer\u7684":12,"\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u7b49\u4e8e\u4e00\u6b21\u9884\u6d4b\u5904\u7406\u7684\u6837\u672c\u6570":19,"\u5728\u7a0b\u5e8f\u5b9e\u73b0\u4e2d\u90fd\u4f1a\u8f6c\u5316\u4e3a\u4e8c\u7ef4\u77e9\u9635":19,"\u5728\u7ebf\u4e0a\u7cfb\u7edf\u4e2d":21,"\u5728\u7ec4\u5408\u65f6":14,"\u5728\u7ec4\u7ec7\u795e\u7ecf\u7f51\u7edc\u8f93\u5165":20,"\u5728\u7ec4\u7ec7\u795e\u7ecf\u7f51\u7edc\u8f93\u5165\u65f6":19,"\u5728\u7ec8\u7aef\u6267\u884c":20,"\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d":6,"\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d":38,"\u5728\u8bad\u7ec3\u4e2d":42,"\u5728\u8bad\u7ec3\u4e4b\u524d":27,"\u5728\u8bad\u7ec3\u
65f6":26,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d":27,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6bcfshow":33,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u53c2\u6570\u7684\u6743\u91cd\u548c\u68af\u5ea6":11,"\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u67d0\u4e00\u4e2alayer\u7684output":11,"\u5728\u8be5\u793a\u4f8b\u4e2d":13,"\u5728\u8be5\u914d\u7f6e\u76847":39,"\u5728\u8c03\u7528":20,"\u5728\u8c03\u7528c":20,"\u5728\u8f6f\u4ef6\u5de5\u7a0b\u7684\u8303\u7574\u91cc":37,"\u5728\u8f93\u51fa\u7684\u8fc7\u7a0b\u4e2d":41,"\u5728\u8fd0\u884c\u65f6\u5c06\u795e\u7ecf\u7f51\u7edc\u7684\u591a\u4e2a\u53ef\u5b66\u4e60\u53c2\u6570\u653e\u5728\u540c\u4e00\u4e2a\u76ee\u5f55\u4e2d":20,"\u5728\u8fd9\u4e00\u90e8\u5206":34,"\u5728\u8fd9\u4e2a\u4f8b\u5b50\u91cc":[6,26],"\u5728\u8fd9\u4e2a\u51fd\u6570\u4e2d":39,"\u5728\u8fd9\u4e2a\u6559\u7a0b\u4e2d":37,"\u5728\u8fd9\u4e2a\u6a21\u578b\u4e2d":42,"\u5728\u8fd9\u4e9blayer\u4e2d":39,"\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b":[6,42],"\u5728\u8fd9\u79cd\u7ed3\u6784\u4e2d":41,"\u5728\u8fd9\u7bc7\u6587\u6863\u91cc":26,"\u5728\u8fd9\u7bc7\u6587\u7ae0\u91cc":27,"\u5728\u8fd9\u91cc":41,"\u5728\u8fd9\u91cc\u6211\u4eec\u4f7f\u7528\u5168\u8fde\u63a5\u5c42\u4f5c\u4e3a\u4f8b\u5b50\u6765\u5c55\u793a\u5b9e\u73b0\u65b0\u7f51\u7edc\u5c42\u6240\u9700\u8981\u7684\u56db\u4e2a\u6b65\u9aa4":6,"\u5728\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u65f6":21,"\u5728\u8fdb\u884c\u7f51\u7edc\u914d\u7f6e\u4e4b\u524d":14,"\u5728\u91c7\u7528sgd":13,"\u5728\u96c6\u7fa4\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u7a00\u758f\u6a21\u578b\u9700\u8981\u52a0\u4e0a\u4e0b\u9762\u7684\u53c2\u6570":35,"\u5728\u975e\u5e8f\u5217\u8f93\u5165\u65f6":11,"\u5728build\u76ee\u5f55\u4e0b\u6267\u884c":8,"\u5728c":19,"\u5728cmake\u7684\u547d\u4ee4\u884c\u4e2d":0,"\u5728hpc\u9886\u57df\u4f7f\u7528\u975e\u5e38\u7684\u5e7f\u6cdb":24,"\u5728openmpi\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3":24,"\u5728paddl":27,"\u5728paddle\u4e2d":35,"\u5728paddlepaddl":19,"\u5728paddlepaddle\u4e2d":[14,41],"\u5728paddlepaddle\u4e
2d\u4f7f\u7528dropout\u6709\u4e24\u79cd\u65b9\u5f0f":12,"\u5728paddlepaddle\u4e2d\u5305\u542b\u4ee5\u4e0b":12,"\u5728paddlepaddle\u5185\u90e8":[19,20],"\u5728paddlepaddle\u7684\u6587\u6863\u4e2d":39,"\u5728step\u51fd\u6570\u4e2d\u5b9a\u4e49":41,"\u5728step\u51fd\u6570\u4e2d\u5b9a\u4e49memori":41,"\u5728trainer":35,"\u5728trainer\u4e2d\u53ef\u4ee5\u4f7f\u7528\u4e0b\u9762\u53d6\u6a21\u7684\u65b9\u6cd5\u4e3a\u6bcf\u4e2atrainer\u5206\u914d\u8bad\u7ec3\u6570\u636e\u6587\u4ef6":21,"\u5730\u6bb5":39,"\u5730\u7406\u4f4d\u7f6e":39,"\u5730\u94c1\u7ad9":39,"\u5747\u4f1a\u5b58\u653e\u4e8e":17,"\u5747\u5300\u5206\u5e03":13,"\u5747\u5300\u5206\u5e03\u7684\u8303\u56f4\u662f":33,"\u5747\u6709\u4e09\u4e2a\u5b50\u5e8f\u5217":39,"\u5747\u6709\u4e24\u7ec4\u7279\u5f81":39,"\u57fa\u4e8e\u53cc\u5c42\u5e8f\u5217\u8f93\u5165":41,"\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":[15,20],"\u57fa\u7840\u4e0a":19,"\u586b\u5199":4,"\u589e\u52a0\u4e86\u4e00\u6761cd\u547d\u4ee4":26,"\u589e\u52a0\u5982\u4e0b\u53c2\u6570":35,"\u589e\u52a0\u68af\u5ea6\u68c0\u6d4b\u7684\u5355\u5143\u6d4b\u8bd5":6,"\u5904\u7406\u5668\u6709\u4e24\u4e2a\u5173\u952e\u6027\u80fd\u9650\u5236":37,"\u5904\u7406\u7684\u8f93\u5165\u5e8f\u5217\u4e3b\u8981\u5206\u4e3a\u4ee5\u4e0b\u4e09\u79cd\u7c7b\u578b":41,"\u5907\u6ce8":37,"\u590d\u6742\u5ea6\u6216\u65f6\u95f4\u590d\u6742\u5ea6":37,"\u5916\u5c42\u5e8f\u5217\u5728":19,"\u5916\u5c42memory\u662f\u4e00\u4e2a\u5143\u7d20":39,"\u5916\u5c42outer_step\u4e2d":39,"\u591a\u4e2a":20,"\u591a\u4e2a\u5c42\u7684\u8f93\u51fa\u77e9\u9635\u7684\u9ad8\u5ea6\u4e0d\u4e00\u81f4\u5bfc\u81f4\u62fc\u63a5\u5931\u8d25":11,"\u591a\u4e2a\u6392\u6210\u4e00\u5217\u7684\u5143\u7d20":19,"\u591a\u4e2a\u8f93\u51fa\u5c42\u5904\u7406\u591a\u4e2a\u4e0d\u540c\u957f\u5ea6\u7684\u5e8f\u5217":11,"\u591a\u4e2aip\u4f7f\u7528":21,"\u591a\u53e5\u8bdd\u8fdb\u4e00\u6b65\u6784\u6210\u4e86\u6bb5\u843d":41,"\u591a\u673a\u8bad\u7ec3":11,"\u591a\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1":20,"\u591a\u8f6e\u5bf9\u8bdd\u7b49\u
66f4\u4e3a\u590d\u6742\u7684\u8bed\u8a00\u6570\u636e":41,"\u5927\u4e8e\u7b49\u4e8e\u4e00\u4e2a":20,"\u5927\u591a\u6570\u5c42\u4e0d\u9700\u8981\u8fdc\u7a0b\u7a00\u758f\u8bad\u7ec3\u51fd\u6570":6,"\u5927\u591a\u6570\u5c42\u9700\u8981\u8bbe\u7f6e\u4e3a":6,"\u5927\u591a\u6570\u7f51\u7edc\u5c42\u4e0d\u9700\u8981\u652f\u6301\u8fdc\u7a0b\u7a00\u758f\u66f4\u65b0":6,"\u5927\u5bb6\u53ef\u4ee5\u7528\u628a\u5f00\u53d1\u5de5\u5177\u5b89\u88c5\u8fdb\u5165":0,"\u5927\u5bb6\u53ef\u4ee5\u901a\u8fc7\u5b83\u9605\u8bfb\u6559\u7a0b":1,"\u5927\u5c0f\u4e0d\u4e00\u6837\u65f6":11,"\u5927\u6982\u82b1\u5341\u5206\u949f\u770b\u4e00\u4e0b":0,"\u5929":39,"\u5929\u4e00\u5e7f\u573a":39,"\u5929\u4e00\u9601":39,"\u5934\u4fe1\u606f\u4e2d":13,"\u5934\u6587\u4ef6\u4e2d\u628a\u53c2\u6570\u5b9a\u4e49\u4e3a\u7c7b\u7684\u6210\u5458\u53d8\u91cf":6,"\u5934\u6587\u4ef6\u5982\u4e0b":6,"\u597d":39,"\u597d\u5403":39,"\u5982":[4,35,42],"\u5982\u4e0b":21,"\u5982\u4e0b\u56fe\u6240\u793a":[37,39],"\u5982\u4e0b\u6240\u793a":35,"\u5982\u4f55\u5b9e\u73b0\u65b0\u7684\u7f51\u7edc\u5c42":5,"\u5982\u4f55\u8d21\u732e\u4ee3\u7801":5,"\u5982\u4f55\u8d21\u732e\u6587\u6863":5,"\u5982\u679c\u4e00\u4e2a\u7f51\u7edc\u5c42\u9700\u8981\u914d\u7f6e\u7684\u8bdd":6,"\u5982\u679c\u4e0d\u4e3a0":33,"\u5982\u679c\u4e0d\u4f7f\u7528\u5206\u5e03\u5f0f\u5b58\u50a8":21,"\u5982\u679c\u4e0d\u60f3\u4f7f\u7528":7,"\u5982\u679c\u4e0d\u6536\u655b":13,"\u5982\u679c\u4e3a0":33,"\u5982\u679c\u4e3a\u5426\u5219\u662f\u7528openbla":0,"\u5982\u679c\u4e3afals":33,"\u5982\u679c\u4e3atrue":33,"\u5982\u679c\u4e4b\u540e\u60f3\u8981\u91cd\u65b0\u8bbe\u7f6e":0,"\u5982\u679c\u4ec5\u4ec5\u4fee\u6539\u4e00\u4e2a\u6587\u4ef6\u4f46\u63d0\u4ea4\u4e86\u5341\u51e0\u4e2acommit":4,"\u5982\u679c\u4ecd\u7136\u5b58\u5728\u95ee\u9898":3,"\u5982\u679c\u4ed4\u7ec6\u8bbe\u7f6e\u7684\u8bdd":33,"\u5982\u679c\u4f60\u53ea\u9700\u8981\u4f7f\u7528\u7b80\u5355\u7684rnn":42,"\u5982\u679c\u4f60\u60f3\u4f7f\u7528\u8fd9\u4e9b\u7279\u6027":35,"\u5982\u679c\u4f60\u60f3\u8981\u4fdd\u5b58\u67d0\
u4e9b\u5c42\u7684\u7279\u5f81\u56fe":33,"\u5982\u679c\u4f60\u6b63\u5728\u5904\u7406\u5e8f\u5217\u6807\u8bb0\u4efb\u52a1":42,"\u5982\u679c\u4f60\u8981\u4e3a\u4e86\u6d4b\u8bd5\u800c\u589e\u52a0\u65b0\u7684\u6587\u4ef6":6,"\u5982\u679c\u4f7f\u7528":20,"\u5982\u679c\u4f7f\u7528\u81ea\u884c":0,"\u5982\u679c\u4f7f\u7528mkl\u5e76\u4e14\u673a\u5668\u542b\u6709avx2\u6307\u4ee4\u96c6":0,"\u5982\u679c\u5173\u95edmkl":0,"\u5982\u679c\u51fa\u73b0\u4ee5\u4e0bpython\u76f8\u5173\u7684\u5355\u5143\u6d4b\u8bd5\u90fd\u8fc7\u4e0d\u4e86\u7684\u60c5\u51b5":8,"\u5982\u679c\u53c2\u6570\u4fdd\u5b58\u4e0b\u6765\u7684\u6a21\u578b\u76ee\u5f55":11,"\u5982\u679c\u53d1\u73b0\u6700\u65e9\u7684\u62a5\u9519\u5c31\u662f\u7f51\u7edc\u901a\u4fe1\u7684\u95ee\u9898":9,"\u5982\u679c\u540c\u65f6\u4f7f\u7528":21,"\u5982\u679c\u5728\u5b89\u88c5\u8fc7\u7a0b\u4e2d\u9047\u5230\u4e86\u95ee\u9898":2,"\u5982\u679c\u5728\u70b9\u51fb\u4e0b\u9762\u94fe\u63a5\u65f6\u51fa\u73b0\u5982\u4e0b\u767b\u9646\u754c\u9762":3,"\u5982\u679c\u5728\u7f16\u8bd1":17,"\u5982\u679c\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u672a\u8bbe\u7f6easync":33,"\u5982\u679c\u5728\u8bad\u7ec3\u671f\u95f4\u540c\u65f6\u53d1\u8d77\u53e6\u5916\u4e00\u4e2a\u8fdb\u7a0b\u8fdb\u884c\u6d4b\u8bd5":33,"\u5982\u679c\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u8bbe\u7f6ebatch":33,"\u5982\u679c\u5728\u8bad\u7ec3nlp\u76f8\u5173\u6a21\u578b\u65f6":13,"\u5982\u679c\u5c06\u8fd9\u4e2a\u5185\u5b58\u6c60\u51cf\u5c0f":11,"\u5982\u679c\u5c0f\u4e8e75m":8,"\u5982\u679c\u5df2\u7ecf\u6709pod\u8fd0\u884c":27,"\u5982\u679c\u5e0c\u671b\u53ef\u4ee5\u5728\u540e\u53f0\u8fd0\u884cpserver\u7a0b\u5e8f":21,"\u5982\u679c\u5f53\u524dmpi\u96c6\u7fa4\u5e76\u4e0d\u652f\u6301\u4efb\u52a1\u72ec\u5360\u6a21\u5f0f":9,"\u5982\u679c\u60a8\u5728\u4f7f\u7528window":1,"\u5982\u679c\u60a8\u60f3\u8981\u66f4\u6df1\u5165\u4e86\u89e3deep":1,"\u5982\u679c\u60a8\u671f\u671b\u5728\u7f16\u8bd1\u5b8c\u6210\u540e\u7acb\u5373\u6267\u884c\u6240\u6709\u7684\u5355\u5143\u6d4b\u8bd5":0,"\u5982\u679c\u60a8\u6ca1\u6709\u542c\u8
bf4":0,"\u5982\u679c\u60a8\u7684\u7535\u8111\u4e0d\u652f\u6301avx":1,"\u5982\u679c\u60a8\u7684\u95ee\u9898\u672a\u5728\u6b64\u5904":10,"\u5982\u679c\u60a8\u7684gpu\u7406\u8bba\u53ef\u4ee5\u8fbe\u52306":37,"\u5982\u679c\u60a8\u9009\u62e9\u4e0d\u4f7f\u7528docker\u955c\u50cf":0,"\u5982\u679c\u60f3\u4f7f\u7528\u53ef\u89c6\u5316\u7684\u5206\u6790\u5668":37,"\u5982\u679c\u60f3\u5f88\u597d\u7684\u7406\u89e3\u7a0b\u5e8f\u7684\u884c\u4e3a":37,"\u5982\u679c\u60f3\u8981\u4e86\u89e3\u53cc\u5c42rnn\u5728\u5177\u4f53\u95ee\u9898\u4e2d\u7684\u4f7f\u7528":39,"\u5982\u679c\u60f3\u8981\u542f\u7528paddlepaddle\u7684\u5185\u7f6e\u5b9a\u65f6\u5668":37,"\u5982\u679c\u60f3\u8be6\u7ec6\u4e86\u89e3":24,"\u5982\u679c\u6211\u77e5\u9053\u5185\u6838\u82b1\u4e8610ms\u6765\u79fb\u52a81gb\u6570\u636e":37,"\u5982\u679c\u6307\u5b9a\u4e862\u4e2alayer\u4f5c\u4e3a\u8f93\u51fa\u5c42":11,"\u5982\u679c\u63d0\u793a\u6b63\u786e":7,"\u5982\u679c\u652f\u6301\u589e\u52a0\u6b64\u53c2\u6570\u63d0\u4ea4":9,"\u5982\u679c\u662f\u4e00\u4e2a\u5e8f\u5217\u8f93\u5165":19,"\u5982\u679c\u662f\u7528\u7f16\u8bd1\u65f6\u6307\u5b9acpu\u7248\u672c":17,"\u5982\u679c\u6709\u591a\u4e2a\u8f93\u5165":41,"\u5982\u679c\u6709\u591a\u4e2a\u8f93\u5165\u5e8f\u5217":41,"\u5982\u679c\u6709\u9700\u8981\u4fee\u6539\u7684\u5730\u65b9":4,"\u5982\u679c\u671f\u671b\u6267\u884c\u5176\u4e2d\u4e00\u4e2a\u5355\u5143\u6d4b\u8bd5":0,"\u5982\u679c\u672a\u8bbe\u7f6e":33,"\u5982\u679c\u672a\u8bbe\u7f6egpu":35,"\u5982\u679c\u673a\u5668\u4e2d\u5df2\u7ecf\u5b89\u88c5\u8fc7paddlepaddl":0,"\u5982\u679c\u67d0\u4e00\u5757\u6839\u672c\u5c31\u4e0d\u600e\u4e48\u8017\u65f6":37,"\u5982\u679c\u68c0\u67e5\u5230\u5206\u914d\u5728\u4e0d\u540c\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u53c2\u6570\u7684\u5206\u5e03\u4e0d\u5747\u5300\u6b21\u6570\u5927\u4e8echeck":33,"\u5982\u679c\u6ca1\u6709\u5b89\u88c5nvidia":1,"\u5982\u679c\u6ca1\u6709\u5b9a\u4e49memori":41,"\u5982\u679c\u6ca1\u8fc7":4,"\u5982\u679c\u6d88\u606f\u6570\u636e\u592a\u5c0f":33,"\u5982\u679c\u7528\u516c\u7528
\u7684\u7535\u8111\u5f00\u53d1":0,"\u5982\u679c\u7528\u6237\u4e0d\u9700\u8981\u8bbf\u95eelstm\u7684\u4e2d\u95f4\u53d8\u91cf":12,"\u5982\u679c\u7528\u6237\u60f3\u8981\u81ea\u5b9a\u4e49\u521d\u59cb\u5316\u65b9\u5f0f":13,"\u5982\u679c\u7528\u81ea\u5df1\u7684\u7535\u8111\u5f00\u53d1":0,"\u5982\u679c\u771f\u60f3\u6316\u6398\u5185\u6838\u6df1\u5904\u7684\u67d0\u4e2a\u79d8\u5bc6":37,"\u5982\u679c\u795e\u7ecf\u7f51\u7edc\u6709\u591a\u4e2a\u8f93\u5165\u6216\u8005\u591a\u4e2a\u8f93\u51fa":[19,20],"\u5982\u679c\u7a0b\u5e8f\u5d29\u6e83\u4f60\u4e5f\u53ef\u4ee5\u624b\u52a8\u7ec8\u6b62":23,"\u5982\u679c\u7cfb\u7edf\u5b89\u88c5\u4e86\u591a\u4e2apython\u7248\u672c":8,"\u5982\u679c\u7cfb\u7edf\u652f\u6301":[3,8],"\u5982\u679c\u7cfb\u7edf\u652f\u6301\u7684\u662f":[3,8],"\u5982\u679c\u7f16\u8bd1\u65f6\u6307\u5b9a\u7f16\u8bd1cpu\u7248\u672c":17,"\u5982\u679c\u7f16\u8bd1\u65f6\u6307\u5b9a\u7f16\u8bd1gpu\u7248\u672c":17,"\u5982\u679c\u7f51\u7edc\u5c42\u4e0d\u9700\u8981\u8fdc\u7a0b\u7a00\u758f\u66f4\u65b0":6,"\u5982\u679c\u7f51\u7edc\u67b6\u6784\u7b80\u5355":42,"\u5982\u679c\u8981\u542f\u7528gpu":31,"\u5982\u679c\u8981\u8fd0\u884c\u6240\u6709\u7684\u5355\u5143\u6d4b\u8bd5":4,"\u5982\u679c\u89e3\u51b3\u4e86\u67d0\u4e2aissue\u7684\u95ee\u9898":4,"\u5982\u679c\u8bad\u7ec3\u4e00\u4e2apass":13,"\u5982\u679c\u8bad\u7ec3\u8fc7\u7a0b\u7684\u7684cost\u660e\u663e\u9ad8\u4e8e\u8fd9\u4e2a\u5e38\u6570\u8f93\u51fa\u7684cost":13,"\u5982\u679c\u8bbe\u7f6e\u8be5\u53c2\u6570":33,"\u5982\u679c\u8bc4\u5ba1\u610f\u89c1\u6bd4\u8f83\u591a":4,"\u5982\u679c\u8f93\u5165\u662f\u5e8f\u5217\u6570\u636e":19,"\u5982\u679c\u8f93\u51fa\u662f\u4e00\u4e2a\u5e8f\u5217":19,"\u5982\u679c\u8f93\u51fa\u662fno":1,"\u5982\u679c\u8fd0\u884c":8,"\u5982\u679c\u8fd8\u4e0d\u884c":8,"\u5982\u679c\u95ee\u9898\u6ca1\u6709\u5f97\u5230\u89e3\u51b3":2,"\u5982\u679c\u9700\u8981":17,"\u5982\u679c\u9700\u8981\u5728c\u7ef4\u5ea6\u8ba1\u7b97softmax":12,"\u5982\u679c\u9700\u8981\u5b89\u88c5\u652f\u6301gpu\u7684\u7248\u672c":[3,16],"\u5982\u679c\u9
700\u8981\u6269\u5927\u77e9\u9635":6,"\u5982\u679c\u9700\u8981\u7f29\u51cf\u77e9\u9635":6,"\u5982\u679c\u9700\u8981\u83b7\u53d6":3,"\u5982\u679clearning_rate\u592a\u5927":13,"\u5982\u679clearning_rate\u592a\u5c0f":13,"\u5982\u679cpaddlepaddle\u5305\u5df2\u7ecf\u5728python\u7684sit":8,"\u5982\u6bcf\u4e2a\u6587\u4ef6\u53ea\u6709\u4e00\u4e2a":4,"\u5982\u795e\u7ecf\u5143\u6fc0\u6d3b\u503c\u7b49":11,"\u5982\u8981build\u8fd9\u4e2a\u5f00\u53d1\u955c\u50cf":4,"\u5982\u9ad8\u4eae\u90e8\u5206":37,"\u5982train":21,"\u5b50":39,"\u5b50\u53e5":41,"\u5b50\u53e5\u7684\u5355\u8bcd\u6570\u548c\u6307\u5b9a\u7684\u4e00\u4e2a\u8f93\u5165\u5e8f\u5217\u4e00\u81f4":41,"\u5b57\u6bb5\u4e2d":27,"\u5b57\u6bb5\u4e3a\u4f8b":11,"\u5b57\u6bb5\u7684\u53d6\u503c":19,"\u5b57\u6bb5\u8868\u793a\u5bb9\u5668\u7684\u73af\u5883\u53d8\u91cf":27,"\u5b57\u6bb5\u8868\u793a\u8fd9\u4e2ajob\u4f1a\u540c\u65f6\u5f00\u542f3\u4e2apaddlepaddle\u8282\u70b9":27,"\u5b58\u50a8":[19,20],"\u5b58\u50a8\u6d6e\u70b9\u7c7b\u578b\u8f93\u5165":20,"\u5b58\u50a8\u7684\u538b\u7f29\u6587\u4ef6":20,"\u5b58\u6570\u6570\u636e":19,"\u5b66\u4e60":0,"\u5b66\u4e60\u7387\u4e3a":13,"\u5b81\u6ce2":39,"\u5b83\u5305\u542b\u4ee5\u4e0b\u51e0\u6b65":6,"\u5b83\u5305\u542b\u4ee5\u4e0b\u53c2\u6570":6,"\u5b83\u53eb\u505a":42,"\u5b83\u53ef\u4ee5\u5e2e\u52a9\u51cf\u5c11\u5206\u53d1\u5ef6\u8fdf":23,"\u5b83\u53ef\u4ee5\u5e2e\u52a9\u6211\u4eec\u683c\u5f0f\u5316\u6e90\u4ee3\u7801":4,"\u5b83\u53ef\u4ee5\u6307\u6d4b\u91cf\u4e00\u4e2a\u7a0b\u5e8f\u7684\u7a7a\u95f4":37,"\u5b83\u53ef\u80fd\u6709\u4e0d\u6b62\u4e00\u4e2a\u6743\u91cd":6,"\u5b83\u5b9a\u4e49\u4e86":42,"\u5b83\u5b9a\u4e49\u89e3\u7801\u7f51\u7edc\u7684":42,"\u5b83\u5c06\u88ab\u5206\u53d1\u5230":23,"\u5b83\u5e76\u4e0d\u662f\u4e00\u4e2a\u5b8c\u6574\u7684recurr":12,"\u5b83\u5e94\u8be5\u6253\u5370\u51fa\u9884\u6d4b\u4f4f\u623f\u6570\u636e\u7684\u6e05\u5355":16,"\u5b83\u652f\u6301\u591a\u7ebf\u7a0b\u66f4\u65b0":6,"\u5b83\u662finteger_value\u7c7b\u578b\u7684":39,"\u5b83\u662finteger_value_sequence\u7c7b\u578b
\u7684":39,"\u5b83\u6709\u52a9\u4e8e\u5e2e\u52a9\u9891\u7e41\u4fee\u6539\u548c\u8bbf\u95ee\u5de5\u4f5c\u533a\u6587\u4ef6\u7684\u7528\u6237\u51cf\u5c11\u8d1f\u62c5":23,"\u5b83\u7684":42,"\u5b83\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20":38,"\u5b83\u7684\u8f93\u5165\u4e0e\u7ecf\u8fc7\u5b66\u4e60\u7684\u53c2\u6570\u505a\u5185\u79ef\u5e76\u52a0\u4e0a\u504f\u7f6e":6,"\u5b83\u9996\u5148\u8c03\u7528\u57fa\u6784\u9020\u51fd\u6570":6,"\u5b89\u6392":39,"\u5b89\u88c5\u4e0e\u7f16\u8bd1":43,"\u5b89\u88c5\u4e0e\u7f16\u8bd1c":18,"\u5b89\u88c5\u540e":1,"\u5b89\u88c5\u597ddocker\u4e4b\u540e\u53ca\u53ef\u7528\u4ee5\u4e0b\u547d\u4ee4\u542f\u52a8\u5de5\u5177":7,"\u5b89\u88c5\u597ddocker\u4e4b\u540e\u53ef\u4ee5\u4f7f\u7528\u6e90\u7801\u76ee\u5f55\u4e0b\u7684\u811a\u672c\u6784\u5efa\u6587\u6863":7,"\u5b89\u88c5\u5b8c\u6210\u4e4b\u540e":31,"\u5b89\u88c5\u6587\u6863":14,"\u5b89\u88c5\u65b9\u5f0f\u6765\u5feb\u901f\u5b89\u88c5paddlepaddl":31,"\u5b89\u9759":39,"\u5b8c\u6210":4,"\u5b8c\u6210\u4e0a\u8ff0\u51c6\u5907\u4e4b\u540e":20,"\u5b8c\u6210\u4efb\u610f\u7684\u8fd0\u7b97\u903b\u8f91":41,"\u5b8c\u6210\u540evolume\u4e2d\u7684\u6587\u4ef6\u5185\u5bb9\u5927\u81f4\u5982\u4e0b":27,"\u5b8c\u6210\u5728windows\u4e0a\u5b89\u88c5\u548c\u4f7f\u7528dock":1,"\u5b8c\u6210\u5b89\u88c5":3,"\u5b8c\u6210\u76f8\u5e94\u7684\u8ba1\u7b97":38,"\u5b8c\u6210paddlepaddle\u7684\u5b89\u88c5":14,"\u5b8c\u6574\u4ee3\u7801\u53ef\u4ee5\u53c2\u8003\u793a\u4f8b":11,"\u5b8c\u6574\u4ee3\u7801\u53ef\u4ee5\u67e5\u770b":20,"\u5b8c\u6574\u6e90\u7801\u53ef\u53c2\u8003":13,"\u5b8c\u6574\u7684\u53c2\u6570\u77e9\u9635\u88ab\u5206\u5e03\u5728\u4e0d\u540c\u7684\u53c2\u6570\u670d\u52a1\u5668\u4e0a":6,"\u5b8c\u6574\u7684\u914d\u7f6e\u6587\u4ef6\u5728":42,"\u5b98\u65b9\u6587\u6863":0,"\u5b9a\u4e49\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185rnn\u5355\u5143\u5b8c\u6210\u7684\u8ba1\u7b97":41,"\u5b9a\u4e49\u4e86\u4e00\u4e2a\u53ea\u8bfb\u7684memori":41,"\u5b9a\u4e49\u4e86lstm\u5355\u5143\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u7684\u8ba1\u7b97\u8fc7
\u7a0b":12,"\u5b9a\u4e49\u5728\u5916\u5c42":41,"\u5b9a\u4e49\u5f02\u6b65\u8bad\u7ec3\u7684\u957f\u5ea6":33,"\u5b9a\u4e49\u6e90\u8bed\u53e5\u7684\u6570\u636e\u5c42":42,"\u5b9a\u4e49\u7684\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784":20,"\u5b9a\u4e49\u89e3\u7801\u5668\u7684memori":42,"\u5b9a\u4e49\u8f93\u51fa\u51fd\u6570":42,"\u5b9a\u4e49\u95e8\u63a7\u5faa\u73af\u5355\u5143\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u5355\u6b65\u51fd\u6570":42,"\u5b9d\u5854\u7684\u5e95\u7aef\u9700\u8981\u575a\u5b9e\u7684\u57fa\u5ea7\u6765\u652f\u6491":14,"\u5b9e\u73b0\u4e24\u4e2a\u5b8c\u5168\u7b49\u4ef7\u7684\u5168\u8fde\u63a5rnn":39,"\u5b9e\u73b0\u524d\u5411\u4f20\u64ad\u7684\u90e8\u5206\u6709\u4e0b\u9762\u51e0\u4e2a\u6b65\u9aa4":6,"\u5b9e\u73b0\u5355\u6b65\u51fd\u6570":42,"\u5b9e\u73b0\u540e\u5411\u4f20\u64ad\u7684\u90e8\u5206\u6709\u4e0b\u9762\u51e0\u4e2a\u6b65\u9aa4":6,"\u5b9e\u73b0\u6784\u9020\u51fd\u6570":6,"\u5b9e\u73b0\u7684":12,"\u5b9e\u73b0\u7ec6\u8282":6,"\u5b9e\u73b0\u7f51\u7edc\u5c42\u7684\u524d\u5411\u4f20\u64ad":6,"\u5b9e\u73b0\u7f51\u7edc\u5c42\u7684\u540e\u5411\u4f20\u64ad":6,"\u5b9e\u73b0\u8bcd\u8bed\u548c\u53e5\u5b50\u4e24\u4e2a\u7ea7\u522b\u7684\u53cc\u5c42rnn\u7ed3\u6784":41,"\u5b9e\u73b0\u8be5\u5c42\u7684c":6,"\u5b9e\u9645\u4e0a\u4f7f\u7528\u4e86":12,"\u5b9e\u9645\u4e0a\u9700\u8981\u7684\u8f93\u51fa\u7ed3\u679c\u662f\u4e24\u4e2a\u77e9\u9635":11,"\u5ba2\u6237":39,"\u5bb6":39,"\u5bb9\u5668\u8fd0\u884c\u90fd\u8fd0\u884c":27,"\u5bbd\u5ea6":19,"\u5bbd\u5ea6\u4e3a":19,"\u5bbd\u5ea6\u7b49\u4e8e\u914d\u7f6e\u4e2dlayer\u7684s":11,"\u5bc4\u5b58\u5668\u4f7f\u7528\u60c5\u51b5\u548c\u5171\u4eab\u5185\u5b58\u4f7f\u7528\u60c5\u51b5\u80fd\u8ba9\u6211\u4eec\u5bf9gpu\u7684\u6574\u4f53\u4f7f\u7528\u6709\u66f4\u597d\u7684\u7406\u89e3":37,"\u5bf9":[20,39],"\u5bf9\u4e00\u4e2a5\u7ef4\u975e\u5e8f\u5217\u7684\u7a00\u758f01\u5411\u91cf":14,"\u5bf9\u4e00\u4e2a5\u7ef4\u975e\u5e8f\u5217\u7684\u7a00\u758f\u6d6e\u70b9\u5411\u91cf":14,"\u5bf9\u4e8e":42,"\u5bf9\u4e8e\u4e0d\u540c\u7684\u8bad\u7ec3\u4efb\u52a1":
21,"\u5bf9\u4e8e\u4e24\u79cd\u4e0d\u540c\u7684\u8f93\u5165\u6570\u636e\u7c7b\u578b":39,"\u5bf9\u4e8e\u4e60\u60ef\u4f7f\u7528windows\u548cmacos\u7684\u5f00\u53d1\u8005\u6765\u8bf4":0,"\u5bf9\u4e8e\u5355\u5c42rnn":39,"\u5bf9\u4e8e\u5355\u5c42rnn\u7684\u6570\u636e\u4e00\u5171\u6709\u4e24\u4e2a\u6837\u672c":39,"\u5bf9\u4e8e\u53cc\u5c42rnn":39,"\u5bf9\u4e8e\u540c\u6837\u7684\u6570\u636e":39,"\u5bf9\u4e8e\u56fd\u5185\u7528\u6237":1,"\u5bf9\u4e8e\u6211\u4eec\u652f\u6301\u7684\u5168\u90e8\u77e9\u9635\u64cd\u4f5c":6,"\u5bf9\u4e8e\u6709\u5b9a\u5236\u5316\u4e8c\u8fdb\u5236\u6587\u4ef6\u9700\u6c42\u7684\u7528\u6237":2,"\u5bf9\u4e8e\u672c\u6837\u4f8b\u4ee3\u7801":21,"\u5bf9\u4e8e\u6bb5\u843d\u7684\u6587\u672c\u5206\u7c7b":39,"\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u5355\u5c42rnn\u7684\u6570\u636e":39,"\u5bf9\u4e8e\u8fd9\u6837\u7684\u9700\u6c42":20,"\u5bf9\u4e8e\u914d\u5907\u6709\u6ce8\u610f\u529b\u673a\u5236\u7684\u89e3\u7801\u5668":42,"\u5bf9\u4e8enchw":12,"\u5bf9\u4ee3\u7801\u8fdb\u884c\u6027\u80fd\u5206\u6790":37,"\u5bf9\u4f7f\u7528\u7684\u4e2d\u95f4\u53d8\u91cf\u548c\u8d44\u6e90\u8fdb\u884c\u6e05\u7406\u548c\u91ca\u653e":20,"\u5bf9\u5168\u8fde\u63a5\u5c42\u6765\u8bf4":6,"\u5bf9\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570\u7684\u5c42":13,"\u5bf9\u53cc\u5c42\u5e8f\u5217\u6765\u8bb2":19,"\u5bf9\u5df2\u7ecfpush\u5230\u8fdc\u7a0b\u4ed3\u5e93\u7684\u591a\u4e2acommit":4,"\u5bf9\u5e94\u4e00\u4e2a\u5b50\u53e5":41,"\u5bf9\u5e94\u4e00\u4e2a\u8bcd":41,"\u5bf9\u5e94\u4e8e\u8c03\u7528c":19,"\u5bf9\u5e94\u7740\u4e0a\u6587\u63d0\u5230\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":19,"\u5bf9\u5e94\u7740\u4e0a\u6587\u63d0\u5230\u7684\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":19,"\u5bf9\u63a8\u8350\u914d\u7f6e\u4e2d\u7684\u9009\u9879\u5efa\u8bae\u6309\u7167\u8bbe\u7f6e":17,"\u5bf9\u6bcf\u4e2a\u8f93\u5165":6,"\u5bf9\u6bcf\u4e2a\u8f93\u5165\u4e58\u4e0a\u53d8\u6362\u77e9\u9635":6,"\u5bf9\u6fc0\u6d3b\u6c42\u5bfc":6,"\u5bf9\u795e\u7ecf\u7f51\u7edc\u6765\u8bf4":19,"\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u8f
db\u884c\u5e8f\u5217\u5316":20,"\u5bf9\u8bc4\u5ba1\u610f\u89c1\u4e0d\u540c\u610f\u7684":4,"\u5bf9\u8bc4\u5ba1\u610f\u89c1\u540c\u610f\u4e14\u6309\u5176\u4fee\u6539\u5b8c\u7684":4,"\u5bf9\u8c61":13,"\u5bf9\u8c61\u5206\u914d\u7a7a\u95f4":20,"\u5bf9\u8f93\u5165\u6570\u636e\u7684\u683c\u5f0f\u505a\u6e05\u6670\u7b80\u6d01\u7684\u5c01\u88c5":18,"\u5bf9\u8f93\u51fa\u7684\u5408\u5e76":41,"\u5bf9\u9762":39,"\u5bf9sparse_binary_vector\u548csparse_float_vector":14,"\u5bfc\u81f4\u4e86\u6d6e\u70b9\u6570\u6ea2\u51fa":11,"\u5bfc\u81f4\u53c2\u6570\u6536\u655b\u5230\u4e86\u4e00\u4e9b\u5947\u5f02\u7684\u60c5\u51b5":11,"\u5bfc\u81f4\u53c2\u6570\u7d2f\u52a0":11,"\u5bfc\u81f4\u7f16\u8bd1paddlepaddle\u5931\u8d25":8,"\u5bfc\u81f4\u8bad\u7ec3\u65f6\u95f4\u8fc7\u957f":13,"\u5bfc\u81f4mklml\u5e93\u4e0b\u8f7d\u4e0d\u6210\u529f":8,"\u5c01\u88c5\u4e86":37,"\u5c01\u88c5\u8be5\u5c42\u7684python\u63a5\u53e3":6,"\u5c06":[13,37],"\u5c06\u4e0a\u4e00\u65f6\u95f4\u6b65\u6240\u751f\u6210\u7684\u8bcd\u7684\u5411\u91cf\u6765\u4f5c\u4e3a\u5f53\u524d\u65f6\u95f4\u6b65\u7684\u8f93\u5165":42,"\u5c06\u4f1a\u4f18\u5148\u4f7f\u7528":21,"\u5c06\u4f1a\u81ea\u52a8\u8ba1\u7b97\u51fa\u4e00\u4e2a\u5408\u9002\u7684\u503c":33,"\u5c06\u4f1a\u88ab\u629b\u5f03":21,"\u5c06\u5176\u8bbe\u7f6e\u6210":11,"\u5c06\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u5148\u53d8\u6362\u6210\u5355\u5c42\u65f6\u95f4\u5e8f\u5217\u6570\u636e":39,"\u5c06\u542b\u6709\u5b50\u53e5":41,"\u5c06\u542b\u6709\u8bcd\u8bed\u7684\u53e5\u5b50\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":41,"\u5c06\u56fe\u7247\u5206\u7c7b\u5230":20,"\u5c06\u591a\u53e5\u8bdd\u770b\u6210\u4e00\u4e2a\u6574\u4f53\u540c\u65f6\u4f7f\u7528encoder\u538b\u7f29":39,"\u5c06\u591a\u53f0\u673a\u5668\u7684\u6d4b\u8bd5\u7ed3\u679c\u5408\u5e76":33,"\u5c06\u5b57\u5178\u7684\u5730\u5740\u4f5c\u4e3aargs\u4f20\u7ed9dataprovid":13,"\u5c06\u5b83\u4eec\u653e\u5728\u540c\u4e00\u76ee\u5f55\u4e2d":20,"\u5c06\u5bf9\u5e94\u6570\u636e\u5c42\u7684\u7ef4\u6570\u8bbe\u7f6e\u6210\u4e00\u4e2a\
u5927\u4e8e\u8f93\u5165\u6570\u636e\u7ef4\u6570\u7684\u503c\u7528\u4e8e\u5360\u4f4d\u5373\u53ef":12,"\u5c06\u5e8f\u5217\u5316\u7ed3\u679c\u5199\u5165\u4e00\u4e2a\u6587\u4ef6\u5185":20,"\u5c06\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u524d\u5411\u548c\u53cd\u5411\u90e8\u5206\u6df7\u5408\u5728\u4e00\u8d77":42,"\u5c06\u6570\u636e\u5207\u5206\u6210\u591a\u4efd":21,"\u5c06\u65b0\u5efa\u7684\u6743\u91cd\u52a0\u5165\u6743\u91cd\u8868":6,"\u5c06\u660e\u6587\u53c2\u6570\u8f6c\u5316\u4e3apaddlepaddle\u53ef\u52a0\u8f7d\u7684\u6a21\u578b\u53c2\u6570\u65f6":13,"\u5c06\u672c\u5730\u7684\u4fee\u6539\u63a8\u9001\u5230":4,"\u5c06\u6b64\u76ee\u5f55\u6302\u8f7d\u4e3a\u5bb9\u5668\u7684":27,"\u5c06\u73af\u5883\u53d8\u91cf\u8f6c\u6362\u6210paddle\u7684\u547d\u4ee4\u884c\u53c2\u6570":27,"\u5c06\u7ed3\u679c\u4fdd\u5b58\u5230\u6b64\u76ee\u5f55\u91cc":27,"\u5c06\u7f51\u7edc\u7ed3\u6784\u5b9a\u4e49\u548c\u8bad\u7ec3\u7ed3\u675f\u5b58\u50a8\u4e0b\u6765\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6":20,"\u5c06\u8bad\u7ec3\u6587\u4ef6\u4e0e\u5207\u5206\u597d\u7684\u6570\u636e\u4e0a\u4f20\u5230\u5171\u4eab\u5b58\u50a8":27,"\u5c06\u8df3\u8fc7\u5206\u53d1\u9636\u6bb5\u76f4\u63a5\u542f\u52a8\u6240\u6709\u8282\u70b9\u7684\u96c6\u7fa4\u4f5c\u4e1a":23,"\u5c06\u8fd9\u79cd\u8de8\u8d8a\u65f6\u95f4\u6b65\u7684\u8fde\u63a5\u7528\u4e00\u4e2a\u7279\u6b8a\u7684\u795e\u7ecf\u7f51\u7edc\u5355\u5143\u5b9e\u73b0":39,"\u5c06\u8fdc\u7a0b\u4ed3\u5e93":4,"\u5c06\u900f\u660e":23,"\u5c06\u9700\u8981\u8f93\u51fa\u7684\u5c42\u4f5c\u4e3a":11,"\u5c06cuda\u5e93\u548clinux\u8bbe\u5907\u6302\u8f7d\u5230docker\u5bb9\u5668\u5185":1,"\u5c06ip\u6392\u5e8f\u751f\u6210\u7684\u5e8f\u53f7\u4f5c\u4e3atrain":27,"\u5c06node\u8282\u70b9\u7684ip\u5730\u5740\u4fdd\u5b58\u5230machines\u6587\u4ef6\u4e2d":28,"\u5c06paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u8fd8\u539f\u56de\u660e\u6587\u65f6":13,"\u5c06recurr":12,"\u5c1a\u53ef":39,"\u5c31":39,"\u5c31\u4f1a\u5728\u5b8c\u6210\u7f16\u8bd1\u4e4b\u540e":0,"\u5c31\u53ef\u4ee5\u4f7f\u7528\u
4e0b\u9762\u7684\u547d\u4ee4\u5f00\u59cb\u6267\u884c\u8bad\u7ec3":1,"\u5c31\u53ef\u4ee5\u6309":0,"\u5c31\u5c06\u8fd9\u4e9b\u5c42\u52a0\u5165\u4e00\u4e2apython":20,"\u5c31\u5f88\u5bb9\u6613\u5bfc\u81f4\u5185\u5b58\u8d85\u9650":11,"\u5c31\u662f":39,"\u5c31\u662f\u7528\u4e8e\u5c55\u793a\u4e0a\u8ff0\u5206\u6790\u5de5\u5177\u7684\u7528\u6cd5":37,"\u5c31\u662fpaddlepaddle\u4e2d\u6240\u6307\u7684":19,"\u5c31\u8fd9\u4e48\u7b80\u5355":1,"\u5c31\u901a\u5e38\u7684gpu\u6027\u80fd\u5206\u6790\u6765\u8bf4":37,"\u5c31\u9700\u8981\u9009\u62e9\u4f7f\u7528no":1,"\u5c42\u524d\u5411\u8ba1\u7b97\u7684\u7ed3\u679c":20,"\u5c42\u548c\u8f93\u5165\u7684\u914d\u7f6e":6,"\u5c42\u6b21\u5316\u7684rnn":41,"\u5c42\u7684\u540d\u79f0\u4e0e":42,"\u5c42\u7684\u5927\u5c0f":6,"\u5c42\u7684\u7c7b\u578b":6,"\u5c42\u7684\u8f93\u51fa\u88ab\u7528\u4f5c\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684":42,"\u5c45\u7136":39,"\u5c55\u793a\u4e86\u4e00\u4e2a\u542b\u67094\u4e2a\u5e8f\u5217\u7684":19,"\u5c55\u793a\u4e86\u90e8\u5206\u547d\u4ee4\u884c\u53c2\u6570\u7684\u4f7f\u7528":34,"\u5c5e\u4e8e\u8fd9\u4e00\u7c7b\u7684\u5b9e\u73b0":12,"\u5de5\u4f5c\u6a21\u5f0f":33,"\u5de5\u4f5c\u7a7a\u95f4\u4e2d\u7684":23,"\u5de5\u4f5c\u7a7a\u95f4\u5e94\u5982\u4e0b\u6240\u793a":21,"\u5de5\u5177\u670d\u52a1\u5668\u5c06\u8bfb\u53d6\u73af\u5883\u53d8\u91cf":7,"\u5de5\u5177\u6765\u7ba1\u7406":4,"\u5de5\u5177\u6765\u7f16\u8bd1\u6587\u6863":7,"\u5df2\u6253\u5f00":4,"\u5df2\u7ecf\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u5b8c\u6210\u8bbe\u7f6e":33,"\u5e02\u9762\u4e0a\u5df2\u7ecf\u6709nvidia\u6216\u7b2c\u4e09\u65b9\u63d0\u4f9b\u7684\u4f17\u591a\u5de5\u5177":37,"\u5e0c\u671b\u80fd\u591f\u5c06\u5e8f\u5217\u5316\u540e\u7684\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u53c2\u6570\u6253\u5305\u8fdb\u4e00\u4e2a\u6587\u4ef6":20,"\u5e2e\u52a9\u6211\u4eec\u5b8c\u6210\u5bf9\u8f93\u5165\u5e8f\u5217\u7684\u62c6\u5206":41,"\u5e2e\u52a9\u6211\u4eec\u66f4\u597d\u5730\u63cf\u8ff0\u6bb5\u843d":41,"\u5e2e\u52a9
\u6211\u4eec\u6784\u9020\u4e00\u4e9b\u590d\u6742\u7684\u8f93\u5165\u4fe1\u606f":38,"\u5e38\u5e38\u51fa\u73b0":8,"\u5e38\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1":19,"\u5e38\u89c1\u95ee\u9898\u89e3\u7b54":2,"\u5e72\u51c0":39,"\u5e76\u4e0d\u4fdd\u8bc1":6,"\u5e76\u4e0d\u662f\u4f7f\u7528\u53cc\u5c42rnn\u89e3\u51b3\u5b9e\u9645\u7684\u95ee\u9898":39,"\u5e76\u4e0d\u662fkubernetes\u4e2d\u7684node\u6982\u5ff5":27,"\u5e76\u4e0d\u771f\u6b63":[19,20],"\u5e76\u4e0d\u771f\u6b63\u7684\u548c":39,"\u5e76\u4e0d\u96be":0,"\u5e76\u4e14":42,"\u5e76\u4e14\u4e5f\u53ef\u4ee5\u5728windows\u7684docker\u4e2d\u8fd0\u884c":1,"\u5e76\u4e14\u5185\u5c42\u7684\u5e8f\u5217\u64cd\u4f5c\u4e4b\u95f4\u72ec\u7acb\u65e0\u4f9d\u8d56":39,"\u5e76\u4e14\u52a0\u4e0a\u4e0b\u9762\u7684\u547d\u4ee4\u884c\u53c2\u6570":35,"\u5e76\u4e14\u5305\u62ecunit":4,"\u5e76\u4e14\u53ef\u80fd\u4f1a\u52a0\u901f\u8bad\u7ec3\u8fc7\u7a0b":11,"\u5e76\u4e14\u542f\u52a8\u8bad\u7ec3":27,"\u5e76\u4e14\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528paddlepaddle\u6765\u89e3\u51b3\u4e00\u4e2a\u7ecf\u5178\u7684\u7ebf\u6027\u56de\u5f52\u95ee\u9898":14,"\u5e76\u4e14\u67e5\u8be2paddlepaddle\u5355\u5143\u6d4b\u8bd5\u7684\u65e5\u5fd7":8,"\u5e76\u4e14\u7f16\u8bd1\u80fd\u901a\u8fc7\u4ee3\u7801\u6837\u5f0f\u68c0\u67e5":4,"\u5e76\u4e14\u8f93\u51fa\u4e00\u4e2a":4,"\u5e76\u4e14\u9700\u8981\u91cd\u5199\u57fa\u7c7b\u4e2d\u7684\u4ee5\u4e0b\u51e0\u4e2a\u865a\u51fd\u6570":6,"\u5e76\u4e14softmax\u5c42\u7684\u4e24\u4e2a\u8f93\u5165\u4e5f\u4f7f\u7528\u4e86\u540c\u6837\u7684\u53c2\u6570":13,"\u5e76\u4f7f\u7528":23,"\u5e76\u4fdd\u5b58\u8f93\u51fa\u5230\u4e00\u4e2a\u65e5\u5fd7\u6587\u4ef6":21,"\u5e76\u5177\u5907\u4ee5\u4e0b\u7279\u70b9":18,"\u5e76\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u6587\u4ef6":4,"\u5e76\u521b\u5efaoptim":14,"\u5e76\u53ef\u4ee5\u5728\u5927\u591a\u6570\u4e3b\u6d41\u7684linux\u64cd\u4f5c\u7cfb\u7edf\u4ee5\u53camacos\u4e0a\u6267\u884c":3,"\u5e76\u548c\u53c2\u6570\u670d\u52a1\u5668\u901a\u4fe1":22,"\u5e76\u5728\u4e58\u79ef\u7ed3\u679c\
u4e0a\u518d\u52a0\u4e0a\u7ef4\u5ea6\u4e3a":6,"\u5e76\u5728\u6700\u5f00\u59cb\u521d\u59cb\u5316\u4e3a\u8d77\u59cb\u8bcd":42,"\u5e76\u5728\u7c7b\u6784\u5efa\u51fd\u6570\u4e2d\u628a\u5b83\u653e\u5165\u4e00\u4e2a\u7c7b\u6210\u5458\u53d8\u91cf\u91cc":6,"\u5e76\u5728\u8be5layer\u91cc\u91c7\u7528\u7b2c\u4e00\u79cd\u65b9\u5f0f\u8bbe\u7f6e":12,"\u5e76\u5904\u7406\u4e0e\u4e4b\u76f8\u5173\u7684\u6240\u6709\u7ec6\u8282":20,"\u5e76\u5b89\u88c5\u4e86python":8,"\u5e76\u5b89\u88c5\u6700\u65b0":3,"\u5e76\u5b89\u88c5\u6709python2":16,"\u5e76\u5b8c\u6210\u53c2\u6570\u4f18\u5316\u66f4\u65b0":22,"\u5e76\u5bf9\u6bd4\u662f\u5426\u548c\u6b63\u5728\u5b89\u88c5\u7684\u540e\u7f00\u4e00\u81f4":8,"\u5e76\u5bf9\u76f8\u5e94\u7684\u53c2\u6570\u8c03\u7528":6,"\u5e76\u5c06\u5176\u6295\u5c04\u5230":42,"\u5e76\u5c06\u8be5layer\u4e0a\u4e00\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u4f5c\u4e3a\u81ea\u8eab\u5f53\u524d\u65f6\u95f4\u6b65\u7684\u8f93\u51fa":12,"\u5e76\u624b\u52a8\u751f\u6210download\u6210\u529f\u6807\u7b7e":8,"\u5e76\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4":1,"\u5e76\u628a\u8fd9\u4e2a\u5305\u542b\u4e86\u8bad\u7ec3\u6570\u636e\u7684container\u4fdd\u5b58\u4e3a\u4e00\u4e2a\u65b0\u7684\u955c\u50cf":26,"\u5e76\u66f4\u6362job":9,"\u5e76\u6839\u636e\u5206\u5e03\u5f0f\u8bad\u7ec3\u5e76\u53d1\u6570":21,"\u5e76\u68c0\u67e5\u548c\u9700\u5b89\u88c5\u7684\u5305\u662f\u5426\u5339\u914d":3,"\u5e76\u7c98\u8d34\u6b64python\u4ee3\u7801":16,"\u5e76\u81ea\u52a8\u4e0b\u8f7d\u5b89\u88c5\u4f9d\u8d56\u8f6f\u4ef6":3,"\u5e76\u884c\u5730\u6267\u884c\u6a21\u578b\u7684\u8bad\u7ec3":22,"\u5e76\u884c\u5730\u63a5\u6536\u68af\u5ea6\u548c\u66f4\u65b0\u53c2\u6570":22,"\u5e76\u89c2\u5bdf\u7ed3\u679c":37,"\u5e76\u8bb0\u5f55\u5b83\u7684\u7f16\u53f7":4,"\u5e76\u8fdb\u884c\u521d\u59cb\u5316\u64cd\u4f5c":14,"\u5e8a\u4e0a\u7528\u54c1":39,"\u5e8a\u57ab":39,"\u5e8f\u5217\u4e2d\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee\u540c":38,"\u5e8f\u5217\u4e2d\u7684\u4e00\u4e2a\u5143\u7d20":19,"\u5e8f\u5217\u4e2d\u7684\u5143\u7d20\u662f\u8bcd\u8bed":
19,"\u5e8f\u5217\u4e2d\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20\u53c8\u662f\u4e00\u4e2a\u5e8f\u5217":19,"\u5e8f\u5217\u4e2d\u7684\u6bcf\u4e00\u4e2a\u5143\u7d20\u662f\u975e\u5e8f\u5217":19,"\u5e8f\u5217\u4fe1\u606f":19,"\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u914d\u7f6e":20,"\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u65f6":20,"\u5e8f\u5217\u5316\u7ed3\u679c\u4f1a\u5199\u5165\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":20,"\u5e8f\u5217\u6570\u636e\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u9762\u5bf9\u7684\u4e00\u79cd\u4e3b\u8981\u8f93\u5165\u6570\u636e\u7c7b\u578b":41,"\u5e8f\u5217\u662f\u4e00\u79cd\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b":38,"\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u5927\u591a\u9075\u5faaencod":41,"\u5e8f\u5217\u751f\u6210\u4efb\u52a1\u7684\u8f93\u5165":41,"\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20\u662f\u539f\u6765\u53cc\u5c42\u5e8f\u5217\u6bcf\u4e2asubseq\u5143\u7d20\u7684\u5e73\u5747\u503c":38,"\u5e8f\u5217\u8f93\u5165":19,"\u5e8f\u5217\u8f93\u5165\u65f6\u7b49\u4e8e":11,"\u5e8f\u5217\u8f93\u5165\u793a\u610f\u56fe":19,"\u5e93\u6709\u81ea\u5df1\u72ec\u7acb\u7684\u52a8\u6001\u5e93\u6587\u4ef6":17,"\u5e94\u7528\u524d\u5411\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u5e94\u7528\u53cd\u5411\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":42,"\u5e94\u80fd\u53cd\u6620\u5f53\u524dcommit\u7684\u5185\u5bb9":4,"\u5e94\u8be5":39,"\u5e94\u8be5\u4e0e\u5b83\u7684memory\u540d\u5b57\u76f8\u540c":42,"\u5e94\u8be5\u8bf4\u8c22\u8c22":4,"\u5e94\u8be5\u964d\u4f4e\u5b66\u4e60\u7387":13,"\u5e95\u5c42\u8fdb\u7a0b":23,"\u5efa\u7acb\u4e00\u4e2a":4,"\u5efa\u8bae":4,"\u5efa\u8bae\u5c06\u8be5\u53c2\u6570\u8bbe\u4e3atrue":33,"\u5f00\u53d1\u4eba\u5458\u4f7f\u7528":4,"\u5f00\u53d1\u5206\u652f":3,"\u5f00\u53d1\u6807\u51c6":43,"\u5f00\u53d1\u8005\u4f7f\u7528":0,"\u5f00\u53d1\u955c\u50cf":4,"\u5f00\u53d1\u9884\u6d4b\u5e8f":20,"\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u94fe\u63a5":17,"\u5f00\u542f":0,"\u5f00\u5934\u7684\u90e8\u5206":21,"\u5f00\u5934\u90e8\u52
06\u6307\u5b9a":21,"\u5f00\u59cb\u6807\u8bb0":42,"\u5f00\u59cb\u795e\u7ecf\u7f51\u7edc\u7684":22,"\u5f00\u59cb\u9636\u6bb5":37,"\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":32,"\u5f02\u6b65sgd\u66f4\u65b0\u7684\u6b65\u957f\u63a7\u5236":21,"\u5f15\u5bfc\u5c42":42,"\u5f15\u7528memory\u5f97\u5230\u8fd9layer\u4e0a\u4e00\u65f6\u523b\u8f93\u51fa":41,"\u5f3a\u70c8\u63a8\u8350":39,"\u5f52\u4e00\u5316\u6982\u7387\u5411\u91cf":42,"\u5f53":35,"\u5f53\u4e00\u4e2a":19,"\u5f53\u4e0a\u8ff0\u63a5\u53e3\u7b2c4\u4e2a\u53c2\u6570":19,"\u5f53\u4f60\u6267\u884c\u547d\u4ee4":6,"\u5f53\u4fdd\u5b58\u7684\u7f51\u7edc\u53c2\u6570\u4e3afloat\u7c7b\u578b\u65f6\u4e3a4":13,"\u5f53\u524d\u65f6\u95f4\u6b65\u5904\u7684memory\u7684\u8f93\u51fa\u4f5c\u4e3a\u4e0b\u4e00\u65f6\u95f4\u6b65memory\u7684\u8f93\u5165":42,"\u5f53\u524d\u7684\u5b66\u4e60\u7387\u4e3a\u6240\u8bbe\u7f6e":13,"\u5f53\u524d\u7684\u5b9e\u73b0\u65b9\u5f0f\u4e0b":6,"\u5f53\u524d\u7684\u8f93\u5165y\u548c\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51farnn_state\u505a\u4e86\u4e00\u4e2a\u5168\u94fe\u63a5":39,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1\u542f\u52a8\u7684pserver\u7684ip\u5217\u8868":21,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1pserver\u603b\u6570":21,"\u5f53\u524d\u8bad\u7ec3\u4efb\u52a1trainer\u603b\u6570":21,"\u5f53\u524dtrainer\u7684\u7ebf\u7a0b\u6570\u76ee":21,"\u5f53\u5728\u7f51\u7edc\u5c42\u914d\u7f6e\u4e2d\u8bbe\u7f6e":33,"\u5f53\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u8bbe\u7f6e":33,"\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5927\u4e8e1000\u5c0f\u4e8e\u7b49\u4e8e2000\u65f6":13,"\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5927\u4e8e2000\u65f6":13,"\u5f53\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5c0f\u4e8e\u7b49\u4e8e1000\u65f6":13,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5927\u4e8e1\u5c0f\u4e8e\u7b49\u4e8e2\u65f6":13,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5927\u4e8e2\u65f6":13,"\u5f53\u5df2\u8bad\u7ec3pass\u6570\u5c0f\u4e8e\u7b49\u4e8e1\u65f6":13,"\u5f53\u6211\u4eec\u8bad\u7ec3\u5b8c\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u6a21\u578
b\u4e4b\u540e":18,"\u5f53\u6240\u6709pod\u90fd\u5904\u4e8erunning\u72b6\u6001":27,"\u5f53\u6a21\u578b\u53c2\u6570\u4e0d\u5b58\u5728\u65f6":33,"\u5f53\u6a21\u5f0f\u4e3a":33,"\u5f53\u7136":[1,37],"\u5f53\u7136\u53ef\u4ee5":0,"\u5f53\u7528\u6237\u6ca1\u6709\u663e\u5f0f\u8bbe\u5b9a\u65f6":12,"\u5f53\u7f51\u7edc\u5c42\u7528\u4e00\u4e2a\u6279\u6b21\u505a\u8bad\u7ec3\u65f6":6,"\u5f53\u89e3\u8bfb\u6bcf\u4e00\u4e2a":42,"\u5f53\u8d85\u8fc7\u8be5\u9608\u503c\u65f6":33,"\u5f53\u8f93\u5165\u662f\u7ef4\u5ea6\u5f88\u9ad8\u7684\u7a00\u758f\u6570\u636e\u65f6":35,"\u5f53\u975e\u5e8f\u5217\u8f93\u5165\u65f6":19,"\u5f53n1":11,"\u5f62\u6210recurr":41,"\u5f62\u6210recurrent\u8fde\u63a5":41,"\u5f88":39,"\u5f88\u591a":[0,39],"\u5f88\u5b89\u9759":39,"\u5f88\u5e72\u51c0":39,"\u5f88\u65b9\u4fbf":39,"\u5f88\u6709\u53ef\u80fd\u5b9e\u9645\u5e94\u7528\u5c31\u662f\u6ca1\u6709\u6309\u7167\u60a8\u7684\u9884\u671f\u60c5\u51b5\u8fd0\u884c":37,"\u5f88\u6709\u53ef\u80fd\u662f\u975e\u72ec\u5360\u65b9\u5f0f\u6267\u884c\u5bfc\u81f4\u7684\u7aef\u53e3\u51b2\u7a81":9,"\u5f97":39,"\u5f97\u5230\u9884\u6d4b\u7ed3\u679c\u7684\u8fc7\u7a0b":18,"\u5faa\u73af\u5c55\u5f00\u7684\u6bcf\u4e2a\u65f6\u95f4\u6b65\u603b\u662f\u80fd\u591f\u5f15\u7528\u6240\u6709\u8f93\u5165":41,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u4e2d":42,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u4f5c\u4e3a\u4f7f\u7528":42,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u548c":42,"\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u9aa4\u987a\u5e8f\u5730\u5904\u7406\u5e8f\u5217":42,"\u5faa\u73af\u7f51\u7edc\u4ece":42,"\u5fc5\u9009":21,"\u5fc5\u987b":6,"\u5fc5\u987b\u5148\u6267\u884c\u7b2c2\u6b65":0,"\u5fc5\u987b\u5c06\u524d\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5143\u7d20":39,"\u5fc5\u987b\u6307\u5411\u4e00\u4e2apaddlepaddle\u5b9a\u4e49\u7684lay":41,"\u5fc5\u987b\u6307\u5b9a\u4e3a":20,"\u5fc5\u987b\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":41,"\u5fc5\u987b\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":38,"\u5fc5\u987b\u7531\u53ea\u8
bfbmemory\u7684":42,"\u5fc5\u987b\u914d\u7f6e\u4e3a":17,"\u5fc5\u987b\u914d\u7f6e\u9009\u9879":17,"\u5feb":39,"\u5feb\u901f\u5f00\u59cb":15,"\u6027\u4ef7\u6bd4":39,"\u6027\u80fd\u5206\u6790":37,"\u6027\u80fd\u5206\u6790\u5de5\u5177\u662f\u7528\u4e8e\u7ed9\u5e94\u7528\u7a0b\u5e8f\u7684\u6027\u80fd\u505a\u5b9a\u91cf\u5206\u6790\u7684":37,"\u6027\u80fd\u5206\u6790\u662f\u6027\u80fd\u4f18\u5316\u7684\u5173\u952e\u4e00\u6b65":37,"\u6027\u80fd\u548c\u628a\u7f16\u8bd1\u5de5\u5177\u5b89\u88c5\u5728\u672c\u673a\u8fd0\u884c\u4e00\u6837":0,"\u6027\u80fd\u8c03\u4f18":32,"\u603b\u4f53\u6765\u8bf4":39,"\u60a8\u4e5f\u53ef\u4ee5\u8fdb\u5165\u5230docker\u5bb9\u5668\u4e2d":1,"\u60a8\u4f1a\u5728\u63a5\u4e0b\u6765\u7684\u90e8\u5206\u4e2d\u83b7\u5f97\u66f4\u591a\u7684\u7ec6\u8282\u4ecb\u7ecd":37,"\u60a8\u53ef\u4ee5\u4ece\u4e0b\u9762\u7684\u8868\u683c\u4e2d\u627e\u5230\u9700\u8981\u7684\u7248\u672c":3,"\u60a8\u53ef\u4ee5\u4efb\u610f\u4f7f\u7528\u4e00\u4e2a\u6216\u4e24\u4e2a\u6765\u5bf9\u611f\u5174\u8da3\u7684\u4ee3\u7801\u6bb5\u505a\u6027\u80fd\u5206\u6790":37,"\u60a8\u53ef\u4ee5\u5728":[1,24],"\u60a8\u53ef\u4ee5\u5728\u5bb9\u5668\u4e2d\u6267\u884c":1,"\u60a8\u53ef\u4ee5\u5bfc\u5165":37,"\u60a8\u53ef\u4ee5\u6309\u7167\u4e0b\u9762\u7684\u6b65\u9aa4\u5728openmpi\u96c6\u7fa4\u4e2d\u63d0\u4ea4paddle\u8bad\u7ec3\u4efb\u52a1":28,"\u60a8\u53ef\u4ee5\u91c7\u7528\u4e0b\u9762\u4e94\u4e2a\u6b65\u9aa4":37,"\u60a8\u53ef\u80fd\u9700\u8981\u4fee\u6539":21,"\u60a8\u5c06\u4e86\u89e3\u5982\u4f55":42,"\u60a8\u5c31\u80fd\u83b7\u5f97\u5982\u4e0b\u7684\u5206\u6790\u7ed3\u679c":37,"\u60a8\u6309\u5982\u4e0b\u6b65\u9aa4\u64cd\u4f5c\u5373\u53ef":37,"\u60a8\u6700\u597d\u5148\u786e\u8ba4":37,"\u60a8\u9996\u5148\u9700\u8981\u5728\u76f8\u5173\u4ee3\u7801\u6bb5\u4e2d\u52a0\u5165":37,"\u60f3\u4e86\u89e3\u66f4\u591apaddlepaddl":7,"\u610f\u5473\u7740\u4e0d\u540c\u65f6\u95f4\u6b65\u7684\u8f93\u5165\u90fd\u662f\u76f8\u540c\u7684\u503c":42,"\u610f\u601d\u662f\u4e0d\u4f7f\u7528\u5e73\u5747\u53c2\u6570\u6267\u884c\u6d4b\u8bd5
":33,"\u610f\u601d\u662f\u4e0d\u4fdd\u5b58\u7ed3\u679c":33,"\u610f\u601d\u662f\u4f7f\u7528\u7b2ctest":33,"\u610f\u601d\u662f\u5728gpu\u6a21\u5f0f\u4e0b\u4f7f\u75284\u4e2agpu":33,"\u611f\u89c9":39,"\u6210\u529f\u7f16\u8bd1\u540e":17,"\u6210\u529f\u8bad\u7ec3\u4e14\u9000\u51fa\u7684pod\u6570\u76ee\u4e3a3\u65f6":27,"\u6211\u4eec\u4e0d\u80fd\u901a\u8fc7\u5e38\u89c4\u7684\u68af\u5ea6\u68c0\u67e5\u7684\u65b9\u5f0f\u6765\u8ba1\u7b97\u68af\u5ea6":6,"\u6211\u4eec\u4e3b\u8981\u4f1a\u4ecb\u7ecdnvprof\u548cnvvp":37,"\u6211\u4eec\u4e5f\u652f\u6301\u5728aws\u4e0a\u90e8\u7f72paddlepaddl":24,"\u6211\u4eec\u4ec5\u4ec5\u5bf9\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165\u8fdb\u884c\u4e86\u63cf\u8ff0":14,"\u6211\u4eec\u4ec5\u6709\u4e00\u4e2a\u8f93\u5165":6,"\u6211\u4eec\u4ecb\u7ecd\u5982\u4f55\u5728":26,"\u6211\u4eec\u4ecb\u7ecd\u5982\u4f55\u5728kubernetes\u96c6\u7fa4\u4e0a\u8fdb\u884c\u5206\u5e03\u5f0fpaddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":27,"\u6211\u4eec\u4ee5\u624b\u5199\u6570\u5b57\u8bc6\u522b\u4efb\u52a1\u4e3a\u4f8b\u8fdb\u884c\u4ecb\u7ecd":20,"\u6211\u4eec\u4f1a\u53ca\u65f6\u8fdb\u884c\u56de\u590d":10,"\u6211\u4eec\u4f1a\u5bf9\u6bcf\u4e2a\u8bad\u7ec3\u4efb\u52a1\u90fd\u4f1a\u5728\u6bcf\u4e2a\u8282\u70b9\u4e0a\u521b\u5efa\u4e00\u4e2a\u5de5\u4f5c\u7a7a\u95f4":21,"\u6211\u4eec\u4f1a\u7ee7\u7eed\u4f7f\u7528\u73b0\u6709\u7684\u5185\u5b58\u5757":6,"\u6211\u4eec\u4f1a\u91cd\u65b0\u5206\u914d\u5185\u5b58":6,"\u6211\u4eec\u4f7f\u7528":6,"\u6211\u4eec\u4f7f\u7528\u4e0d\u540c\u7684layer\u8fdb\u884c\u7ec4\u5408":14,"\u6211\u4eec\u4f7f\u7528\u4e86":39,"\u6211\u4eec\u4f7f\u7528paddl":21,"\u6211\u4eec\u5047\u8bbe\u4e00\u53f0\u673a\u5668\u4e0a\u67094\u4e2agpu":35,"\u6211\u4eec\u5373\u53ef\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u642d\u5efa":14,"\u6211\u4eec\u53ea\u6f14\u793a\u4e00\u4e2a\u5355\u673a\u4f5c\u4e1a":26,"\u6211\u4eec\u53ea\u9700\u8981\u4f7f\u7528lstm":39,"\u6211\u4eec\u53ea\u9700\u8981\u8fd0\u884c\u4e0b\u9762\u547d\u4ee4\u628a\u7f16\u8bd1\u597d\u7684paddlepaddle\u6253\u5305\u6210\u4e00
\u4e2a":4,"\u6211\u4eec\u53ea\u9700\u8981\u914d\u7f6e":0,"\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528\u5176\u4ed6layer\u8fdb\u884c\u7ec4\u5408":14,"\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528\u5b83\u6765\u751f\u6210\u5e8f\u5217":42,"\u6211\u4eec\u53ef\u4ee5\u521b\u5efatrainer\u6765\u5bf9\u7f51\u7edc\u8fdb\u884c\u8bad\u7ec3":14,"\u6211\u4eec\u53ef\u4ee5\u5728":4,"\u6211\u4eec\u53ef\u4ee5\u5b9a\u4e49\u5982\u4e0b\u7684layer\u7ec4\u5408":14,"\u6211\u4eec\u53ef\u4ee5\u5b9a\u4e49\u5982\u4e0blayer\u6765\u63cf\u8ff0\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165":14,"\u6211\u4eec\u53ef\u4ee5\u6309\u7167\u5982\u4e0b\u5c42\u6b21\u5b9a\u4e49\u975e\u5e8f\u5217":38,"\u6211\u4eec\u53ef\u4ee5\u8bbe\u8ba1\u642d\u5efa\u4e00\u4e2a\u7075\u6d3b\u7684":41,"\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u65e5\u5fd7\u67e5\u770b\u5bb9\u5668\u8bad\u7ec3\u7684\u60c5\u51b5":27,"\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u8bbe\u7f6e":21,"\u6211\u4eec\u540c\u6837\u63d0\u4f9b\u4e86\u4ece\u6e90\u7801\u7f16\u8bd1\u5b89\u88c5paddlepaddle\u7684\u65b9\u6cd5":2,"\u6211\u4eec\u5728\u51fd\u6570\u7684\u7ed3\u5c3e\u8fd4\u56de":42,"\u6211\u4eec\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u4ee5\u4e0b\u66f4\u6539":42,"\u6211\u4eec\u5c06":27,"\u6211\u4eec\u5c06\u4e00\u6bb5\u8bdd\u770b\u6210\u53e5\u5b50\u7684\u6570\u7ec4":39,"\u6211\u4eec\u5c06\u4ecb\u7ecd\u5982\u4f55\u542f\u52a8\u5206\u5e03\u5f0f\u8bad\u7ec3\u4f5c\u4e1a":26,"\u6211\u4eec\u5c06\u4f7f\u7528":42,"\u6211\u4eec\u5c06\u4f7f\u7528\u7b80\u5355\u7684":42,"\u6211\u4eec\u5c06\u539f\u59cb\u6570\u636e\u7684\u6bcf\u4e00\u7ec4":39,"\u6211\u4eec\u5c06\u5b83\u4eec\u5212\u5206\u4e3a\u4e0d\u540c\u7684\u7c7b\u522b":32,"\u6211\u4eec\u5c06\u795e\u7ecf\u7f51\u7edc\u4e00\u6b21\u8ba1\u7b97\u63a5\u53d7\u7684\u6240\u6709\u8f93\u5165\u6837\u672c\u79f0\u4e4b\u4e3a\u4e00\u4e2a":19,"\u6211\u4eec\u5c06\u8bad\u7ec3\u7ed3\u675f\u540e\u5b58\u50a8\u4e0b\u6765\u7684\u6a21\u578b\u8f6c\u6362\u6210\u9884\u6d4b\u6a21\u578b":20,"\u6211\u4eec\u5c31\u5b8c\u6210\u4e86\u4e00\u6b21\u4ee3\u7801\u8d21\u732e\u7684\u8fc7\u7a0b":4,"\u6
211\u4eec\u5df2\u7ecf\u5b9e\u73b0\u4e86\u5927\u591a\u6570\u5e38\u7528\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":42,"\u6211\u4eec\u5efa\u8bae\u4f60\u4e3a\u4f60\u7684python\u5c01\u88c5\u5b9e\u73b0\u4e00\u4e2a":6,"\u6211\u4eec\u5efa\u8bae\u4f60\u5728\u5199\u65b0\u7f51\u7edc\u5c42\u65f6\u628a\u6d4b\u8bd5\u4ee3\u7801\u653e\u5165\u65b0\u7684\u6587\u4ef6\u4e2d":6,"\u6211\u4eec\u5efa\u8bae\u4f7f\u7528\u7b2c\u4e8c\u7c7b\u5b9e\u73b0":12,"\u6211\u4eec\u63a8\u8350\u4f7f\u7528":[1,31],"\u6211\u4eec\u63a8\u8350\u4f7f\u7528\u6700\u65b0\u7248\u672c\u7684cudnn":0,"\u6211\u4eec\u63a8\u8350\u5728docker\u4e2d\u8fd0\u884cpaddlepaddl":2,"\u6211\u4eec\u63d0\u4f9b\u4e86\u4f7f\u7528fabric":24,"\u6211\u4eec\u63d0\u4f9b\u4e86\u52a0\u901f\u8bbf\u95ee\u7684\u955c\u50cf\u6e90":1,"\u6211\u4eec\u63d0\u4f9b\u4e86\u591a\u79cd\u7684\u96c6\u7fa4\u90e8\u7f72\u65b9\u5f0f":24,"\u6211\u4eec\u63d0\u4f9b\u4e86\u5982\u4e0b\u6307\u5357":18,"\u6211\u4eec\u63d0\u4f9b\u53ef\u4ee5\u76f4\u63a5\u8fd0\u884cpaddlepaddl":1,"\u6211\u4eec\u662f\u5bf9\u6bcf\u4e00\u4e2a\u5b50\u5e8f\u5217\u53d6\u6700\u540e\u4e00\u4e2a\u5143\u7d20":39,"\u6211\u4eec\u6709\u4e00\u4e2a\u5e8f\u5217\u4f5c\u4e3a\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u72b6\u6001":42,"\u6211\u4eec\u7684":0,"\u6211\u4eec\u7684\u6807\u51c6\u5f00\u53d1\u6d41\u7a0b\u662f\u628a\u8fd9\u4e9b\u5de5\u5177\u90fd\u88c5\u8fdb\u4e00\u4e2adocker":4,"\u6211\u4eec\u770b\u4e00\u4e0b\u5355\u5c42rnn\u7684\u914d\u7f6e":39,"\u6211\u4eec\u770b\u4e00\u4e0b\u8bed\u4e49\u76f8\u540c\u7684\u53cc\u5c42rnn\u7684\u7f51\u7edc\u914d\u7f6e":39,"\u6211\u4eec\u771f\u8bda\u5730\u611f\u8c22\u60a8\u7684\u8d21\u732e":4,"\u6211\u4eec\u79f0\u4e4b\u4e3a\u4e00\u4e2a0\u5c42\u7684\u5e8f\u5217":38,"\u6211\u4eec\u8fd8\u53ef\u4ee5\u767b\u5f55\u5230\u5bbf\u4e3b\u673a\u4e0a\u67e5\u770b\u8bad\u7ec3\u7ed3\u679c":26,"\u6211\u4eec\u8fd8\u5c06\u7f16\u7801\u5411\u91cf\u6295\u5c04\u5230":42,"\u6211\u4eec\u9009\u53d6\u5355\u53cc\u5c42\u5e8f\u5217\u914d\u7f6e\u4e2d\u7684\u4e0d\u540c\u90e8\u5206":39,"\u6211
\u4eec\u901a\u5e38\u5c06\u4e00\u53e5\u8bdd\u7406\u89e3\u6210\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":39,"\u6211\u4eec\u901a\u8fc7\u8bfb\u53d6":27,"\u6211\u4eec\u9700\u8981":0,"\u6211\u4eec\u9700\u8981\u5236\u4f5c\u4e00\u4e2a\u5305\u542b\u8bad\u7ec3\u6570\u636e\u7684paddlepaddle\u955c\u50cf":26,"\u6211\u4eec\u9700\u8981\u5728\u96c6\u7fa4\u7684\u6240\u6709\u8282\u70b9\u4e0a\u5b89\u88c5":31,"\u6211\u4eec\u9700\u8981\u8ba1\u7b97":6,"\u6211\u4eec\u9996\u5148\u9700\u8981\u6839\u636e\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u6765\u521b\u5efa\u6240\u9700\u8981\u4f18\u5316\u7684paramet":14,"\u6211\u5220\u9664\u4e86":4,"\u6211\u53ef\u4ee5\u7528":0,"\u6211\u53ef\u4ee5\u9009\u62e9\u4e0d\u7528docker\u5417":0,"\u6216":[19,37],"\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u6216\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u6216\u4e00\u4e2a\u5411\u91cf":41,"\u6216\u5355\u5c42\u5e8f\u5217\u7ecf\u8fc7\u8fd0\u7b97\u53d8\u6210\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u6216\u662f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u8868\u793a\u8bcd\u8bed\u5728\u5b57\u5178\u4e2d\u7684\u5e8f\u53f7":19,"\u6216\u6700\u5927\u503c":38,"\u6216\u79f0\u4f5cweight":11,"\u6216\u7b2c\u4e00\u4e2a":38,"\u6216\u7b2c\u4e00\u4e2a\u5143\u7d20":38,"\u6216\u7f16\u5199\u7a0b\u5e8f\u65f6":21,"\u6216\u8005":[0,4,11,19,37,38,39],"\u6216\u8005\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":41,"\u6216\u8005\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":[38,41],"\u6216\u8005\u4e5f\u53ef\u4ee5\u4f7f\u7528\u4e3a\u4e0a\u8ff0\u53ef\u9009\u6b65\u9aa4\u6784\u5efa\u7684\u955c\u50cf":0,"\u6216\u8005\u4ece\u5de5\u5177\u7684\u754c\u9762\u91cc\u8fd0\u884c\u60a8\u7684\u5e94\u7528":37,"\u6216\u8005\u5236\u4f5c\u548c\u5206\u4eab\u5e26\u6709\u4ee3\u7801":1,"\u6216\u8005\u53cd\u5411\u5730\u4ece":42,"\u6216\u8005\u53ef\u88abdns\u89e3\u6790\u7684\u4e3b\u673a\u540d":31,"\u6216\u8005\u5728cpu\u6a21\u5f0f\u4e0b\u4f7f\u75284\u4e2a\u7ebf\u7a0b":33,"\u6216\u8005\u5df2\u7ecf\u5728\u96c6\u7fa4\
u63d0\u4ea4\u73af\u5883\u4e2d\u81ea\u52a8\u8bbe\u7f6e":32,"\u6216\u8005\u5f15\u8d77\u884c\u65f6\u9519\u8bef":19,"\u6216\u8005\u6570\u7ec4\u7684\u6570\u7ec4\u8fd9\u4e2a\u6982\u5ff5":39,"\u6216\u8005\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":38,"\u6216\u8005\u662f\u51fd\u6570\u8c03\u7528\u7684\u9891\u7387\u548c\u8017\u65f6\u7b49":37,"\u6216\u8005\u66f4\u65e9":13,"\u6216\u8005\u6bcf\u4e00\u4e2a\u7cfb\u5217\u91cc\u7684\u7279\u5f81\u6570\u636e":39,"\u6216\u8005\u76f4\u63a5\u6254\u6389\u975e\u5e38\u957f\u7684\u5e8f\u5217":11,"\u6216\u8005\u8f93\u5165\u6570\u636e\u5c3a\u5ea6\u8fc7\u5927":11,"\u6216\u8005\u8fd0\u884c":8,"\u6216\u8005\u91c7\u7528\u5e76\u884c\u8ba1\u7b97\u6765\u52a0\u901f\u67d0\u4e9b\u5c42\u7684\u66f4\u65b0":35,"\u6216gpu":33,"\u622a\u65ad\u5bf9\u8c61\u4e0d\u540c":11,"\u623f":39,"\u623f\u95f4":39,"\u6240\u4ee5":[1,11],"\u6240\u4ee5\u4e0d\u80fd\u91c7\u7528\u7b2c\u4e00\u79cd\u65b9\u5f0f\u5728\u8fd9\u51e0\u4e2alayer\u91cc\u8bbe\u7f6e":12,"\u6240\u4ee5\u505a\u6cd5\u53ef\u4ee5\u6709\u4e24\u79cd":11,"\u6240\u4ee5\u53ef\u4ee5\u7b80\u5316\u5bf9\u73af\u5883\u7684\u8981\u6c42":26,"\u6240\u4ee5\u5916\u5c42\u8f93\u51fa\u7684\u5e8f\u5217\u5f62\u72b6":39,"\u6240\u4ee5\u5bf9":39,"\u6240\u4ee5\u5f00\u53d1\u8005\u9700\u8981\u6839\u636e\u81ea\u5df1\u8bad\u7ec3\u4efb\u52a1\u7684\u5b9e\u9645\u573a\u666f\u5b8c\u6210\u8bad\u7ec3\u6570\u636e\u7684\u5206\u5272\u548c":21,"\u6240\u4ee5\u6027\u80fd\u4e5f\u5c31\u9010\u6b65\u53d8\u6210\u4e86\u6df1\u5ea6\u5b66\u4e60\u9886\u57df\u6700\u91cd\u8981\u7684\u6307\u6807":37,"\u6240\u4ee5\u6211\u4eec\u53ef\u4ee5\u5728\u8fd9\u4e2a\u57fa\u7840\u4e0a":27,"\u6240\u4ee5\u6211\u4eec\u786e\u4fdd\u53d1\u5e03\u7684\u4e8c\u8fdb\u5236\u5305\u53ef\u4ee5\u652f\u6301\u4e3b\u6d41\u7684linux\u64cd\u4f5c\u7cfb\u7edf":3,"\u6240\u4ee5\u6211\u4eec\u9700\u8981\u5c06\u8f93\u5165\u6570\u636e\u6807\u8bb0\u6210":39,"\u6240\u4ee5\u6211\u4eec\u9ed8\u8ba4\u4f7f\u7528cento":3,"\u6240\u4ee5\u76f8\u6bd4\u4e8erecurr":12,"\u6240\u4ee5\u8fd9\u4e00\u6b65\u662f\u5fc5\u8981\u7684":
6,"\u6240\u5bf9\u5e94\u7684\u8bcd\u8868index\u6570\u7ec4":39,"\u6240\u6709\u4ee3\u7801\u5fc5\u987b\u5177\u6709\u5355\u5143\u6d4b\u8bd5":4,"\u6240\u6709\u53c2\u6570\u7f6e\u4e3a\u96f6":33,"\u6240\u6709\u547d\u4ee4\u884c\u9009\u9879\u53ef\u4ee5\u8bbe\u7f6e\u4e3a":23,"\u6240\u6709\u7684":[4,6],"\u6240\u6709\u7684\u5355\u6d4b\u90fd\u4f1a\u88ab\u6267\u884c\u4e00\u6b21":6,"\u6240\u6709\u7684\u64cd\u4f5c\u90fd\u662f\u9488\u5bf9\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u6765\u8fdb\u884c\u7684":39,"\u6240\u6709\u7684python\u5c01\u88c5\u90fd\u4f7f\u7528":6,"\u6240\u6709\u7684python\u5c01\u88c5\u90fd\u5728":6,"\u6240\u6709\u7f51\u7edc\u5c42\u7684\u68af\u5ea6\u68c0\u67e5\u5355\u6d4b\u90fd\u4f4d\u4e8e":6,"\u6240\u6709\u8f93\u5165\u5e8f\u5217\u5e94\u8be5\u6709\u76f8\u540c\u7684\u957f\u5ea6":42,"\u624b\u5199\u6570\u5b57\u8bc6\u522b":20,"\u624b\u5199\u6570\u5b57\u8bc6\u522b\u4efb\u52a1\u5b9a\u4e49\u4e86\u4e00\u4e2a\u542b\u6709":20,"\u624b\u52a8\u4e0b\u8f7d\u4e14\u89e3\u538b\u7f29":8,"\u624b\u52a8\u4e0b\u8f7d\u5e76\u5b89\u88c5":8,"\u624d\u53ef\u4ee5\u5b89\u88c5":3,"\u624d\u80fd\u4fdd\u8bc1\u548c\u5355\u5c42\u5e8f\u5217\u7684\u914d\u7f6e\u4e2d":39,"\u624d\u80fd\u53d1\u6325\u5176\u5168\u90e8\u80fd\u529b":37,"\u6253\u5f00":37,"\u6253\u5f00\u6d4f\u89c8\u5668\u8bbf\u95ee\u5bf9\u5e94\u76ee\u5f55\u4e0b\u7684index":7,"\u6267\u884c":[16,17,23],"\u6267\u884c\u4e0a\u8ff0\u4ee3\u7801\u751f\u6210makefile\u6587\u4ef6\u540e":17,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u4ee5\u542f\u52a83\u4e2a\u8282\u70b9\u7684openmpi\u96c6\u7fa4\u548c\u4e00\u4e2a":28,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u5373\u53ef\u5728\u5f53\u524d\u673a\u5668\u4e0a\u5b89\u88c5paddlepaddle\u7684\u8fd0\u884c\u65f6\u73af\u5883":3,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u53ef\u4ee5\u67e5\u770b\u5df2\u7ecf\u5b89\u88c5\u7684\u7248\u672c":31,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u5b8c\u6210\u5feb\u901f\u5b89\u88c5":16,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4\u7f16\u8bd1cpu":0,"\u6267\u884c\u4e0b\u9762\u7684\u547d\u4ee4
\u83b7\u53d6\u6700\u65b0\u7684paddlepaddl":1,"\u6267\u884c\u4ee5\u4e0b\u547d\u4ee4\u542f\u52a8\u4f7f\u7528python\u7f16\u5199\u7684trainer\u7a0b\u5e8f":21,"\u6267\u884c\u4ee5\u4e0b\u64cd\u4f5c":42,"\u6267\u884c\u4ee5\u4e0b\u7684\u547d\u4ee4\u542f\u52a8\u4e00\u4e2a\u53c2\u6570\u670d\u52a1\u5668\u5e76\u7b49\u5f85\u548c\u8ba1\u7b97\u8282\u70b9\u7684\u6570\u636e\u4ea4\u4e92":21,"\u6267\u884c\u60a8\u7684\u4ee3\u7801":37,"\u627e\u5230":[0,42],"\u627e\u5230\u4ee5\u4e0a\u76f8\u5173\u7684\u4f8b\u5b50":24,"\u627e\u5230\u6700\u65e9\u62a5\u9519\u7684\u5730\u65b9":9,"\u627e\u5230\u8fd0\u884c\u6162\u7684\u539f\u56e0":37,"\u627e\u5230\u8fd0\u884c\u6162\u7684\u90e8\u5206":37,"\u628a":6,"\u628a\u5de5\u5177\u548c\u914d\u7f6e\u90fd\u5b89\u88c5\u5728\u4e00\u4e2a":0,"\u628a\u8bad\u7ec3\u6570\u636e\u76f4\u63a5\u653e\u5728":26,"\u628a\u8fd9\u4e9b\u5de5\u5177\u5b89\u88c5\u5230\u672c\u673a":0,"\u6295\u5c04\u53cd\u5411rnn\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b\u5230":42,"\u6295\u5c04\u7f16\u7801\u5411\u91cf\u5230":42,"\u62c6\u6210\u4ee5\u4e0a\u4e24\u4e2a\u9759\u6001\u94fe\u63a5\u5e93":17,"\u62c6\u89e3":41,"\u62c6\u89e3\u6210\u7684\u6bcf\u4e00\u53e5\u8bdd\u518d\u901a\u8fc7\u4e00\u4e2alstm\u7f51\u7edc":39,"\u62f7\u8d1d\u5230numpi":11,"\u62f7\u8d1d\u5fc5\u8981\u7684\u6587\u4ef6\u5230head\u8282\u70b9":28,"\u62f7\u8d1d\u8bad\u7ec3\u6570\u636e\u5230\u5404\u81ea\u7684\u8282\u70b9":28,"\u62f7\u8d1d\u8bad\u7ec3\u6587\u4ef6\u5230\u5bb9\u5668\u5185":27,"\u62f7\u8d1d\u8bad\u7ec3\u7a0b\u5e8f\u548c\u5b57\u5178\u6587\u4ef6\u5230\u6bcf\u53f0mpi\u8282\u70b9":28,"\u62fc\u63a5":11,"\u6302\u8f7d\u5230\u5bb9\u5668\u5185\u90e8\u7684":1,"\u6302\u8f7d\u6216\u4e0b\u8f7d\u7684\u8bad\u7ec3\u6570\u636e\u5206\u7247":21,"\u6307\u5411\u4e00\u4e2alayer":41,"\u6307\u5b9a":[11,12,41,42],"\u6307\u5b9a\u4e00\u53f0\u673a\u5668\u4e0a\u4f7f\u7528\u7684\u7ebf\u7a0b\u6570":33,"\u6307\u5b9a\u4e3a":19,"\u6307\u5b9a\u4f7f\u75282":11,"\u6307\u5b9a\u52a0\u8f7d\u7684\u65b9\u5f0f":33,"\u6307\u5b9a\u5c06\u5f53\u524d\u8def\u5f84":1,"\u6307\u5b9
a\u7684\u5185\u5bb9\u5b58\u50a8\u5e93\u8fd0\u884c\u547d\u4ee4":7,"\u6307\u5b9a\u7684\u8f93\u5165\u4e0d\u4f1a\u88ab":41,"\u6307\u5b9a\u8981\u8f93\u51fa\u7684\u5b57\u6bb5\u8fdb\u884c\u8f93\u51fa":11,"\u6307\u5b9a\u9700\u8981\u4f7f\u7528\u7684\u5bb9\u5668":1,"\u6307\u5b9acudnn\u7684\u6700\u5927\u5de5\u4f5c\u7a7a\u95f4\u5bb9\u9650":33,"\u6307\u5bf9\u4e8e\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217\u8f93\u5165\u6570\u636e":39,"\u6307\u793a\u4f7f\u7528\u54ea\u4e2agpu\u6838":33,"\u6307\u793a\u5728\u7b80\u5355\u7684recurrentlayer\u5c42\u7684\u8ba1\u7b97\u4e2d\u662f\u5426\u4f7f\u7528\u6279\u5904\u7406\u65b9\u6cd5":33,"\u6307\u793a\u5f53\u6307\u5b9a\u8f6e\u7684\u6d4b\u8bd5\u6a21\u578b\u4e0d\u5b58\u5728\u65f6":33,"\u6307\u793a\u662f\u5426\u4f7f\u7528\u591a\u7ebf\u7a0b\u6765\u8ba1\u7b97\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc":33,"\u6307\u793a\u662f\u5426\u5f00\u542f\u53c2\u6570\u670d\u52a1\u5668":33,"\u6307\u793a\u662f\u5426\u663e\u793a\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u7a00\u758f\u53c2\u6570\u5206\u5e03\u7684\u65e5\u5fd7\u7ec6\u8282":33,"\u6307\u793a\u662f\u5426\u68c0\u67e5\u6240\u6709\u53c2\u6570\u670d\u52a1\u5668\u4e0a\u7684\u7a00\u758f\u53c2\u6570\u7684\u5206\u5e03\u662f\u5747\u5300\u7684":33,"\u6309\u542f\u53d1\u5f0f\u635f\u5931\u7684\u5927\u5c0f\u9012\u589e\u6392\u5e8f":33,"\u6309\u7167\u4e0b\u9762\u6b65\u9aa4\u5373\u53ef":27,"\u6309\u7167\u5177\u4f53\u5b9e\u73b0\u65b9\u5f0f\u53ef\u4ee5\u5f52\u7eb3\u4e3a2\u7c7b":12,"\u6309\u7167\u57fa\u672c\u6570\u636e\u7c7b\u578b\u5728paddlepaddle\u5185\u90e8\u7684\u5b9a\u4e49\u548c\u5b9e\u73b0":19,"\u6309\u94ae":4,"\u633a":39,"\u633a\u597d":39,"\u635f\u5931\u51fd\u6570\u5c42":20,"\u6362":39,"\u6392\u6210\u4e00\u5217\u7684\u591a\u4e2a\u5143\u7d20":38,"\u63a5\u4e0a\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42":14,"\u63a5\u4e0a\u5e73\u65b9\u8bef\u5dee\u5c42":14,"\u63a5\u4e0b\u6765\u53ef\u4ee5\u8003\u8651\u4e0b\u65f6\u95f4\u7ebf\u7684\u5206\u6790":37,"\u63a5\u4e0b\u6765\u5c31\u53ef\u4ee5\u4f7f\u7528":37,"\u63a5\u4e0b\u6765\u6211\u4eec\u521b\u5efa\u4e0
0\u4e2a\u539f\u59cb":4,"\u63a5\u4e0b\u6765\u6211\u4eec\u53d6\u6d88\u5bf9":4,"\u63a5\u4e0b\u6765\u7b49\u5f85":4,"\u63a5\u53e3":20,"\u63a5\u53e3\u5b8c\u6210\u795e\u7ecf\u7f51\u7edc\u7684\u524d\u5411\u8ba1\u7b97":20,"\u63a5\u53e3\u5bf9\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u53c2\u6570\u8fdb\u884c\u5e8f\u5217\u5316":20,"\u63a5\u53e3\u6709\u4e00\u4e2a":11,"\u63a5\u53e3\u7684":11,"\u63a5\u53e3\u8bf4\u660e\u8bf7\u67e5\u770b":19,"\u63a5\u6536\u5230\u8db3\u591f\u7684gradient":21,"\u63a5\u7740\u5bf9\u6240\u6709\u53c2\u6570\u7684\u4f7f\u7528\u573a\u5408\u8fdb\u884c\u6982\u8ff0\u548c\u5206\u7c7b":34,"\u63a5\u7740\u7f16\u8bd1\u5373\u53ef":8,"\u63a7\u5236":33,"\u63a8\u5bfc\u8be5\u5c42\u524d\u5411\u548c\u540e\u5411\u4f20\u9012\u7684\u65b9\u7a0b":6,"\u63a8\u8350":39,"\u63a8\u8350\u4f7f\u7528\u6b64\u65b9\u5f0f":17,"\u63a8\u8350\u4f7f\u7528centos\u7684devtools2":0,"\u63a8\u8350\u6e05\u7406\u6574\u4e2a\u7f16\u8bd1\u76ee\u5f55":0,"\u63a8\u8350\u8bbe\u7f6e\u4e3a":17,"\u63a8\u8350\u914d\u7f6e\u4e3a":17,"\u63a8\u8350\u914d\u7f6e\u9009\u9879":17,"\u63a8\u9001\u5230\u8fdc\u7a0b\u4ed3\u5e93":4,"\u63cf\u8ff0\u7684\u9ed8\u8ba4\u5165\u53e3\u7a0b\u5e8f":0,"\u63cf\u8ff0\u95ee\u9898":4,"\u63d0\u4ea4\u65b9\u5f0f\u53c2\u89c1":7,"\u63d0\u4ea4pull":4,"\u63d0\u4f9b":23,"\u63d0\u4f9b\u4e86\u4e00\u4e2a\u542f\u52a8\u811a\u672c":27,"\u63d0\u4f9b\u4e86\u547d\u4ee4\u6837\u4f8b\u6765\u8fd0\u884c":23,"\u63d0\u4f9b\u4e86\u81ea\u52a8\u5316\u811a\u672c\u6765\u542f\u52a8\u4e0d\u540c\u8282\u70b9\u4e2d\u7684\u6240\u6709":23,"\u63d0\u4f9b\u51e0\u4e4e\u6240\u6709\u8bad\u7ec3\u7684\u5185\u90e8\u8f93\u51fa\u65e5\u5fd7":23,"\u63d0\u4f9b\u6269\u5c55\u7684\u957f\u5ea6\u4fe1\u606f":38,"\u63d0\u4f9b\u8bad\u7ec3\u8fc7\u7a0b\u7684":23,"\u63d0\u793a":8,"\u641c\u7d22\u4ee3\u7801\u5e93":7,"\u642d\u5efa\u795e\u7ecf\u7f51\u7edc\u5c31\u50cf\u4f7f\u7528\u79ef\u6728\u642d\u5efa\u5b9d\u5854\u4e00\u6837":14,"\u64cd\u4f5c":39,"\u64cd\u4f5c\u7cfb\u7edf":[0,3],"\u652f\u6301\u4e24\u79cd\u5e8f\u5217\u7c7b\u578b":1
9,"\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684layer":[40,41],"\u652f\u6301\u5927\u89c4\u6a21\u96c6\u7fa4\u751f\u4ea7\u73af\u5883\u7684\u5b8c\u6574\u96c6\u7fa4\u65b9\u6848":24,"\u652f\u6301\u7ef4\u6570\u53ef\u53d8\u7684\u6570\u636e\u8f93\u5165":12,"\u6539\u53d8\u7ef4\u5ea6\u987a\u5e8f":12,"\u653e\u5728\u8fd9\u4e2a\u76ee\u5f55\u91cc\u7684\u6587\u4ef6\u5176\u5b9e\u662f\u4fdd\u5b58\u5230\u4e86mfs\u4e0a":27,"\u653e\u5fc3":39,"\u6545\u800c\u662f\u4e00\u4e2a\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":39,"\u6559\u7a0b":1,"\u6570":41,"\u6570\u5b66\u5e93":17,"\u6570\u5fc5\u987b\u4e25\u683c\u76f8\u7b49":41,"\u6570\u636e":[19,20],"\u6570\u636e\u4e2d0":13,"\u6570\u636e\u5206\u7247":22,"\u6570\u636e\u63d0\u4f9b\u5668":32,"\u6570\u636e\u8f93\u5165":[19,41],"\u6570\u636e\u96c6":20,"\u6570\u76ee":35,"\u6574\u4e2a\u5b89\u88c5\u8fc7\u7a0b\u8017\u65f6\u8f83\u957f":2,"\u6574\u4f53":39,"\u6574\u4f53\u4f7f\u7528\u6d41\u7a0b":20,"\u6574\u4f53\u6570\u636e\u548c\u539f\u59cb\u6570\u636e\u5b8c\u5168\u4e00\u6837":39,"\u6574\u4f53\u7684\u7ed3\u6784\u56fe\u5982\u4e0b":27,"\u6574\u578b\u6570\u7ec4":19,"\u6574\u6570":6,"\u6574\u6570\u6807\u7b7e":14,"\u6574\u6d01":39,"\u6587\u4ef6":[4,19,26],"\u6587\u4ef6\u4e2d":[20,27],"\u6587\u4ef6\u4e3a":11,"\u6587\u4ef6\u4e4b\u5916":4,"\u6587\u4ef6\u540d\u4e3a\u4efb\u610f\u6587\u4ef6\u540d":21,"\u6587\u4ef6\u59390":27,"\u6587\u4ef6\u5de5\u5177\u662f\u4f7f\u7528docker":7,"\u6587\u4ef6\u7684\u6539\u53d8":4,"\u6587\u4ef6\u7684\u8def\u5f84\u6765\u52a0\u8f7d\u9884\u6d4b\u6a21\u578b":20,"\u6587\u4ef6model":35,"\u6587\u5b57\u7684\u4ea4\u4e92\u5f0f\u6587\u6863":1,"\u6587\u6863":8,"\u6587\u6863\u90fd\u662f\u901a\u8fc7":7,"\u6587\u7ae0":27,"\u65b0":39,"\u65b0\u5efa\u4e00\u4e2a\u6743\u91cd":6,"\u65b0\u624b\u5165\u95e8":43,"\u65b9\u4fbf":39,"\u65b9\u4fbf\u4f7f\u7528":18,"\u65b9\u4fbf\u5206\u4eab\u8fd0\u884c\u65f6\u73af\u5883":2,"\u65b9\u4fbf\u6392\u67e5\u4ee5\u53ca\u5feb\u901f\u5b9a\u4f4d\u95ee\u9898":11,"\u65b9\u4fbf\u63d0\u4ea4\u96c6\u7fa4\u8bad\u7ec3\u4e
fb\u52a1":24,"\u65b9\u5f0f1":11,"\u65b9\u5f0f2":11,"\u65b9\u5f0f\u5c06\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u53c2\u6570\u5e8f\u5217\u5316\u5230\u4e00\u4e2a\u6587\u4ef6":20,"\u65b9\u6cd5\u4e00":35,"\u65b9\u6cd5\u4e09":35,"\u65b9\u6cd5\u4e8c":35,"\u65c1\u8fb9":39,"\u65e0":39,"\u65e0\u5ef6\u8fdf":33,"\u65e0\u9700\u5173\u5fc3\u548c\u5904\u7406\u5e8f\u5217\u4fe1\u606f":19,"\u65e0\u9700\u5355\u72ec\u5b89\u88c5\u7b2c\u4e09\u65b9\u4f9d\u8d56":2,"\u65e0\u9700\u63d0\u4f9b\u975e\u96f6\u5143\u7684\u503c":19,"\u65e0\u9700\u9644\u52a0\u5e8f\u5217\u4fe1\u606f":19,"\u65e5\u5fd7\u62a5\u9519\u4e3a\u7f51\u7edc\u901a\u4fe1\u7c7b\u9519\u8bef":9,"\u65e9\u9910":39,"\u65f6":[6,11,13,17,19,27,33,38,42],"\u65f6\u4f7f\u7528openblas\u6570\u5b66\u5e93":17,"\u65f6\u5019":39,"\u65f6\u5982\u4f55\u7ec4\u7ec7\u8f93\u5165\u6570\u636e":19,"\u65f6\u88ab\u8bad\u7ec3\u7684":6,"\u65f6\u8bbe\u5907id\u53f7\u7684\u5206\u914d":35,"\u65f6\u95f4":39,"\u65f6\u95f4\u6b65\u7684\u6982\u5ff5":39,"\u65f6\u987b\u4ece\u7b2c17\u5b57\u8282\u5f00\u59cb":13,"\u6613\u4e8e\u95ee\u9898\u7684\u590d\u73b0":2,"\u6620\u5c04\u4e3a":0,"\u6620\u5c04\u5230\u4e00\u4e2a\u7ef4\u5ea6\u4e3a":6,"\u662f":[3,8,39],"\u662f\u4e00\u4e2a\u51681\u7684\u5411\u91cf":6,"\u662f\u4e00\u4e2a\u5185\u7f6e\u7684\u5b9a\u65f6\u5668\u5c01\u88c5":37,"\u662f\u4e00\u4e2a\u52a8\u6001\u7a0b\u5e8f\u5206\u6790\u7684\u672f\u8bed":37,"\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u662f\u4e00\u4e2a\u53cc\u5c42\u7684\u5e8f\u5217":38,"\u662f\u4e00\u4e2a\u5c01\u88c5\u5bf9\u8c61":37,"\u662f\u4e00\u4e2a\u5f88\u6709\u7528\u7684\u53c2\u6570":35,"\u662f\u4e00\u4e2a\u65b9\u4fbf\u7684\u7a0b\u5e8f\u90e8\u7f72\u548c\u7ba1\u7406\u5de5\u5177":24,"\u662f\u4e00\u4e2a\u975e\u7ebf\u6027\u7684":6,"\u662f\u4e00\u4e2aunbound":41,"\u662f\u4e00\u6761\u65f6\u95f4\u5e8f\u5217":14,"\u662f\u4e00\u6b21\u9884\u6d4b\u63a5\u53d7\u7684\u6837\u672c\u6570\u76ee":19,"\u662f\u4e00\u79cd\u4efb\u610f\u590d\u6742\u7684rnn\u5355\u5143":41,"\u662f\u4e0d\u5305\u62ec\u6e90\u7801
\u7684":26,"\u662f\u4f7f\u5f97\u8981\u5171\u4eab\u7684\u53c2\u6570\u4f7f\u7528\u540c\u6837\u7684":13,"\u662f\u4f7f\u7528mkl\u6570\u5b66\u5e93":17,"\u662f\u504f\u5dee":42,"\u662f\u5411\u91cf":6,"\u662f\u5426\u4e3a\u5f02\u6b65sgd\u66f4\u65b0\u6a21\u5f0f":21,"\u662f\u5426\u4ec5\u7f16\u8bd1capi":0,"\u662f\u5426\u4ee5\u9006\u5e8f\u5904\u7406\u8f93\u5165\u5e8f\u5217":41,"\u662f\u5426\u4f7f\u7528\u53cc\u7cbe\u5ea6\u6d6e\u70b9\u6570":0,"\u662f\u5426\u4f7f\u7528\u65e7\u7684remoteparameterupdat":33,"\u662f\u5426\u4f7f\u7528\u6743\u91cd":6,"\u662f\u5426\u4f7f\u7528mkl\u6570\u5b66\u5e93":0,"\u662f\u5426\u5185\u5d4cpython\u89e3\u91ca\u5668":0,"\u662f\u5426\u542f\u7528gpu\u8bad\u7ec3":21,"\u662f\u5426\u5c06\u5168\u5c40\u79cd\u5b50\u5e94\u7528\u4e8e\u672c\u5730\u7ebf\u7a0b\u7684\u968f\u673a\u6570":33,"\u662f\u5426\u5f00\u542f\u5355\u5143\u6d4b\u8bd5":0,"\u662f\u5426\u6253\u5370\u7248\u672c\u4fe1\u606f":33,"\u662f\u5426\u652f\u6301gpu":0,"\u662f\u5426\u663e\u793a":33,"\u662f\u5426\u7a00\u758f":6,"\u662f\u5426\u7f16\u8bd1\u4e2d\u82f1\u6587\u6587\u6863":0,"\u662f\u5426\u7f16\u8bd1\u542b\u6709avx\u6307\u4ee4\u96c6\u7684paddlepaddle\u4e8c\u8fdb\u5236\u6587\u4ef6":0,"\u662f\u5426\u7f16\u8bd1\u65f6\u8fdb\u884c\u4ee3\u7801\u98ce\u683c\u68c0\u67e5":0,"\u662f\u5426\u7f16\u8bd1go\u8bed\u8a00\u7684\u53ef\u5bb9\u9519paramet":0,"\u662f\u5426\u7f16\u8bd1python\u7684swig\u63a5\u53e3":0,"\u662f\u5426\u8fd0\u884c\u65f6\u52a8\u6001\u52a0\u8f7dcuda\u52a8\u6001\u5e93":0,"\u662f\u5426\u9700\u8981\u7b49\u5f85\u8be5\u8f6e\u6a21\u578b\u53c2\u6570":33,"\u662f\u56e0\u4e3a\u8fd9\u4e2a\u6d41\u7a0b\u6bd4\u5176\u4ed6\u65b9\u6cd5\u90fd\u66f4\u7b80\u4fbf":0,"\u662f\u5728paddlepaddle\u4e2d\u6784\u9020\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u65f6\u6700\u91cd\u8981\u7684\u6982\u5ff5":42,"\u662f\u5b58\u6709\u4e00\u7cfb\u5217\u53d8\u6362\u77e9\u9635\u7684\u6743\u91cd":6,"\u662f\u5b58\u6709\u504f\u7f6e\u5411\u91cf\u7684\u6743\u91cd":6,"\u662f\u5f00\u542favx\u7f16\u8bd1\u7684":1,"\u662f\u5f85\u6269\u5c55\u7684\u6570\u636e"
:38,"\u662f\u6210\u719f\u7684\u9ad8\u6027\u80fd\u5e76\u884c\u8ba1\u7b97\u6846\u67b6":24,"\u662f\u6211\u4eec":4,"\u662f\u6307\u4e00\u7cfb\u5217\u7684\u7279\u5f81\u6570\u636e":39,"\u662f\u6307recurrent_group\u7684\u591a\u4e2a\u8f93\u5165\u5e8f\u5217":39,"\u662f\u6570\u636e\u8f93\u5165":42,"\u662f\u6709\u610f\u4e49\u7684":39,"\u662f\u6784\u5efa\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u6700\u91cd\u8981\u7684\u5de5\u5177":42,"\u662f\u6ca1\u6709\u540d\u5b57\u7684":0,"\u662f\u7684":0,"\u662f\u77e9\u9635":6,"\u662f\u795e\u7ecf\u7f51\u7edc\u5b9a\u4e49\u65f6":19,"\u662f\u7f51\u7edc\u5c42\u5b9e\u4f8b\u7684\u540d\u5b57\u6807\u8bc6\u7b26":6,"\u662f\u7f51\u7edc\u5c42\u7684\u6807\u8bc6\u7b26":6,"\u662f\u7f51\u7edc\u5c42\u7684\u7c7b\u578b":6,"\u662f\u7f51\u7edc\u5c42\u8f93\u51fa\u7684\u5927\u5c0f":6,"\u662f\u8be5\u5c42\u7684\u6807\u8bc6\u7b26":6,"\u662f\u8be5\u5c42\u7684\u7c7b\u540d":6,"\u662f\u8be5\u7f51\u7edc\u5c42\u7684":6,"\u662f\u8f93\u5165":42,"\u662f\u8fd9\u4e00\u7c7b\u7684":12,"\u662f\u8fdb\u884c\u8ba1\u7b97\u7684\u57fa\u672c\u5355\u4f4d":19,"\u662f\u9700\u8981\u4e86\u89e3\u54ea\u4e9b\u6b65\u9aa4\u62d6\u6162\u4e86\u6574\u4f53":37,"\u662fdecoder\u7684\u6570\u636e\u8f93\u5165":41,"\u662fgoogle\u5f00\u6e90\u7684\u5bb9\u5668\u96c6\u7fa4\u7684\u8c03\u5ea6\u6846\u67b6":24,"\u662fnvidia\u6027\u80fd\u5206\u6790\u5de5\u5177":37,"\u662fpaddlepaddle\u4e2d\u5355\u5c42\u5e8f\u5217\u548c\u53cc\u5c42\u5e8f\u5217\u5b58\u50a8\u793a\u610f\u56fe":19,"\u662fpaddlepaddle\u652f\u6301\u7684\u4e00\u79cd\u4efb\u610f\u590d\u6742\u7684rnn\u5355\u5143":41,"\u662fpython\u5c01\u88c5\u7684\u7c7b\u540d":6,"\u662frnn\u72b6\u6001":42,"\u665a":39,"\u6682\u4e0d\u8003\u8651\u5728\u5185":11,"\u6682\u65e0":3,"\u6682\u65f6\u4e0d\u652f\u6301python3":3,"\u66f4\u522b\u63d0\u7b80\u5316\u95ee\u9898\u590d\u73b0\u5e26\u6765\u7684\u597d\u5904\u4e86":0,"\u66f4\u591a\u5173\u4e8edocker\u7684\u5b89\u88c5\u4e0e\u4f7f\u7528":8,"\u66f4\u597d\u5730\u5b8c\u6210\u4e00\u4e9b\u590d\u6742\u7684\u8bed\u8a00\u7406\u89e3\u4efb\u52a1":
41,"\u66f4\u5feb":42,"\u66f4\u65b0":8,"\u66f4\u65b0\u53ef\u80fd\u5bfc\u81f4\u9700\u8981\u65b0\u7684\u5f00\u53d1\u5de5\u5177":0,"\u66f4\u65b0\u6a21\u5f0f":11,"\u66f4\u65b0\u7684\u6587\u6863\u4ee5pr\u7684\u5f62\u5f0f\u63d0\u4ea4\u5230github\u4e2d":7,"\u66f4\u65b0\u7f51\u7edc\u53c2\u6570\u65f6\u5e94\u7528":11,"\u66f4\u65b9\u4fbf\u7684\u8bbe\u7f6e\u65b9\u5f0f":13,"\u66f4\u8be6\u7ec6\u7684\u5b89\u88c5\u548c\u7f16\u8bd1\u65b9\u6cd5\u53c2\u8003":16,"\u66f4\u8fdb\u4e00\u6b65":41,"\u66f4\u9ad8":42,"\u66ff\u6211\u4eec\u5b8c\u6210\u4e86\u539f\u59cb\u8f93\u5165\u6570\u636e\u7684\u62c6\u5206":41,"\u6700":39,"\u6700\u4e3b\u8981\u7684\u5de5\u4f5c\u5c31\u662f\u89e3\u6790\u51fa":27,"\u6700\u540e":[1,4,6,21],"\u6700\u540e\u4e00\u4e2a":38,"\u6700\u540e\u4e00\u5c42cost\u4e2d\u8bb0\u5f55\u4e86\u795e\u7ecf\u7f51\u7edc\u7684\u6240\u6709\u62d3\u6251\u7ed3\u6784":14,"\u6700\u540e\u6211\u4eec\u4f7f\u7528\u94fe\u5f0f\u6cd5\u5219\u8ba1\u7b97":6,"\u6700\u540e\u7684\u6267\u884c\u811a\u672c\u7684\u547d\u4ee4":0,"\u6700\u540e\u7ed9\u51fa\u7ec6\u8282\u63cf\u8ff0":34,"\u6700\u540e\u8ba1\u7b97softmax":12,"\u6700\u5c11\u663e\u793a\u591a\u5c11\u4e2a\u8282\u70b9":33,"\u6700\u65b0\u7684\u4ee3\u7801":4,"\u6700\u65b0\u7684paddlepaddl":[1,8],"\u6700\u7ec8":6,"\u6700\u7ec8\u5b9e\u73b0\u4e00\u4e2a\u5c42\u6b21\u5316\u7684\u590d\u6742rnn":41,"\u6700\u7ec8\u6211\u4eec\u53ef\u4ee5\u8c03\u7528trainer\u7684train\u65b9\u6cd5\u542f\u52a8\u8bad\u7ec3":14,"\u6700\u7ec8\u7684\u8f93\u51fa\u7ed3\u679c":41,"\u6708\u6e56":39,"\u6709":39,"\u6709\u4e24\u79cd\u65b9\u6cd5":0,"\u6709\u4e9b\u5c42\u53ef\u80fd\u9700\u8981\u9ad8\u7cbe\u5ea6\u6765\u4fdd\u8bc1\u68af\u5ea6\u68c0\u67e5\u5355\u6d4b\u6b63\u786e\u6267\u884c":6,"\u6709\u4e9b\u5c42\u6216\u8005\u6fc0\u6d3b\u9700\u8981\u505a\u5f52\u4e00\u5316\u4ee5\u4fdd\u8bc1\u5b83\u4eec\u7684\u8f93\u51fa\u7684\u548c\u662f\u4e00\u4e2a\u5e38\u6570":6,"\u6709\u4e9b\u7279\u5f81\u7684\u53d6\u503c\u8fbe\u5230\u6570\u767e\u4e07":11,"\u6709\u4eba\u7528\u865a\u62df\u673a\u6765\u7c7b\u6bd4":0,"\u6709\
u5173":39,"\u6709\u5173\u7ebf\u6027\u56de\u5f52\u7684\u5b9e\u9645\u5e94\u7528":14,"\u6709\u52a9\u4e8e\u5728\u8bad\u7ec3\u65f6\u89c2\u5bdf\u5177\u4f53\u6570\u503c":11,"\u6709\u52a9\u4e8e\u8bca\u65ad\u5206\u5e03\u5f0f\u9519\u8bef":23,"\u6709\u591a\u96be":0,"\u6709\u7684\u65f6\u5019\u7b80\u7b80\u5355\u5355\u7684\u6539\u53d8\u5c31\u80fd\u5728\u6027\u80fd\u4e0a\u4ea7\u751f\u660e\u663e\u7684\u4f18\u5316\u6548\u679c":37,"\u6709\u7684\u8bdd\u9700\u8981\u5148\u5378\u8f7d":8,"\u670d\u52a1":39,"\u670d\u52a1\u5458":39,"\u670d\u52a1\u5668\u4e4b\u95f4\u53ef\u4ee5\u901a\u8fc7\u5c40\u57df\u7f51":31,"\u672a\u6307\u5b9a\u6309\u7167double\u7cbe\u5ea6\u7f16\u8bd1":13,"\u672c\u4f8b\u4e2d\u7684\u539f\u59cb\u6570\u636e\u4e00\u5171\u670910\u4e2a\u6837\u672c":39,"\u672c\u5730":[3,8],"\u672c\u5730\u6d4b\u8bd5":32,"\u672c\u5730\u8bad\u7ec3":[19,32],"\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b":10,"\u672c\u5730\u8bad\u7ec3\u7684\u5b9e\u9a8c":35,"\u672c\u6559\u7a0b\u5c06\u6307\u5bfc\u4f60\u5982\u4f55\u5728":42,"\u672c\u6587\u4e2d\u6240\u6709\u7684\u4f8b\u5b50":39,"\u672c\u6587\u4e2d\u7684\u4f8b\u5b50\u91cc":0,"\u672c\u6587\u4e2d\u793a\u4f8b\u6240\u4f7f\u7528\u7684\u5355\u5143\u6d4b\u8bd5\u6587\u4ef6\u662f":39,"\u672c\u6587\u4ee5paddlepaddle\u7684\u53cc\u5c42rnn\u5355\u5143\u6d4b\u8bd5\u4e3a\u793a\u4f8b":39,"\u672c\u6587\u5c06\u4ecb\u7ecd\u5728kubernetes\u5bb9\u5668\u7ba1\u7406\u5e73\u53f0\u4e0a\u5feb\u901f\u6784\u5efapaddlepaddle\u5bb9\u5668\u96c6\u7fa4":27,"\u672c\u6587\u6863\u5bf9\u5173\u4e8epaddlepaddle\u7684\u4e00\u4e9b\u5e38\u89c1\u95ee\u9898\u63d0\u4f9b\u4e86\u89e3\u7b54":10,"\u672c\u6765":39,"\u672c\u6b21\u8bad\u7ec3\u6587\u4ef6\u6240\u5728\u76ee\u5f55":27,"\u672c\u6b21\u8bad\u7ec3\u7684yaml\u6587\u4ef6\u53ef\u4ee5\u5199\u6210":27,"\u672c\u6b21\u8bad\u7ec3\u8981\u6c42\u67093\u4e2apaddlepaddle\u8282\u70b9":27,"\u672c\u793a\u4f8b\u4e2d\u4f7f\u7528\u7684\u539f\u59cb\u6570\u636e\u5982\u4e0b":39,"\u672c\u793a\u4f8b\u610f\u56fe\u4f7f\u7528\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u5b9e\u73b0\u4e24\u
4e2a\u5b8c\u5168\u7b49\u4ef7\u7684\u5168\u8fde\u63a5rnn":39,"\u672c\u8282\u5c06\u4ecb\u7ecd\u5982\u4f55\u4f7f\u7528paddlepaddle\u5728\u4e0d\u540c\u7684\u96c6\u7fa4\u6846\u67b6\u4e0b\u5b8c\u6210\u5206\u5e03\u5f0f\u8bad\u7ec3":22,"\u673a\u5668\u7684\u8bbe\u5907":35,"\u6743\u91cd\u66f4\u65b0\u7684\u68af\u5ea6":33,"\u6765":39,"\u6765\u4e3a\u4e00\u4e2a":19,"\u6765\u4ee3\u66ff":4,"\u6765\u4f20\u8f93\u7f51\u7edc\u914d\u7f6e\u6587\u4ef6\u4e2d\u5b9a\u4e49\u7684\u7f51\u7edc\u7ed3\u6784\u548c\u76f8\u5173\u53c2\u6570":20,"\u6765\u4f7f\u7528dropout":12,"\u6765\u4f7f\u7528dropout\u7684":12,"\u6765\u505a\u68af\u5ea6\u68c0\u67e5":6,"\u6765\u5206\u6790\u6267\u884c\u6587\u4ef6":37,"\u6765\u521d\u59cb\u5316\u53c2\u6570":13,"\u6765\u542f\u52a8\u548c":0,"\u6765\u5b58\u50a8":19,"\u6765\u5b58\u50a8\u6570\u636e":[19,20],"\u6765\u5b8c\u6210\u524d\u5411\u548c\u53cd\u5411\u8ba1\u7b97":20,"\u6765\u5b8c\u6210\u7f51\u7edc\u7684\u8bad\u7ec3":14,"\u6765\u5b9a\u4e49\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u6765\u5b9e\u9645\u5b58\u50a8\u6570\u636e":[19,20],"\u6765\u5bf9\u6bd4\u5206\u6790\u4e24\u8005\u8bed\u4e49\u76f8\u540c\u7684\u539f\u56e0":39,"\u6765\u5f97\u5230\u67d0\u4e2a\u7279\u5b9a\u53c2\u6570\u7684\u68af\u5ea6\u77e9\u9635":6,"\u6765\u63cf\u8ff0\u8f93\u5165":19,"\u6765\u642d\u5efa\u795e\u7ecf\u7f51\u7edc":14,"\u6765\u6ce8\u518c\u8be5\u5c42":6,"\u6765\u6df7\u5408\u4f7f\u7528gpu\u548ccpu\u8ba1\u7b97\u7f51\u7edc\u5c42\u7684\u53c2\u6570":35,"\u6765\u6e05\u7406\u8fd9\u4e9b\u5185\u5bb9":0,"\u6765\u7279\u6307":19,"\u6765\u7279\u6307\u8c03\u7528paddlepaddl":20,"\u6765\u7279\u6307paddlepaddl":20,"\u6765\u7279\u6307paddlepaddle\u4e2d\u7684\u4e00\u7ef4\u6574\u578b\u6570\u7ec4":19,"\u6765\u7279\u6307paddlepaddle\u4e2d\u7684\u4e8c\u7ef4\u6d6e\u70b9\u578b\u77e9\u9635":19,"\u6765\u7279\u6307paddlepaddle\u4e2d\u795e\u7ecf\u7f51\u7edc\u8ba1\u7b97\u5c42\u4e00\u4e2a\u8f93\u5165":19,"\u6765\u786e\u5b9a\u7a00\u758f\u77e9\u9635\u7684\u5185\u5bb9":19,"\u6765\u83b7\u5f97\u8f93\u51fa\u7684\u68af\u5ea6":6,"\u6765\u88
68\u793a":42,"\u6765\u8868\u793a\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u6765\u89e3\u51b3\u4e0a\u9762\u7684\u95ee\u9898":11,"\u6765\u8ba1\u7b97\u68af\u5ea6":6,"\u6765\u8bb0\u5f55\u8f93\u5165":19,"\u6765\u8bb2\u89e3\u5982\u4f55\u4f7f\u7528\u53cc\u5c42rnn":39,"\u6765\u8bbe\u7f6e":13,"\u6765\u8bfb\u53d6\u4e00\u4e2a":19,"\u6765\u8c03\u6574":4,"\u6765\u8c03\u7528":0,"\u6765\u8fd0\u884c\u6027\u80fd\u5206\u6790\u548c\u8c03\u4f18":37,"\u6765\u8fd0\u884c\u955c\u50cf":1,"\u6765\u9884\u6d4b\u8fd9\u4e2a\u4e2d\u95f4\u7684\u8bcd":11,"\u676f\u5b50":39,"\u6784\u5efa\u7684\u955c\u50cf":0,"\u6784\u6210\u4e00\u4e2a\u5e8f\u5217":19,"\u6784\u6210\u4e86\u8f93\u51fa\u53cc\u5c42\u5e8f\u5217\u7684\u7b2ci\u4e2a":38,"\u6784\u9020":27,"\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u4e00\u4e2a\u8f93\u5165\u4e3a\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7f51\u7edc\u4e2d\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u8f93\u51fa":39,"\u67d0\u4e9b\u53c2\u6570\u53ea\u53ef\u7528\u4e8e\u7279\u5b9a\u7684\u5c42\u4e2d":32,"\u67e5\u627e\u7b54\u6848\u6216\u76f4\u63a5\u63d0":10,"\u67e5\u770b":4,"\u67e5\u770b\u5305\u7684\u5927\u5c0f":8,"\u67e5\u770b\u5f53\u524d\u72b6\u6001":4,"\u67e5\u770b\u5f53\u524d\u8fdc\u7a0b\u4ed3\u5e93\u7684\u540d\u5b57":4,"\u67e5\u770b\u6587\u4ef6\u5177\u4f53\u88ab\u4fee\u6539\u7684\u5185\u5bb9":4,"\u67e5\u770b\u662f\u5426\u662f\u5176\u4ed6\u9519\u8bef\u5f15\u53d1\u7684\u62a5\u9519":9,"\u67e5\u770bjob\u7684\u8be6\u7ec6\u60c5\u51b5":26,"\u6807\u51c6":3,"\u6807\u51c6\u5dee\u4e3a":13,"\u6807\u8bc6\u662f\u5426\u4e3a\u8fde\u7eed\u7684batch\u8ba1\u7b97":33,"\u6838\u4e00\u6837\u591a\u7684\u8fdb\u7a0b\u6765\u5e76\u884c\u7f16\u8bd1":0,"\u6838\u5fc3\u4ee3\u7801\u7f16\u8bd1\u6210\u94fe\u63a5\u5e93":17,"\u6839\u636e\u4f60\u7684\u4efb\u52a1":35,"\u6839\u636e\u524d\u6587\u7684\u63cf\u8ff0":27,"\u6839\u636e\u7f51\u7edc\u914d\u7f6e\u4e2d\u7684":33,"\u6839\u636e\u8fd9\u4e9b\u53c2\u6570\u7684\u4f7f\u7528\u573a\u5408":32,"\u6839\u636e\u9ed8\u8ba4\u503c\u9012\u589e":33,"\u6839\u636e\u9ed8\u8ba4\u7aef\
u53e3\u53f7\u9012\u589e":33,"\u6839\u636ejob\u5bf9\u5e94\u7684pod\u4fe1\u606f":26,"\u6839\u636eport":21,"\u683c\u5f0f":33,"\u683c\u5f0f\u5b58\u50a8":19,"\u68af\u5ea6\u4f1a\u5c31\u5730":6,"\u68af\u5ea6\u4f1a\u6709\u566a\u58f0":22,"\u68af\u5ea6\u53c2\u6570\u7684\u5206\u5757\u6570\u76ee":33,"\u68af\u5ea6\u5c31\u53ef\u4ee5\u901a\u8fc7\u8fd9\u4e2a\u65b9\u7a0b\u8ba1\u7b97\u5f97\u5230":6,"\u68af\u5ea6\u670d\u52a1\u5668\u7684\u6570\u91cf":33,"\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5\u901a\u8fc7\u6709\u9650\u5dee\u5206\u6cd5\u6765\u9a8c\u8bc1\u4e00\u4e2a\u5c42\u7684\u68af\u5ea6":6,"\u68af\u5ea6\u68c0\u67e5\u7684\u8f93\u5165\u6570\u636e\u7684\u6279\u6b21\u5927\u5c0f":6,"\u697c\u5c42":39,"\u6982\u5ff5\u4e0a":20,"\u6982\u5ff5\u4e0a\u53ef\u4ee5\u5c06":19,"\u6a21\u578b\u4e00\u76f4\u4e0d\u6536\u655b":11,"\u6a21\u578b\u4e2d\u6240\u6709\u53ef\u5b66\u4e60\u53c2\u6570\u4f1a\u88ab\u5b58\u4e3a\u4e00\u4e2a\u538b\u7f29\u6587\u4ef6":20,"\u6a21\u578b\u6587\u4ef6\u5c06\u88ab\u5199\u5165\u8282\u70b9":23,"\u6a21\u578b\u6765\u6307\u5bfc\u4f60\u5b8c\u6210\u8fd9\u4e9b\u6b65\u9aa4":42,"\u6a21\u578b\u6f14\u793a\u5982\u4f55\u914d\u7f6e\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u6a21\u578b":42,"\u6a21\u578b\u7684\u4ee3\u7801\u53ef\u4ee5\u5728":42,"\u6a21\u578b\u7684\u7f16\u7801\u5668\u90e8\u5206\u5982\u4e0b\u6240\u793a":42,"\u6a21\u578b\u7ed3\u6784":34,"\u6a21\u578b\u8bad\u7ec3\u7b49\u4efb\u52a1":14,"\u6a21\u578b\u914d\u7f6e":10,"\u6a21\u578b\u9884\u6d4bsdk\u9700\u8981\u5355\u72ec\u8bbe\u8ba1":18,"\u6a2a\u5411\u62fc\u63a5":11,"\u6b21":39,"\u6b22\u8fce\u5411paddlepaddle\u793e\u533a\u53cd\u9988\u95ee\u9898":2,"\u6b22\u8fce\u901a\u8fc7":4,"\u6b63\u5728\u7b49\u5f85\u672a\u5b8c\u6210\u7684\u4efb\u52a1":8,"\u6b63\u5e38\u60c5\u51b5\u4e0b\u662f75m":8,"\u6b63\u786e\u7684\u89e3\u51b3\u65b9\u6cd5\u662f":8,"\u6b63\u8d1f\u5bf9\u9a8c\u8bc1":32,"\u6b64\u5904\u90fd\u4e3a2":39,"\u6b64\u5916":[0,4,12],"\u6b64\u6559\u7a0b\u5c06\u5411\u60a8\u5206\u6b65\u4ecb\u7ecd\u5982\u4f55\u4f7f\u7528\u5185\u7f6e\
u7684\u5b9a\u65f6\u5de5\u5177":37,"\u6b64\u65b9\u6cd5\u4e0d\u80fd\u83b7\u53d6":11,"\u6b64\u65f6\u53ef\u4ee5\u5728\u8c03\u7528infer\u63a5\u53e3\u65f6\u901a\u8fc7\u8bbe\u7f6e":11,"\u6b64\u65f6\u53ef\u4ee5\u8df3\u8fc7paddlepaddle\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u7684\u5934\u4fe1\u606f":13,"\u6b64\u76ee\u5f55":20,"\u6b64\u793a\u4f8b":20,"\u6b64\u7c7b\u62a5\u9519\u901a\u5e38\u662f\u7531\u4e8e\u67d0\u4e00\u4e2a\u8282\u70b9\u7684\u9519\u8bef\u5bfc\u81f4\u8fd9\u4e2a\u8282\u70b9\u7684\u8bad\u7ec3\u8fdb\u7a0b\u9000\u51fa":9,"\u6b65\u9aa4":11,"\u6bb5\u843d\u53ef\u4ee5\u770b\u4f5c\u662f\u4e00\u4e2a\u5d4c\u5957\u7684\u53cc\u5c42\u7684\u5e8f\u5217":41,"\u6bb5\u843d\u662f\u7531\u53e5\u5b50\u6784\u6210\u7684\u5e8f\u5217":19,"\u6bcf\u4e00\u4e2a":20,"\u6bcf\u4e00\u4e2a\u5916\u5c42\u5e8f\u5217\u53c8\u542b\u6709\u82e5\u5e72\u4e2a\u5185\u5c42\u5e8f\u5217":19,"\u6bcf\u4e00\u4e2a\u5e8f\u5217\u5728\u6574\u4e2a":19,"\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65":39,"\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u4e4b\u95f4\u7684\u795e\u7ecf\u7f51\u7edc\u5177\u6709\u4e00\u5b9a\u7684\u76f8\u5173\u6027":39,"\u6bcf\u4e00\u4e2a\u8282\u70b9\u90fd\u6709\u76f8\u540c\u7684\u65e5\u5fd7\u7ed3\u6784":23,"\u6bcf\u4e00\u4e2a\u8f93\u5165":[19,20],"\u6bcf\u4e00\u4e2alayer\u8f93\u51fa\u77e9\u9635\u7684\u9ad8\u5ea6":11,"\u6bcf\u4e00\u7ec4\u5185\u7684\u6240\u6709\u53e5\u5b50\u548clabel":39,"\u6bcf\u4e00\u884c\u5143\u7d20\u5728":19,"\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u6bcf\u4e2a\u5355\u5c42rnn":41,"\u6bcf\u4e2a\u53c2\u6570\u670d\u52a1\u5668\u53ea\u4fdd\u5b58\u6574\u4e2a\u795e\u7ecf\u7f51\u7edc\u6240\u6709\u53c2\u6570\u7684\u4e00\u90e8\u5206":22,"\u6bcf\u4e2a\u53e5\u5b50\u53c8\u662f\u5355\u8bcd\u7684\u6570\u7ec4":39,"\u6bcf\u4e2a\u53e5\u5b50\u90fd\u4ee5\u5f00\u59cb\u6807\u8bb0\u5f00\u5934":42,"\u6bcf\u4e2a\u53e5\u5b50\u90fd\u4ee5\u7ed3\u675f\u6807\u8bb0\u7ed3\u5c3e":42,"\u6bcf\u4e2a\u5b50\u5e8f\u5217\u957f\u5ea6\u53ef\u4e
e5\u4e0d\u4e00\u81f4":39,"\u6bcf\u4e2a\u5c42\u5728\u5176":6,"\u6bcf\u4e2a\u6279\u6b21\u6570\u636e":33,"\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e4b\u5185\u7684\u8fd0\u7b97\u662f\u72ec\u7acb\u7684":41,"\u6bcf\u4e2a\u65f6\u95f4\u6b65\u90fd\u7528\u4e86\u4e0a\u4e00\u4e2a\u65f6\u95f4\u6b65\u7684\u8f93\u51fa\u7ed3\u679c":39,"\u6bcf\u4e2a\u6743\u91cd\u5bf9\u5e94\u4e00\u4e2a\u8f93\u5165":6,"\u6bcf\u4e2a\u6837\u672c\u7531\u4e24\u90e8\u5206\u7ec4\u6210":39,"\u6bcf\u4e2a\u6837\u672c\u95f4\u7528\u7a7a\u884c\u5206\u5f00":39,"\u6bcf\u4e2a\u72b6\u6001":41,"\u6bcf\u4e2a\u7ebf\u7a0b":33,"\u6bcf\u4e2a\u7ebf\u7a0b\u5206\u914d\u5230128\u4e2a\u6837\u672c\u7528\u4e8e\u8bad\u7ec3":33,"\u6bcf\u4e2a\u8bad\u7ec3\u8282\u70b9\u5fc5\u987b\u6307\u5b9a\u4e00\u4e2a\u552f\u4e00\u7684id\u53f7":33,"\u6bcf\u4e2a\u8f93\u5165\u90fd\u662f\u4e00\u4e2a":6,"\u6bcf\u4e2a\u8f93\u51fa\u8282\u70b9\u90fd\u8fde\u63a5\u5230\u6240\u6709\u7684\u8f93\u5165\u8282\u70b9\u4e0a":6,"\u6bcf\u4e2a\u90e8\u5206\u5206\u522b\u7ed9\u6bcf\u4e2atrainer\u4f7f\u7528":22,"\u6bcf\u4e2acommit\u53ea\u505a\u4e86\u5c11\u91cf\u7684\u4fee\u6539":4,"\u6bcf\u4e2apod\u5305\u542b\u4e00\u4e2apaddlepaddle\u5bb9\u5668":27,"\u6bcf\u4e2atrainer\u542f\u52a8\u540e\u8bfb\u53d6\u5207\u5206\u597d\u7684\u4e00\u90e8\u5206\u6570\u636e":22,"\u6bcf\u4e2atrainer\u7684\u552f\u4e00id":21,"\u6bcf\u4e2atrainer\u8fdb\u7a0b\u9700\u8981\u80fd\u591f\u8bfb\u53d6\u5c5e\u4e8e\u81ea\u5df1\u7684\u4e00\u4efd\u6570\u636e":21,"\u6bcf\u53f0\u670d\u52a1\u5668\u5177\u6709\u96c6\u7fa4\u4e2d\u552f\u4e00\u7684ip\u5730\u5740":31,"\u6bcf\u5c42\u4e0a\u53ea\u80fd\u4fdd\u5b58\u56fa\u5b9a\u6570\u76ee\u4e2a\u6700\u597d\u7684\u72b6\u6001":33,"\u6bcf\u5c42\u4f7f\u7528\u7684gpu\u53f7\u4f9d\u8d56\u4e8e\u53c2\u6570train":35,"\u6bcf\u6279\u6b21":33,"\u6bcf\u6b21\u63d0\u4ea4\u4ee3\u7801":4,"\u6bcf\u6b21\u63d0\u4ea4\u65f6":4,"\u6bcf\u884c\u8868\u793a\u4e00\u4e2a\u6279\u6b21\u4e2d\u7684\u5355\u4e2a\u8f93\u5165":6,"\u6bcf\u8f6e\u4f1a\u5c06\u6570\u636e\u96c6\u4e2d\u7684\u6240\u6709\u8bad\u7ec3\u6837\u672c\u
4f7f\u7528\u4e00\u6b21":33,"\u6bcf\u8f6e\u7ed3\u675f\u65f6\u5bf9\u6240\u6709\u6d4b\u8bd5\u6570\u636e\u8fdb\u884c\u6d4b\u8bd5":33,"\u6bcf\u8f6e\u90fd\u4f1a\u4fdd\u5b58\u9884\u6d4b\u7ed3\u679c":33,"\u6bcf\u8fd0\u884c\u591a\u5c11\u4e2a\u6279\u6b21\u6267\u884c\u4e00\u6b21\u7a00\u758f\u53c2\u6570\u5206\u5e03\u7684\u68c0\u67e5":33,"\u6bcfdot":33,"\u6bcflog":33,"\u6bcfsave":33,"\u6bcftest":33,"\u6bd4\u5982":[0,1,4,9,11],"\u6bd4\u5982\u4e00\u53e5\u8bdd\u4e2d\u7684\u6bcf\u4e00\u4e2a\u5355\u8bcd":39,"\u6bd4\u5982\u5728":1,"\u6bd4\u5982\u5982\u679c\u8981build\u4e00\u4e2a\u4e0d\u4f9d\u8d56gpu":4,"\u6bd4\u5982\u8bbe\u7f6e\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u7684\u53c2\u6570\u521d\u59cb\u5316\u65b9\u5f0f\u548cbias\u521d\u59cb\u5316\u65b9\u5f0f":13,"\u6bd4\u5982cento":3,"\u6bd4\u5982fpe":9,"\u6bd4\u5982ide\u914d\u7f6e\u91cc":4,"\u6bd4\u5982pil\u5e93\u7b49":21,"\u6c34\u6e29":39,"\u6c49\u5ead":39,"\u6ca1":39,"\u6ca1\u6709\u627e\u5230\u548c\u5f53\u524d\u7cfb\u7edf\u5339\u914d\u7684paddlepaddle\u5b89\u88c5\u5305":[3,8],"\u6ce8":[0,1,19],"\u6ce8\u610f":[0,6,7,11,14,20,21,27,42],"\u6ce8\u610f\u4e0a\u8ff0\u547d\u4ee4\u4e2d":27,"\u6ce8\u610f\u4e8b\u9879":19,"\u6ce8\u610f\u5230\u6211\u4eec\u5df2\u7ecf\u5047\u8bbe\u673a\u5668\u4e0a\u67094\u4e2agpu":35,"\u6ce8\u610fnode":27,"\u6cf3\u6c60":39,"\u6d41":39,"\u6d41\u7a0b\u6765\u63d0\u4ea4\u4ee3\u7801":4,"\u6d44":39,"\u6d4b\u8bd5":4,"\u6d4b\u8bd5\u65f6\u6307\u5b9a\u7684\u5b58\u50a8\u6a21\u578b\u5217\u8868\u7684\u6587\u4ef6":33,"\u6d4b\u8bd5\u662f":4,"\u6d4b\u8bd5\u7684\u6a21\u578b\u5305\u62ec\u4ece\u7b2cm\u8f6e\u5230\u7b2cn":35,"\u6d4b\u8bd5model_list":32,"\u6d4b\u8bd5save_dir":32,"\u6d6e\u70b9\u578b\u7a00\u758f\u77e9\u9635":19,"\u6d6e\u70b9\u578b\u7a20\u5bc6\u77e9\u9635":19,"\u6d6e\u70b9\u5f02\u5e38\u901a\u5e38\u7684\u539f\u56e0\u662f\u6d6e\u70b9\u6570\u6ea2\u51fa":11,"\u6d6e\u70b9\u6570":19,"\u6d6e\u70b9\u6570\u5411\u91cf\u7b49":19,"\u6d6e\u70b9\u7a00\u758f\u6570\u636e":6,"\u6df1\u5ea6\u5b66\u4e60\u7b97\u6cd5\u7684\u5b9e\u73b0\u6709\u7740\u591
a\u6837\u5316\u7684\u7279\u70b9":34,"\u6df7\u5408\u5f53\u524d\u8bcd\u5411\u91cf\u548cattention\u52a0\u6743\u7f16\u7801\u5411\u91cf":42,"\u6dfb\u52a0\u542f\u52a8\u811a\u672c":27,"\u6dfb\u52a0\u5e8f\u5217\u4fe1\u606f":19,"\u6e05\u7406\u548c\u7ed3\u675f":20,"\u6e05\u7406\u6389\u8001\u65e7\u7684paddlepaddle\u5b89\u88c5\u5305":8,"\u6e29\u99a8":39,"\u6e90\u4ee3\u7801\u683c\u5f0f":4,"\u6e90\u5e8f\u5217":42,"\u6e90\u7801\u4e2d\u6784\u5efa\u7528\u4e8e\u7f16\u8bd1paddlepaddle\u7684docker\u955c\u50cf":0,"\u6e90\u7801\u6811\u6839\u76ee\u5f55":0,"\u6f5c\u5728\u4f1a\u5f15\u8d77\u672a\u5b9a\u4e49\u884c\u4e3a":19,"\u6fc0\u6d3b":6,"\u6fc0\u6d3b\u65b9\u7a0b":6,"\u6fc0\u6d3b\u7684\u7c7b\u578b":6,"\u70b9\u51fb":3,"\u70b9\u51fb\u8fd9\u91cc":7,"\u70ed\u60c5":39,"\u7136\u540e":[23,37],"\u7136\u540e\u4e0b\u8f7d\u4f18\u5316\u66f4\u65b0\u540e\u7684\u795e\u7ecf\u7f51\u7edc\u53c2\u6570":22,"\u7136\u540e\u4ea4\u7ed9step\u51fd\u6570":41,"\u7136\u540e\u5355\u51fb":4,"\u7136\u540e\u53ef\u4ee5\u4ecehead\u8282\u70b9ssh\u65e0\u5bc6\u7801\u767b\u5f55\u5230openmpi\u7684\u6bcf\u4e2a\u8282\u70b9\u4e0a":28,"\u7136\u540e\u53ef\u4ee5\u4f7f\u7528\u547d\u4ee4\u884c\u5de5\u5177\u521b\u5efajob":27,"\u7136\u540e\u5728\u4e0b\u4e00\u4e2a\u65f6\u95f4\u6b65\u8f93\u5165\u7ed9\u53e6\u4e00\u4e2a\u795e\u7ecf\u5143":39,"\u7136\u540e\u5728\u6d4f\u89c8\u5668\u4e2d\u8f93\u5165\u4ee5\u4e0b\u7f51\u5740":1,"\u7136\u540e\u5728dataprovider\u91cc\u9762\u6839\u636e\u8be5\u5730\u5740\u52a0\u8f7d\u5b57\u5178":13,"\u7136\u540e\u5b89\u88c5paddle\u7684python\u73af\u5883":8,"\u7136\u540e\u5b9a\u4e49":42,"\u7136\u540e\u5c06\u6784\u5efa\u6210\u529f\u7684\u955c\u50cf\u4e0a\u4f20\u5230\u955c\u50cf\u4ed3\u5e93":27,"\u7136\u540e\u5c06\u8fd9\u4e9blayer\u7684\u53c2\u6570":12,"\u7136\u540e\u6240\u6709\u7528":4,"\u7136\u540e\u6253\u5370\u8f93\u51fa":14,"\u7136\u540e\u63d0\u4ea4\u65b0\u6dfb\u52a0\u7684":4,"\u7136\u540e\u70b9\u51fb":4,"\u7136\u540e\u7533\u660e\u4e00\u4e2a\u5b58\u50a8\u5377":27,"\u7136\u540e\u89c2\u5bdf\u5230\u8f93\u51fa\u7684\u53d8
\u5316\u4e3a":6,"\u7136\u540e\u901a\u8fc7\u51fd\u6570":27,"\u7136\u540e\u901a\u8fc7\u81ea\u8eab\u7684ip\u5730\u5740\u5728":27,"\u7136\u540e\u91cd\u65b0cmake\u5373\u53ef":8,"\u7136\u800c":[33,42],"\u7248\u672c":[0,3],"\u7248\u672c\u4e3acpu_avx_mkl":1,"\u7248\u672c\u4e3acpu_avx_openbla":[3,16],"\u7248\u672c\u5728":4,"\u7248\u672c\u8bf4\u660e":3,"\u7279\u522b\u662f\u5728lstm\u7b49rnn\u4e2d":11,"\u7279\u6307":20,"\u73af\u5883\u51c6\u5907":22,"\u73af\u5883\u53d8\u91cf":21,"\u73af\u5883\u53d8\u91cf\u4e2d":17,"\u73af\u5883\u53d8\u91cf\u6765\u6307\u5b9a\u7279\u5b9a\u7684gpu":11,"\u7406\u89e3":0,"\u7406\u89e3\u4e3a\u4e00\u4e2a\u4e00\u7ef4\u7684\u6574\u578b\u6570\u7ec4":19,"\u751a\u81f3\u80fd\u89e3\u91ca\u4e3a\u4ec0\u4e48\u67d0\u4e2a\u64cd\u4f5c\u82b1\u4e86\u5f88\u957f\u65f6\u95f4":37,"\u751f\u4ea7\u955c\u50cf":4,"\u751f\u6210":27,"\u751f\u6210\u540e\u7684\u6587\u6863\u5206\u522b\u5b58\u50a8\u5728\u7f16\u8bd1\u76ee\u5f55\u7684":7,"\u751f\u6210\u5e8f\u5217\u7684\u6700\u5927\u957f\u5ea6":42,"\u751f\u6210\u7684\u6570\u636e\u5c06\u4f1a\u5b58\u50a8\u5728\u8fd9\u4e2avolume\u4e0b":27,"\u751f\u6210\u7684\u6570\u636e\u7f13\u5b58\u5728\u5185\u5b58\u91cc":11,"\u751f\u6210\u7f51\u7edc\u5c42\u914d\u7f6e":6,"\u751f\u6210\u81ea\u5df1\u76ee\u5f55\u4e0b\u7684\u4ed3\u5e93":4,"\u7528\u4e8e\u521d\u59cb\u5316\u53c2\u6570\u548c\u8bbe\u7f6e":6,"\u7528\u4e8e\u5c06\u53c2\u6570\u4f20\u9012\u7ed9\u7f51\u7edc\u914d\u7f6e":35,"\u7528\u4e8e\u6307\u5b9a\u5176\u8981\u5173\u8054\u7684layer":12,"\u7528\u4e8e\u6307\u5b9a\u7f51\u7edc\u914d\u7f6e\u6587\u4ef6":33,"\u7528\u4e8e\u7a00\u758f\u7c7b\u578b\u53c2\u6570\u901a\u4fe1\u7684\u7aef\u53e3\u4e2a\u6570":21,"\u7528\u4e8e\u7a00\u758f\u8bad\u7ec3\u4e2d":33,"\u7528\u4e8e\u83b7\u53d6\u7279\u5b9alayer\u4e0a\u4e00\u65f6\u95f4\u6b65\u7684\u8f93\u51fa":12,"\u7528\u4e8e\u89e3\u51b3\u4e0a\u8ff0\u95ee\u9898":18,"\u7528\u4e8e\u8ba1\u7b97\u7f16\u7801\u5411\u91cf\u7684\u52a0\u6743\u548c":42,"\u7528\u4e8e\u8bad\u7ec3\u795e\u7ecf\u7f51\u7edc\u7684\u6570\u636e":22,"\u7528\u53cc\u
5411\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7f16\u7801":42,"\u7528\u591a\u5bf9\u6548\u679c\u5b8c\u5168\u76f8\u540c\u7684":39,"\u7528\u6237\u53ea\u9700\u5b9a\u4e49rnn\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u5b8c\u6210\u7684\u8ba1\u7b97":41,"\u7528\u6237\u53ef\u4ee5\u5206\u522b\u67e5\u770b\u6700\u65b0\u7684":7,"\u7528\u6237\u53ef\u4ee5\u53c2\u8003sphinx\u6559\u7a0b\u8fdb\u884c\u4e66\u5199":7,"\u7528\u6237\u53ef\u4ee5\u81ea\u5b9a\u4e49beam":33,"\u7528\u6237\u53ef\u4ee5\u8bbe\u7f6e":35,"\u7528\u6237\u53ef\u5728\u8c03\u7528cmake\u7684\u65f6\u5019\u8bbe\u7f6e\u5b83\u4eec":0,"\u7528\u6237\u5728\u4f7f\u7528\u8fd9\u4e00\u7c7brecurr":12,"\u7528\u6237\u5728\u4f7f\u7528paddlepaddl":8,"\u7528\u6237\u5c06\u53c2\u6570\u8f7d\u5165":13,"\u7528\u6237\u5c06\u914d\u7f6e\u4e0e\u8bad\u7ec3\u6570\u636e\u5207\u5206\u597d\u653e\u5728\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u9884\u5148\u5206\u914d\u597d\u7684\u76ee\u5f55\u4e2d":27,"\u7528\u6237\u5f3a\u5236\u6307\u5b9a\u7279\u5b9a\u7684python\u7248\u672c":8,"\u7528\u6237\u7684\u96c6\u7fa4\u73af\u5883\u4e0d\u5c3d\u76f8\u540c":24,"\u7528\u6237\u901a\u8fc7\u53c2\u6570":[12,13],"\u7528\u6237\u9700\u8981\u5728\u7f51\u7edc\u914d\u7f6e\u4e2d\u6307\u5b9a":35,"\u7528\u6237\u9700\u8981\u6307\u5b9a\u672c\u673a\u4e0apython\u7684\u8def\u5f84":8,"\u7528\u6765\u4ece\u53c2\u6570\u670d\u52a1\u5668\u9884\u53d6\u53c2\u6570\u77e9\u9635\u76f8\u5e94\u7684\u884c":6,"\u7528\u8fd9\u4e2a\u955c\u50cf\u521b\u5efa\u7684\u5bb9\u5668\u9700\u8981\u6709\u4ee5\u4e0b\u4e24\u4e2a\u529f\u80fd":27,"\u7531":[12,19,41],"\u7531\u4e8e":17,"\u7531\u4e8e\u5b83\u5185\u90e8\u5305\u542b\u4e86\u6bcf\u7ec4\u6570\u636e\u4e2d\u7684\u6240\u6709\u53e5\u5b50":39,"\u7531\u4e8e\u6211\u4eec\u60f3\u8981\u7684\u53d8\u6362\u662f\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217":39,"\u7531\u4e8e\u6211\u4eec\u652f\u6301\u8bad\u7ec3\u6570\u636e\u6709\u4e0d\u540c\u7684\u6279\u6b21\u5927\u5c0f":6,"\u7531\u4e8estep":41,"\u7531\u8bcd\u8bed\u6784\u6210\u7684\u53e5\u5b50":38,"\u7531\u94fe\u63a5\u65b9\u5f0f\u
51b3\u5b9a":17,"\u7535\u8111":39,"\u767b\u5f55\u5230head\u8282\u70b9":28,"\u7684":[3,4,19,20,21,23,27,39],"\u7684\u4e00\u4e2a\u7b80\u5355\u8c03\u7528\u5982\u4e0b":41,"\u7684\u4e3a0":33,"\u7684\u4efb\u4e00\u4e00\u79cd":11,"\u7684\u4f7f\u7528\u793a\u4f8b\u5982\u4e0b":38,"\u7684\u504f\u7f6e\u5411\u91cf":6,"\u7684\u5171\u4eab\u5df2\u7ecf\u52a0\u8f7d\u7684\u9884\u6d4b\u6a21\u578b":20,"\u7684\u5185\u5b58":11,"\u7684\u5185\u5bb9\u6765\u5b9a\u5236imag":27,"\u7684\u5185\u6838block\u4f7f\u7528\u60c5\u51b5":37,"\u7684\u53c2\u6570\u4f7f\u4e4b\u652f\u6301\u5f02\u6b65sgd\u66f4\u65b0":21,"\u7684\u53cd\u5411\u4f20\u64ad\u5c06\u4f1a\u6253\u5370\u65e5\u5fd7\u4fe1\u606f":33,"\u7684\u53d8\u6362\u77e9\u9635":6,"\u7684\u540d\u79f0\u76f8\u540c":42,"\u7684\u5411\u91cf":6,"\u7684\u542f\u52a8\u53c2\u6570":27,"\u7684\u542f\u52a8\u53c2\u6570\u5e76\u6267\u884c\u8fdb\u7a0b":27,"\u7684\u5730\u65b9":4,"\u7684\u5747\u5300\u5206\u5e03":13,"\u7684\u591a\u79cd\u5b89\u88c5\u65b9\u5f0f":31,"\u7684\u5de5\u4f5c\u6d41\u7a0b\u5982\u56fe1\u6240\u793a":20,"\u7684\u5e73\u5747\u503c":38,"\u7684\u5e8f\u5217":19,"\u7684\u5e8f\u5217\u5f62\u72b6\u4e00\u81f4":39,"\u7684\u5f00\u53d1\u5de5\u4f5c\u90fd\u5e94\u8be5\u5728\u4e00\u4e2a\u65b0\u7684\u5206\u652f\u4e0a\u5b8c\u6210":4,"\u7684\u5f00\u53d1\u6d41\u7a0b":0,"\u7684\u63a5\u53e3\u8bf7\u67e5\u770b":19,"\u7684\u63cf\u8ff0\u8bf4\u660e\u4e2d":4,"\u7684\u6570\u76ee\u4e00\u81f4":38,"\u7684\u6587\u4ef6\u4e5f\u5e26\u5230\u65b0\u5206\u652f\u4e0a":4,"\u7684\u65b9\u7a0b":6,"\u7684\u65f6\u95f4\u6b65\u4fe1\u606f\u6210\u6b63\u6bd4":11,"\u7684\u66f4\u8be6\u7ec6\u51c6\u786e\u7684\u5b9a\u4e49":39,"\u7684\u6700\u5c0f\u503c":33,"\u7684\u6700\u65b0\u4ee3\u7801\u5e76\u66f4\u65b0\u5f53\u524d\u5206\u652f":4,"\u7684\u67b6\u6784\u7684\u793a\u4f8b":42,"\u7684\u6837\u5f0f":4,"\u7684\u6838\u5fc3\u662f\u8bbe\u8ba1step\u51fd\u6570\u7684\u8ba1\u7b97\u903b\u8f91":41,"\u7684\u6bb5\u843d\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":41,"\u7684\u6bcf\u4e2a\u8fdb\u7a0b\u90fd\u53ef\u4ee5\u4ecece
ph\u8bfb\u53d6\u6570\u636e":26,"\u7684\u6e90\u7801\u4ee5\u53ca\u751f\u6210\u6587\u6863\u9700\u8981\u591a\u79cd\u5f00\u53d1\u5de5\u5177":4,"\u7684\u72b6\u6001":41,"\u7684\u77e9\u9635":[6,11],"\u7684\u7a20\u5bc6\u5411\u91cf\u4f5c\u4e3a\u8f93\u5165":6,"\u7684\u7a20\u5bc6\u77e9\u9635":19,"\u7684\u7a20\u5bc6\u77e9\u9635\u662f\u4e00\u4e2a\u7531":19,"\u7684\u7b2c\u4e00\u4e2a\u53c2\u6570":20,"\u7684\u7b2ci\u4e2a\u503c":6,"\u7684\u7b2cj\u4e2a\u503c":6,"\u7684\u7cfb\u7edf":0,"\u7684\u7f16\u5199":21,"\u7684\u8bdd":11,"\u7684\u8f93\u5165":41,"\u7684\u8f93\u51fa":[11,37],"\u7684\u8f93\u51fa\u4fe1\u606f\u5165\u624b\u662f\u4e2a\u4e0d\u9519\u7684\u9009\u62e9":37,"\u7684\u8f93\u51fa\u51fd\u6570\u8fd4\u56de\u7684\u662f\u4e0b\u4e00\u4e2a\u65f6\u523b\u8f93\u51fa\u8bcd\u7684":42,"\u7684\u8f93\u51fa\u683c\u5f0f":39,"\u7684\u8f93\u51fa\u88ab\u7528\u4f5c":42,"\u7684\u8f93\u51fab\u662f\u4e00\u4e2a":11,"\u7684\u8fd0\u884c\u73af\u5883":0,"\u7684\u8fdc\u7a0b\u4ed3\u5e93\u7684\u540d\u5b57":4,"\u7684\u96c6\u88c5\u7bb1\u6280\u672f":0,"\u7684\u9875\u9762\u5220\u9664\u8fdc\u7a0b\u4ed3\u5e93\u7684\u5206\u652f":4,"\u7684docker\u955c\u50cf":1,"\u7684linux\u670d\u52a1\u5668\u7ec4\u6210":31,"\u76d1\u542c\u7684\u7aef\u53e3\u4e2a\u6570":21,"\u76ee\u524d":41,"\u76ee\u524d\u4f7f\u7528":4,"\u76ee\u524d\u63d0\u4f9b\u4e09\u79cd\u94fe\u63a5\u65b9\u5f0f":17,"\u76ee\u524d\u652f\u6301\u4e24\u79cd":38,"\u76ee\u524d\u652f\u6301cento":16,"\u76ee\u524d\u652f\u6301fail":33,"\u76ee\u524d\u8be5\u53c2\u6570\u4ec5\u7528\u4e8eaucvalidationlayer\u548cpnpairvalidationlayer\u5c42":33,"\u76ee\u524d\u8fd8\u672a\u652f\u6301":41,"\u76ee\u524dpaddlepaddle\u7684develop\u5206\u652f\u7684\u6587\u6863\u662f\u81ea\u52a8\u89e6\u53d1\u66f4\u65b0\u7684":7,"\u76ee\u5f55":[0,1,23,26,27],"\u76ee\u5f55\u4e0b":[6,23],"\u76ee\u5f55\u4e0b\u7684\u4ee3\u7801\u793a\u4f8b":20,"\u76ee\u5f55\u4e0b\u7684python\u5305":8,"\u76ee\u5f55\u4e2d":[17,20,23],"\u76ee\u5f55\u4e2d\u7684":37,"\u76ee\u5f55\u4e2dpaddl":27,"\u76ee\u5f55\u5c31\u6210\u4e3a\u4e86\u5171\u
4eab\u5b58\u50a8":27,"\u76ee\u6807\u5411\u91cf":42,"\u76f4\u5230\u8bad\u7ec3\u6536\u655b\u4e3a\u6b62":13,"\u76f4\u63a5\u5347\u7ea7\u5230\u66f4\u65b0\u7684\u7248\u672c":0,"\u76f4\u63a5\u8fd0\u884c":1,"\u76f8\u540c\u540d\u5b57\u7684\u53c2\u6570":13,"\u76f8\u5bf9":39,"\u76f8\u5f53":39,"\u76f8\u6bd4\u4e8e\u6a21\u578b\u8bad\u7ec3":18,"\u770b\u5f53\u524dmpi\u96c6\u7fa4\u662f\u5426\u652f\u6301resourc":9,"\u77a7":16,"\u77e9\u9635":32,"\u77e9\u9635\u4e2d\u6bcf\u4e2a\u5143\u7d20\u7684\u503c\u968f\u673a\u751f\u6210":19,"\u77e9\u9635\u662f\u5426\u662f\u4e00\u4e2a\u5e8f\u5217":19,"\u77e9\u9635\u7684\u9ad8\u5ea6":19,"\u77e9\u9635\u91cc\u7684\u5143\u7d20\u662f\u6d6e\u70b9\u6570":19,"\u786e\u4fdd\u7f16\u8bd1\u5668\u9009\u9879":4,"\u78c1\u76d8\u4e0d\u591f":0,"\u78c1\u76d8\u7a7a\u95f4\u4e0d\u8db3\u7b49":9,"\u793a\u4f8b":[11,13,20],"\u793a\u4f8b3\u5bf9\u4e8e\u5355\u5c42rnn\u548c\u53cc\u5c42rnn\u6570\u636e\u5b8c\u5168\u76f8\u540c":39,"\u793a\u4f8b3\u7684\u914d\u7f6e\u4f7f\u7528\u4e86\u5355\u5c42rnn\u548c\u53cc\u5c42rnn":39,"\u793a\u4f8b3\u7684\u914d\u7f6e\u5206\u522b\u4e3a":39,"\u793a\u4f8b\u4ee3\u7801\u5982\u4e0b":[11,20],"\u793a\u4f8b\u5982\u4e0b":13,"\u793a\u4f8b\u7a0b\u5e8f":21,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u5165":19,"\u795e\u7ecf\u7f51\u7edc\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u5165\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":20,"\u795e\u7ecf\u7f51\u7edc\u4e5f\u9700\u8981\u4e00\u4e9b\u7279\u5b9a\u7684layer\u4f5c\u4e3a\u8f93\u5165\u63a5\u53e3":14,"\u795e\u7ecf\u7f51\u7edc\u53c2\u6570\u4ee5\u53ca\u8fed\u4ee3\u65b9\u7a0b":14,"\u795e\u7ecf\u7f51\u7edc\u5728\u8bad\u7ec3\u7684\u65f6\u5019":11,"\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u7ed3\u6784\u548c\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u5c06\u88ab\u5e8f\u5217\u5316\u5408\u5e76\u5165\u4e00\u4e2a\u6587\u4ef6":20,"\u795e\u7ecf\u7f51\u7edc\u7684\u7f51\u7edc\u7ed3\u6784\u4e2d\u5177\u6709\u6709\u5411\u73af\u7ed3\u6784":39,"\u795e\u7ecf\u7f51\u7edc\u7684\u8bad\u7ec3\u672c\u8eab\u662f\u4
e00\u4e2a\u975e\u5e38\u6d88\u8017\u5185\u5b58\u548c\u663e\u5b58\u7684\u5de5\u4f5c":11,"\u79bb":39,"\u79f0\u4e3a":[4,42],"\u79f0\u4e4b\u4e3a":19,"\u79f0\u4e4b\u4e3a\u53cc\u5c42\u5e8f\u5217\u7684\u4e00\u4e2a\u5b50\u5e8f\u5217":38,"\u79f0\u4e4b\u4e3a\u96c6\u675f\u5927\u5c0f":33,"\u79fb\u52a8\u7aef\u9884\u6d4b":19,"\u7a00\u758f\u6570\u636e\u7684\u683c\u5f0f":6,"\u7a00\u758f\u66f4\u65b0\u7684\u7aef\u53e3\u6570\u91cf":27,"\u7a00\u758f\u768401\u5411\u91cf":14,"\u7a00\u758f\u7684\u5411\u91cf":14,"\u7a00\u758f\u77e9\u9635":19,"\u7a00\u758f\u77e9\u9635\u4f7f\u7528":19,"\u7a00\u758f\u77e9\u9635\u53ca\u76f8\u5173\u7684\u63a5\u53e3":19,"\u7a00\u758f\u77e9\u9635\u5b58\u50a8\u793a\u610f\u56fe":19,"\u7a00\u758f\u77e9\u9635\u7684\u4e58\u79ef\u5e94\u7528\u4e8e\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b":35,"\u7a0b\u5e8f\u4ece\u6b64\u76ee\u5f55\u62f7\u8d1d\u6587\u4ef6\u5230\u5bb9\u5668\u5185\u8fdb\u884c\u8bad\u7ec3":27,"\u7a0b\u5e8f\u4f9d\u8d56":21,"\u7a0b\u5e8f\u505c\u6b62":33,"\u7a0b\u5e8f\u76f4\u63a5\u9000\u51fa":33,"\u7a20\u5bc6\u5411\u91cf":6,"\u7a20\u5bc6\u66f4\u65b0\u7684\u7aef\u53e3\u6570\u91cf":27,"\u7a20\u5bc6\u7684\u6d6e\u70b9\u6570\u5411\u91cf":14,"\u7a20\u5bc6\u77e9\u9635":19,"\u7a97\u6237":39,"\u7acb\u523b\u9000\u51fa":11,"\u7acb\u5373\u6267\u884c\u5355\u5143\u6d4b\u8bd5":0,"\u7aef\u53e3":9,"\u7aef\u6570\u636e\u7c7b\u578b":19,"\u7aef\u8bfb\u53d6\u6570\u636e":11,"\u7b2c":39,"\u7b2c\u4e00\u4e2a":4,"\u7b2c\u4e00\u4e2a\u6837\u672c\u540c\u65f6encode\u4e24\u6761\u6570\u636e\u6210\u4e24\u4e2a\u5411\u91cf":39,"\u7b2c\u4e00\u5929":39,"\u7b2c\u4e00\u6b65\u9700\u8c03\u7528":20,"\u7b2c\u4e00\u7ae0\u8282":14,"\u7b2c\u4e09\u65b9\u4f9d\u8d56\u5e93\u9700\u8981\u6309\u7167\u4e0e\u65b9\u5f0f2\u540c\u6837\u65b9\u6cd5\u663e\u793a\u5730\u8fdb\u884c\u94fe\u63a5":17,"\u7b2c\u4e09\u65b9\u94fe\u63a5\u5e93\u548c\u5934\u6587\u4ef6":17,"\u7b2c\u4e8c\u4e2a":11,"\u7b2c\u4e8c\u7c7b":12,"\u7b2ci\u884c\u7b2cj\u5217\u7684\u6570\u503c":6,"\u7b49":[9,20],"\u7b49\u4e8e\u6837\u672c\u6570":11,"\u7b49\u53c2\u6570":27
,"\u7b49\u7b2c\u4e09\u65b9\u5e93":17,"\u7b80\u5355\u6765\u8bf4":37,"\u7b80\u5355\u7684\u5168\u8fde\u63a5\u7f51\u7edc":13,"\u7b80\u5355\u7684\u6027\u80fd\u5206\u6790":37,"\u7b80\u5355\u7684yaml\u6587\u4ef6\u5982\u4e0b":26,"\u7b80\u76f4":39,"\u7b97\u6cd5":[11,42],"\u7b97\u6cd5\u4e2d\u7684beam\u5927\u5c0f":42,"\u7ba1\u7406\u4e86\u6bcf\u4e2a\u8ba1\u7b97\u5c42\u8f93\u51fa\u7684\u5b58\u50a8\u7a7a\u95f4":20,"\u7ba1\u7406\u7684\u65b9\u6cd5":24,"\u7c7b\u4f3c":38,"\u7c7b\u4f5c\u4e3a\u53c2\u6570\u7684\u62bd\u8c61":6,"\u7c7b\u522b\u4e2d\u7684\u53c2\u6570\u53ef\u7528\u4e8e\u6240\u6709\u573a\u5408":32,"\u7c7b\u522b\u6807\u7b7e\u4e4b\u4e00":20,"\u7c7b\u522b\u6807\u7b7e\u5c42":20,"\u7c7b\u578b":[19,33],"\u7c7b\u578b\u53ef\u4ee5\u662fpaddlepaddle\u652f\u6301\u7684\u4efb\u610f\u8f93\u5165\u6570\u636e\u7c7b\u578b":38,"\u7c7b\u578b\u662fnumpy\u7684ndarrai":11,"\u7c7b\u578b\u662fsparse_binary_vector":14,"\u7c7b\u578b\u662fsparse_float_vector":14,"\u7c7b\u578b\u7684":39,"\u7c7b\u578b\u8fd8\u662f":19,"\u7c7b\u7684\u5bf9\u8c61":20,"\u7c7b\u7684\u6784\u9020\u51fd\u6570\u548c\u6790\u6784\u51fd\u6570":6,"\u7c7b\u9700\u8981\u5b9e\u73b0\u521d\u59cb\u5316":6,"\u7cfb\u7edf\u4f1a\u63d0\u4f9b\u4e00\u4e2a\u5206\u5e03\u5f0f\u5b58\u50a8\u670d\u52a1":21,"\u7ebf\u7a0bid\u53f7":35,"\u7ec6\u8282\u63cf\u8ff0":34,"\u7ecf\u5e38\u4f1a\u6d88\u8017\u657010gb\u7684\u5185\u5b58\u548c\u6570gb\u7684\u663e\u5b58":11,"\u7ecf\u8fc7\u6a21\u578b\u5904\u7406\u4e4b\u540e":18,"\u7ed3\u675f\u6807\u8bb0":42,"\u7ed3\u675f\u9884\u6d4b\u4e4b\u540e":20,"\u7ed3\u6784\u4f53":[19,20],"\u7ed3\u679c\u4f1a\u5199\u5165\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":20,"\u7ed9":39,"\u7ed9\u4e2a\u7b80\u5355\u7684":4,"\u7ed9\u5b9aencoder\u8f93\u51fa\u548c\u5f53\u524d\u8bcd":41,"\u7ef4\u57fa\u767e\u79d1\u4e2d\u6587\u9875\u9762":39,"\u7ef4\u57fa\u767e\u79d1\u9875\u9762":39,"\u7ef4\u7a7a\u95f4":42,"\u7ef4\u7a7a\u95f4\u5b8c\u6210":42,"\u7f13\u5b58\u6c60\u7684\u51cf\u5c0f":11,"\u7f16\u5199":1,"\u7f16\u5199\u4e86\u4e00\u4e2apaddlepaddle\u7684
\u7a0b\u5e8f":1,"\u7f16\u5199\u5b8cyaml\u6587\u4ef6\u540e":27,"\u7f16\u5199\u672c\u6b21\u8bad\u7ec3\u7684yaml\u6587\u4ef6":27,"\u7f16\u5199\u7684\u90e8\u5206":3,"\u7f16\u53f7\u4ece0\u5f00\u59cb":11,"\u7f16\u7801\u5411\u91cf":42,"\u7f16\u7801\u5668\u8f93\u51fa":42,"\u7f16\u7801\u6e90\u5e8f\u5217":42,"\u7f16\u8bd1":[1,4],"\u7f16\u8bd1\u51fa\u7684paddlepaddle\u9884\u6d4b\u5e93\u548c\u5934\u6587\u4ef6":17,"\u7f16\u8bd1\u53ca\u5b89\u88c5":2,"\u7f16\u8bd1\u540e\u7684\u6587\u4ef6\u5c06\u88ab\u5b58\u50a8\u5728\u5de5\u4f5c\u76ee\u5f55":7,"\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5":10,"\u7f16\u8bd1\u5b8c\u6210\u4e4b\u540e":7,"\u7f16\u8bd1\u5b8c\u6210\u540e\u4f1a\u5728build":0,"\u7f16\u8bd1\u6210\u529f\u540e\u5728":17,"\u7f16\u8bd1\u6210\u52a8\u6001\u5e93":33,"\u7f16\u8bd1\u751f\u6210":7,"\u7f16\u8bd1paddlepaddl":0,"\u7f51\u7edc\u5c42\u53ef\u4ee5\u6709\u591a\u4e2a\u8f93\u5165":6,"\u7f51\u7edc\u5c42\u7684\u6807\u8bc6\u7b26\u4e3a":6,"\u7f51\u7edc\u5c42\u7684\u7c7b\u578b":6,"\u7f51\u7edc\u5c42\u7684\u7ec6\u8282\u53ef\u4ee5\u901a\u8fc7\u4e0b\u9762\u8fd9\u4e9b\u4ee3\u7801\u7247\u6bb5\u6765\u6307\u5b9a":6,"\u7f51\u7edc\u5c42\u7684\u8f93\u51fa\u662f\u7ecf\u8fc7\u6fc0\u6d3b\u51fd\u6570\u4e4b\u540e\u7684\u503c":33,"\u7f51\u7edc\u5c42\u914d\u7f6e\u5305\u542b\u4ee5\u4e0b\u51e0\u9879":6,"\u7f51\u7edc\u63a5\u53d7\u4e00\u5e45\u56fe\u7247\u4f5c\u4e3a\u8f93\u5165":20,"\u7f51\u7edc\u7ed3\u6784\u7684\u5e8f\u5217\u5316\u7ed3\u679c\u548c\u6a21\u578b\u53c2\u6570\u5b58\u50a8\u76ee\u5f55":20,"\u7f51\u7edc\u901a\u4fe1":6,"\u7f51\u901f\u6216ssl\u94fe\u63a5\u539f\u56e0":8,"\u800c":[12,42],"\u800c\u4e0d\u662f\u5728layer\u91cc\u5b9e\u73b0":12,"\u800c\u4e0d\u662f\u6e90\u7801\u76ee\u5f55\u91cc":8,"\u800c\u4e0d\u662f\u7279\u5f81\u7684\u96c6\u5408":39,"\u800c\u4e0d\u662f\u76f8\u5bf9":19,"\u800c\u4e0d\u662fc":19,"\u800c\u4e14\u4e2a\u6570\u5e76\u4e0d\u786e\u5b9a":21,"\u800c\u4e14\u5305\u542b\u4e86c":3,"\u800c\u4e14cento":3,"\u800c\u4e4b\u524d\u7684\u53c2\u6570\u5c06\u4f1a\u88ab\u5220\u9664":33,"\u
800c\u4ece\u5e94\u7528\u7684\u89d2\u5ea6":37,"\u800c\u4f18\u5316\u6027\u80fd\u7684\u9996\u8981\u4efb\u52a1":37,"\u800c\u5176\u4ed6\u5c42\u4f7f\u7528cpu\u8ba1\u7b97":35,"\u800c\u53cc\u5c42rnn\u662f\u53ef\u4ee5\u5904\u7406\u8fd9\u79cd\u8f93\u5165\u6570\u636e\u7684\u7f51\u7edc\u7ed3\u6784":39,"\u800c\u53ea\u9700\u8981\u83b7\u5f97recurr":12,"\u800c\u5b89\u88c5\u5305":[3,8],"\u800c\u5b89\u88c5\u5305\u662f":[3,8],"\u800c\u5bf9\u4e8e\u53cc\u5c42\u5e8f\u5217":39,"\u800c\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u5185\u5c42\u7279\u5f81\u6570\u636e\u800c\u8a00":39,"\u800c\u5c06\u8fd9\u4e2a\u6bb5\u843d\u7684\u6bcf\u4e00\u53e5\u8bdd\u7528lstm\u7f16\u7801\u6210\u4e00\u4e2a\u5411\u91cf":39,"\u800c\u5f53\u524d\u5df2\u7ecf\u67095":37,"\u800c\u662f\u5c06\u8f93\u5165":[19,20],"\u800c\u662f\u76f4\u63a5\u4ece\u5185\u5b58\u7684\u7f13\u5b58\u91cc\u8bfb\u53d6\u6570\u636e":11,"\u800c\u66f4\u6df1\u5165\u7684\u5206\u6790":37,"\u800c\u6709\u4e9b\u53c2\u6570\u9700\u8981\u5728\u96c6\u7fa4\u591a\u673a\u8bad\u7ec3\u4e2d\u4f7f\u7528\u7b49":32,"\u800c\u6e90\u5e8f\u5217\u7684\u7f16\u7801\u5411\u91cf\u53ef\u4ee5\u88ab\u65e0\u8fb9\u754c\u7684memory\u8bbf\u95ee":42,"\u800c\u795e\u7ecf\u7f51\u7edc\u662f\u6211\u4eec\u8981\u642d\u5efa\u7684\u5b9d\u5854":14,"\u800c\u7a00\u758f\u66f4\u65b0\u5728\u53cd\u5411\u4f20\u64ad\u4e4b\u540e\u7684\u6743\u91cd\u66f4\u65b0\u65f6\u8fdb\u884c":35,"\u800c\u8fd9\u4e00\u53e5\u8bdd\u5c31\u53ef\u4ee5\u8868\u793a\u6210\u8fd9\u4e9b\u4f4d\u7f6e\u7684\u6570\u7ec4":39,"\u800c\u8fd9\u6bcf\u4e00\u4e2a\u6570\u7ec4\u5143\u7d20":39,"\u800c\u975e\u76f4\u63a5\u56de\u590d\u7684\u65b9\u5f0f":4,"\u800c\u975e\u9759\u6001\u52a0\u8f7dcuda\u52a8\u6001\u5e93":0,"\u800crnn\u662f\u6700\u6d41\u884c\u7684\u9009\u62e9":41,"\u800ctrainer\u9700\u8981\u8bfb\u53d6\u8bad\u7ec3\u6570\u636e\u8fdb\u884c\u8bad\u7ec3":14,"\u800cy_predict\u662f\u63a5\u6536x\u4f5c\u4e3a\u8f93\u5165":14,"\u8054\u901a":31,"\u80fd\u591f\u5904\u7406\u53cc\u5c42\u5e8f\u5217":41,"\u80fd\u591f\u5bf9\u53cc\u5411\u5e8f\u5217\u8fdb\u884c\u5904\u7406\u
7684\u6709":41,"\u80fd\u591f\u8bb0\u5f55\u4e0a\u4e00\u4e2asubseq":41,"\u80fd\u591f\u9488\u5bf9cpu\u548cgpu\u7684\u8ba1\u7b97\u505a\u66f4\u591a\u4f18\u5316":12,"\u80fd\u83b7\u53d6":23,"\u811a\u672c":[0,20],"\u811a\u672c\u5f00\u59cb\u65f6":27,"\u811a\u672c\u96c6\u6210\u4e86\u5e8f\u5217\u5316\u795e\u7ecf\u7f51\u7edc\u7ed3\u6784\u7684\u8fc7\u7a0b":20,"\u81ea\u52a8\u5173\u95ed\u5bf9\u5e94\u7684":4,"\u81ea\u52a8\u5730\u5c06\u8fd9\u4e9b\u9009\u9879\u5e94\u7528\u5230":23,"\u81ea\u52a8\u5b8c\u6210\u8fd9\u4e00\u8fc7\u7a0b":41,"\u81ea\u52a8\u751f\u6210":7,"\u81ea\u52a8\u83b7\u53d6\u4e0a\u4e00\u4e2a\u751f\u6210\u7684\u8bcd":42,"\u81ea\u7136\u4e5f\u5c31\u6709\u7ba1\u7406\u5458\u6743\u9650":0,"\u81ea\u7136\u8bed\u8a00\u4e2d\u7684\u53e5\u5b50\u662f\u4e00\u4e2a\u5e8f\u5217":19,"\u81ea\u7136\u8bed\u8a00\u4e2d\u7684\u6bb5\u843d\u662f\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":19,"\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7b49":35,"\u81f3\u5c11\u5305\u542bgcc_3":3,"\u81f3\u5c11\u5305\u542bglibcxx_3":3,"\u81f3\u6b64":[4,39],"\u8212\u9002":39,"\u8282\u70b9":[28,31],"\u82e5":6,"\u82e5\u5728paddlepaddle\u7f16\u8bd1\u65f6":13,"\u82e5\u5e72\u4e2a\u53e5\u5b50\u6784\u6210\u4e00\u4e2a\u6bb5\u843d":38,"\u82e5\u6709\u4e0d\u4e00\u81f4\u4e4b\u5904":37,"\u82e5\u6709\u5fc5\u8981":6,"\u82e5\u8981\u5bf9\u8fd9\u51e0\u4e2alayer\u4f7f\u7528dropout":12,"\u82e5\u8f93\u51fa\u662f\u5355\u5c42\u5e8f\u5217":38,"\u82e5\u8f93\u51fa\u662f\u53cc\u5c42\u5e8f\u5217":38,"\u82f1\u6587\u6587\u6863":7,"\u82f1\u6587\u6587\u6863\u76ee\u5f55":7,"\u8303\u56f4":35,"\u83b7\u53d6":4,"\u83b7\u53d6\u53ef\u9009\u7684tag":1,"\u83b7\u53d6\u5f53\u524d\u7cfb\u7edf\u652f\u6301\u7684\u5b89\u88c5\u5305\u683c\u5f0f":3,"\u83b7\u53d6\u5f53\u524d\u7cfb\u7edf\u652f\u6301\u7684python\u5305\u7684\u540e\u7f00":8,"\u83b7\u53d6\u6e90\u7801":0,"\u83b7\u53d6\u8f93\u51fa\u65f6":20,"\u83b7\u53d6trainer":27,"\u83b7\u5f97\u53c2\u6570\u5c3a\u5bf8":6,"\u83b7\u5f97\u5728\u6a21\u578b\u914d\u7f6e\u4e2d\u67d0\u4e00\u5c42\u7684name":11,"\u83b7\u5f97\u57fa\u672c\u7684do
cker\u5b89\u88c5\u548c\u4f7f\u7528\u65b9\u6cd5":1,"\u83b7\u5f97\u5f53\u524dmini":11,"\u83b7\u5f97\u7684\u503c\u7c7b\u578b\u5747\u4e3a":11,"\u83b7\u5f97\u8ba1\u7b97\u7ed3\u679c":20,"\u83b7\u5f97\u8fd9\u4e9b\u8282\u70b9\u7684ip\u5730\u5740":23,"\u83b7\u5f97head\u548cnode\u8282\u70b9\u7684ip\u5730\u5740":28,"\u865a\u62df\u673a\u4e0a":0,"\u867d\u7136\u5f02\u6b65sgd\u65b9\u5f0f\u4f1a\u63d0\u9ad8\u53c2\u6570\u66f4\u65b0\u5e76\u884c\u5ea6":22,"\u867d\u7136paddle\u770b\u8d77\u6765\u5305\u542b\u4e86\u4f17\u591a\u53c2\u6570":32,"\u884c":19,"\u884c\u504f\u79fb":19,"\u8865\u5145\u4e0a\u6b21\u7684commit":4,"\u8868\u660e\u4e86\u8fd9\u4e9b\u884c\u7684\u6807\u53f7":6,"\u8868\u660e\u8fd9\u4e2a\u5c42\u7684\u4e00\u4e2a\u5b9e\u4f8b\u662f\u5426\u9700\u8981\u504f\u7f6e":6,"\u8868\u793a\u4e3adeviceid":35,"\u8868\u793a\u5c06\u5916\u5c42\u7684outer_mem\u4f5c\u4e3a\u5185\u5c42memory\u7684\u521d\u59cb\u72b6\u6001":39,"\u8868\u793a\u5f53\u524d\u96c6\u7fa4\u4f5c\u4e1a\u7684\u8282\u70b9":23,"\u8868\u793a\u7684\u504f\u79fb\u662f\u4ee5":19,"\u8868\u793a\u8bcd\u8bed\u5728\u8bcd\u5178\u4e2d\u7684\u5e8f\u53f7":19,"\u8868\u793a\u8bfb\u8005\u6240\u4f7f\u7528\u7684docker\u955c\u50cf\u4ed3\u5e93\u5730\u5740":27,"\u8868\u793a\u8fd9\u4e2ajob\u7684\u540d\u5b57":27,"\u88ab":4,"\u88ab\u5207\u5206\u6210\u591a\u4e2a\u90e8\u5206":22,"\u88ab\u6269\u5c55\u4e3a\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"\u88ab\u653e\u5728":6,"\u88ab\u79f0\u4e3a":42,"\u8981\u4f7f\u7528\u547d\u4ee4\u884c\u5206\u6790\u5de5\u5177":37,"\u8981\u5728\u5df2\u6709\u7684kubernetes\u96c6\u7fa4\u4e0a\u8fdb\u884cpaddlepaddle\u7684\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"\u8981\u6c42\u5355\u5c42\u5e8f\u5217\u542b\u6709\u5143\u7d20\u7684\u6570\u76ee":38,"\u8981\u751f\u6210\u7684\u76ee\u6807\u5e8f\u5217":41,"\u8981\u8c03\u7528":6,"\u89c6\u9891\u7b49":19,"\u89e3\u51b3\u529e\u6cd5\u662f":8,"\u89e3\u51b3\u65b9\u6848\u662f":13,"\u89e3\u6790\u73af\u5883\u53d8\u91cf\u5f97\u5230":27,"\u89e3\u7801\u5668\u4f7f\u7528":42,"\u89e3\u7801\u5668\u57fa\u4e8e\u7f16\u7801
\u6e90\u5e8f\u5217\u548c\u6700\u540e\u751f\u6210\u7684\u76ee\u6807\u8bcd\u9884\u6d4b\u4e0b\u4e00\u76ee\u6807\u8bcd":42,"\u89e3\u7801\u5668\u662f\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u8ba1\u7b97":[22,42],"\u8ba1\u7b97\u504f\u7f6e\u7684\u68af\u5ea6":6,"\u8ba1\u7b97\u53cd\u5411rnn\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b":42,"\u8ba1\u7b97\u53d8\u6362\u77e9\u9635\u7684\u5927\u5c0f\u548c\u683c\u5f0f":6,"\u8ba1\u7b97\u5f53\u524d\u5c42\u6743\u91cd\u7684\u68af\u5ea6":6,"\u8ba1\u7b97\u6548\u7387\u66f4\u9ad8":12,"\u8ba1\u7b97\u6bcf\u4e2a\u8bcd\u7684\u8bcd\u5411\u91cf":42,"\u8ba1\u7b97\u6fc0\u6d3b\u51fd\u6570\u7684\u68af\u5ea6":6,"\u8ba1\u7b97\u7684\u7ec6\u8282\u5c06\u5728\u4e0b\u9762\u7684\u5c0f\u8282\u7ed9\u51fa":6,"\u8ba1\u7b97\u8282\u70b9":22,"\u8ba1\u7b97\u8282\u70b9\u4e4b\u95f4\u4e5f\u4e0d\u4f1a\u76f8\u4e92\u4f9d\u8d56":22,"\u8ba1\u7b97\u8f6c\u6362\u77e9\u9635\u548c\u8f93\u5165\u7684\u68af\u5ea6":6,"\u8ba1\u7b97\u8f93\u5165\u548c\u53c2\u6570\u7684\u68af\u5ea6":6,"\u8ba1\u7b97\u8f93\u5165\u5c42\u7684\u504f\u5dee":6,"\u8ba1\u7b97\u8f93\u51fa":6,"\u8ba1\u7b97\u96c6\u7fa4\u901a\u5e38\u7531\u4e00\u7ec4":31,"\u8bad\u7ec3":32,"\u8bad\u7ec3\u5931\u8d25\u65f6\u53ef\u4ee5\u68c0\u67e5\u9519\u8bef\u65e5\u5fd7":23,"\u8bad\u7ec3\u597d\u4e00\u4e2a\u6df1\u5c42\u795e\u7ecf\u7f51\u7edc\u901a\u5e38\u8981\u8017\u8d39\u975e\u5e38\u957f\u7684\u65f6\u95f4":37,"\u8bad\u7ec3\u597d\u7684\u6a21\u578b\u9ed8\u8ba4\u4fdd\u5b58\u5728\u5f53\u524d\u8fd0\u884c\u76ee\u5f55\u4e0b\u7684":20,"\u8bad\u7ec3\u6570\u636e\u6709\u95ee\u9898":11,"\u8bad\u7ec3\u6570\u636e\u683c\u5f0f\u548c\u8bad\u7ec3\u7a0b\u5e8f\u7684":21,"\u8bad\u7ec3\u65f6":27,"\u8bad\u7ec3\u6a21\u578b\u540e":42,"\u8bad\u7ec3\u7a0b\u5e8f":21,"\u8bad\u7ec3\u7b56\u7565\u7b49\u7b49\u8fd9\u4e9b\u90fd\u662f\u5e38\u89c1\u7684\u53d8\u5316\u56e0\u7d20":34,"\u8bad\u7ec3\u7ed3\u675f\u540e\u67e5\u770b\u8f93\u51fa\u7ed3\u679c":27,"\u8bad\u7ec3\u8282\u70b9\u6570\u91cf":27,"\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8ddd\u79bb":11,"\u8bad\u7ec3\u8fc7\u7
a0b\u4e2d\u53c2\u6570\u6216\u8005\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u68af\u5ea6\u5c3a\u5ea6\u8fc7\u5927":11,"\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6d4b\u8bd5test_period":32,"\u8bad\u7ec3\u8fc7\u7a0b\u662f\u5426\u4e3a\u672c\u5730\u6a21\u5f0f":33,"\u8bad\u7ec3\u8fc7\u7a0b\u662f\u5426\u4f7f\u7528gpu":33,"\u8bad\u7ec3\u914d\u7f6e\u4e2d\u7684\u8bbe\u5907\u5c5e\u6027\u5c06\u4f1a\u65e0\u6548":33,"\u8bad\u7ec3dot_period":32,"\u8bb0\u5fc6\u6a21\u5757":42,"\u8bbe\u4e3a\u5df2\u90e8\u7f72\u7684\u5de5\u4f5c\u7a7a\u95f4\u76ee\u5f55":23,"\u8bbe\u4e3a\u672c\u5730":23,"\u8bbe\u5b9a":12,"\u8bbe\u7f6e":[0,11,12,21],"\u8bbe\u7f6e\u4e3a":6,"\u8bbe\u7f6e\u4e3a\u4e0d\u540c\u7684\u503c":12,"\u8bbe\u7f6e\u4e3atrue\u4f7f\u7528\u672c\u5730\u8bad\u7ec3\u6216\u8005\u4f7f\u7528\u96c6\u7fa4\u4e0a\u7684\u4e00\u4e2a\u8282\u70b9":33,"\u8bbe\u7f6e\u4e3atrue\u4f7f\u7528gpu\u6a21\u5f0f":33,"\u8bbe\u7f6e\u4e86\u76f8\u540c\u7684\u53d6\u503c":12,"\u8bbe\u7f6e\u5176\u53c2\u6570\u5c5e\u6027":13,"\u8bbe\u7f6e\u53c2\u6570\u7684\u540d\u5b57":13,"\u8bbe\u7f6e\u547d\u4ee4\u884c\u53c2\u6570":11,"\u8bbe\u7f6e\u5b66\u4e60\u7387\u8870\u51cf\u56e0\u5b50\u5206\u6bb5\u51fd\u6570":13,"\u8bbe\u7f6e\u5e8f\u5217\u4fe1\u606f\u7684\u63a5\u53e3":19,"\u8bbe\u7f6e\u6210":13,"\u8bbe\u7f6e\u6210\u4e00\u4e2a\u5c0f\u4e00\u4e9b\u7684\u503c":11,"\u8bbe\u7f6e\u8f93\u51fa\u7684\u5c3a\u5bf8":6,"\u8bbe\u7f6e\u9ed8\u8ba4\u8bbe\u5907\u53f7\u4e3a0":35,"\u8bbe\u7f6egpu":33,"\u8bbf\u95eekubernetes\u7684\u63a5\u53e3\u6765\u67e5\u8be2\u6b64job\u5bf9\u5e94\u7684\u6240\u6709pod\u4fe1\u606f":27,"\u8bc4\u5ba1\u4eba\u4e00\u822c\u4e0d\u505a\u8bc4\u5ba1":4,"\u8bc4\u5ba1\u4eba\u7684\u6bcf\u4e2a\u610f\u89c1\u90fd\u5fc5\u987b\u56de\u590d":4,"\u8bc4\u5ba1\u4eba\u9700\u8981\u9010\u4e00\u67e5\u770b\u6bcf\u4e2acommit\u624d\u80fd\u77e5\u9053\u505a\u4e86\u54ea\u4e9b\u4fee\u6539":4,"\u8bc4\u8bba\u6846\u4e2d\u52a0\u4e0a":4,"\u8bd5\u7740\u8ba9\u8f93\u51fa\u7684\u5206\u6790\u6570\u636e\u548c\u7406\u8bba\u503c\u5bf9\u5e94":37,"\u8be5\u53c2\u6570\u5728\u7f51\u7edc\u914d
\u7f6e\u7684output":33,"\u8be5\u53c2\u6570\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u81ea\u52a8\u8bbe\u7f6e":33,"\u8be5\u53c2\u6570\u5df2\u7ecf\u5728\u96c6\u7fa4\u63d0\u4ea4\u73af\u5883\u4e2d\u5b8c\u6210\u8bbe\u7f6e":33,"\u8be5\u53c2\u6570\u5fc5\u987b\u80fd\u88abflag":33,"\u8be5\u53c2\u6570\u6307\u793a\u662f\u5426\u6253\u5370\u65e5\u5fd7\u622a\u65ad\u4fe1\u606f":33,"\u8be5\u53c2\u6570\u6307\u793a\u662f\u5426\u6253\u5370\u9519\u8bef\u622a\u65ad\u65e5\u5fd7":33,"\u8be5\u53c2\u6570\u7528\u4e8e\u6307\u5b9a\u52a8\u6001\u5e93\u8def\u5f84":33,"\u8be5\u53c2\u6570\u7684\u610f\u601d\u662f\u8bad\u7ec3num":33,"\u8be5\u53c2\u6570\u9ed8\u8ba4\u4e3anull":33,"\u8be5\u5c42\u4ec5\u9700\u8981\u8fd9\u4e9b\u975e\u96f6\u6837\u672c\u4f4d\u7f6e\u6240\u5bf9\u5e94\u7684\u53d8\u6362\u77e9\u9635\u7684\u90a3\u4e9b\u884c":6,"\u8be5\u622a\u65ad\u4f1a\u5f71\u54cd":33,"\u8be5\u6279\u6b21\u7684\u8f93\u5165\u4e2d\u4ec5\u6709\u4e00\u4e2a\u5b50\u96c6\u662f\u975e\u96f6\u7684":6,"\u8be5\u63a5\u53e3\u53ef\u7528\u4e8e\u9884\u6d4b\u548c\u5b9a\u5236\u5316\u8bad\u7ec3":0,"\u8be5\u63a5\u53e3\u63a5\u53d7\u4e24\u4e2a\u53c2\u6570":20,"\u8be5\u6570\u76ee\u662f\u63d0\u524d\u5b9a\u4e49\u597d\u7684":33,"\u8be5\u65b9\u5f0f\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf":2,"\u8be5\u6a21\u578b\u7684\u8bf4\u660e\u5982\u4e0b\u56fe\u6240\u793a":42,"\u8be5\u7c7b\u7684\u5b9e\u73b0\u7ec6\u8282\u5728":6,"\u8be5\u8bed\u53e5\u4f1a\u4e3a\u6bcf\u4e2a\u5c42\u521d\u59cb\u5316\u5176\u6240\u9700\u8981\u7684\u53d8\u91cf\u548c\u8fde\u63a5":6,"\u8be5layer\u662f\u901a\u8fc7\u53c2\u6570":12,"\u8be6\u7ec6\u4ecb\u7ecd\u53ef\u4ee5\u53c2\u8003":39,"\u8be6\u7ec6\u4fe1\u606f\u8bf7\u68c0\u67e5":23,"\u8be6\u7ec6\u53c2\u8003":0,"\u8be6\u7ec6\u53ef\u53c2\u8003":4,"\u8be6\u7ec6\u6587\u6863\u53c2\u8003":11,"\u8be6\u7ec6\u7684cmake\u4f7f\u7528\u65b9\u6cd5\u53ef\u4ee5\u53c2\u8003":0,"\u8be6\u7ec6\u89c1":38,"\u8be6\u7ec6\u89e3\u91ca\u8fd9\u4e9b\u53c2\u6570\u7684\u5c5e\u6027\u548c\u610f\u4e49":34,"\u8be6\u7ec6\u8bf7\u4e86\u89e3":24,"\u8bf4\u660e":[0,3,19],"\u
8bf4\u660e\u63d0\u4ea4\u7684\u4ee3\u7801\u5b58\u5728\u95ee\u9898":4,"\u8bf4\u660e\u8fd9\u4e2a\u5c42\u7684\u8f93\u5165":6,"\u8bf7\u4e0d\u8981\u5fd8\u8bb0\u63d0\u524d\u5728\u7269\u7406\u673a\u4e0a\u5b89\u88c5gpu\u6700\u65b0\u9a71\u52a8":1,"\u8bf7\u4fdd\u8bc1travi":4,"\u8bf7\u5148\u5c1d\u8bd5\u5728\u4e0b\u9762\u7684\u9875\u9762\u5bfb\u627e\u7b54\u6848":2,"\u8bf7\u53c2\u7167\u4ee5\u4e0b\u6559\u7a0b":2,"\u8bf7\u53c2\u7167\u7f51\u7edc\u914d\u7f6e\u7684\u6587\u6863\u4e86\u89e3\u66f4\u8be6\u7ec6\u7684\u4fe1\u606f":35,"\u8bf7\u53c2\u8003":[6,8,11,14,20,39],"\u8bf7\u53c2\u8003\u6b64":20,"\u8bf7\u53c2\u89c1":4,"\u8bf7\u53c2\u9605":42,"\u8bf7\u5728\u8be5pull":4,"\u8bf7\u5728\u8f93\u5165\u65f6\u8fdb\u884c\u5408\u6cd5\u6027\u68c0\u67e5":19,"\u8bf7\u60a8\u5230":10,"\u8bf7\u60a8\u6bcf\u6b21\u63d0\u4ea4\u4ee3\u7801\u65f6":4,"\u8bf7\u60a8\u9075\u5b88\u4ee5\u4e0b\u7ea6\u5b9a":4,"\u8bf7\u6307\u5b9a\u7684paddlepaddle\u5de5\u4f5c\u76ee\u5f55\u7ed9\u73af\u5883\u53d8\u91cf":7,"\u8bf7\u6307\u5b9a\u8be5\u76ee\u5f55":33,"\u8bf7\u6839\u636e\u673a\u5668\u914d\u7f6e\u548c\u7cfb\u7edf\u9009\u62e9\u5bf9\u5e94\u7684\u5b89\u88c5\u5305":2,"\u8bf7\u68c0\u67e5python\u7248\u672c\u662f\u5426\u4e3a2":3,"\u8bf7\u6ce8\u610f":[26,42],"\u8bf7\u6ce8\u610f\u662f\u5426\u9700\u8981\u4fee\u6539\u7f51\u7edc\u7ed3\u6784":20,"\u8bf7\u6ce8\u610f\u6bcf\u4e2acommit\u7684\u540d\u79f0":4,"\u8bf7\u6ce8\u610fcommit\u7684\u6570\u91cf":4,"\u8bf7\u76f4\u63a5\u586b\u51450":13,"\u8bf7\u770b\u4e0b\u9762\u7684\u4f8b\u5b50":35,"\u8bf7\u786e\u4fdd":4,"\u8bf7\u7ed9\u51fa\u603b\u4f53\u7684\u4fee\u6539\u60c5\u51b5":4,"\u8bf7\u7ed9\u51fa\u60a8\u81ea\u5df1\u7684\u53cd\u9a73\u7406\u7531":4,"\u8bf7\u9009\u62e9\u5408\u9002\u7684\u8bcd\u6c47":4,"\u8bf7\u9009\u62e9\u6b63\u786e\u7684\u7248\u672c":8,"\u8bf7\u9075\u5b88":4,"\u8bf7\u91c7\u7528":4,"\u8bf7\u9605\u8bfb\u4ee5\u4e0b\u6307\u5357":24,"\u8bf8\u5982\u56fe\u50cf\u5206\u7c7b":35,"\u8bfb\u53d6\u9700\u8981\u7684\u7ed3\u679c\u5373\u53ef":19,"\u8bfb\u53d6volume\u4e2d\u7684\u6570\u636e\u8fdb\u88
4c\u8fd9\u6b21\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"\u8bfb\u8005\u53ef\u4ee5\u67e5\u770b":27,"\u8bfb\u8005\u9700\u8981\u66ff\u6362\u6210\u81ea\u5df1\u4f7f\u7528\u7684\u4ed3\u5e93\u5730\u5740":27,"\u8c03\u7528":[6,20],"\u8c03\u7528\u7684\u4e00\u4e9b\u7528\u6237\u5b9a\u4e49\u7684\u5e93\u51fd\u6570":21,"\u8c03\u7528\u8be5\u51fd\u6570\u540e":6,"\u8c03\u7528c":[19,20],"\u8d21\u732e\u6587\u6863":7,"\u8d77":39,"\u8d77\u59cb\u5b58\u50a8\u5730\u5740\u4ee5\u6570\u636e\u7684\u5b58\u50a8\u5927\u5c0f\u4e3a\u5355\u4f4d\u7684\u504f\u79fb":19,"\u8df3\u8f6c\u5230":4,"\u8df3\u8fc7":11,"\u8f6c\u5316\u65b9\u6cd5\u5728\u76f8\u5e94\u7684\u9886\u57df\u90fd\u6709\u901a\u7528\u89e3\u51b3\u65b9\u6848":19,"\u8f83":39,"\u8f93\u5165":[18,20,38,42],"\u8f93\u5165\u548c\u8f93\u51fa\u90fd\u662f\u5355\u5c42\u5e8f\u5217":41,"\u8f93\u5165\u548c\u8f93\u51fa\u90fd\u662f\u53cc\u5c42\u5e8f\u5217":41,"\u8f93\u5165\u5e8f\u5217\u4e2d\u5143\u7d20\u7684\u603b\u6570":11,"\u8f93\u5165\u6570\u636e\u4e3a\u4e00\u4e2a\u5b8c\u6574\u7684\u65f6\u95f4\u5e8f\u5217":39,"\u8f93\u5165\u6570\u636e\u4e3a\u5728\u5355\u5c42rnn\u6570\u636e\u91cc\u9762":39,"\u8f93\u5165\u6570\u636e\u53ef\u5206\u4e3a":19,"\u8f93\u5165\u6570\u636e\u6574\u4f53\u4e0a\u662f\u4e00\u4e2a\u65f6\u95f4\u5e8f\u5217":39,"\u8f93\u5165\u6570\u636e\u7684\u5b57\u5178\u7ef4\u6570\u662f1\u767e\u4e07":35,"\u8f93\u5165\u6570\u636e\u7c7b\u578b":19,"\u8f93\u5165\u662f\u5426\u662f\u8f6c\u7f6e\u7684":6,"\u8f93\u5165\u662f\u7531\u4e00\u4e2alist\u4e2d\u7684\u7f51\u7edc\u5c42\u5b9e\u4f8b\u7684\u540d\u5b57\u7ec4\u6210\u7684":6,"\u8f93\u5165\u7684\u540d\u5b57":6,"\u8f93\u5165\u7684\u5927\u5c0f":6,"\u8f93\u5165\u7684\u7c7b\u578b":6,"\u8f93\u5165\u9700\u8981\u9884\u6d4b\u7684\u5411\u91cf\u7ec4":14,"\u8f93\u51fa":[20,38,42],"\u8f93\u51fa\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":41,"\u8f93\u51fa\u4e00\u4e2a\u53cc\u5c42\u5e8f\u5217":41,"\u8f93\u51fa\u4fe1\u606f\u6709\u673a\u5730\u7ec4\u7ec7\u5728\u4e00\u8d77":19,"\u8f93\u51fa\u51fd\u6570":42,"\u8f93\u51fa\u521b\u5efa":[19,20],"\u8f93\u
51fa\u5e8f\u5217\u7684\u7c7b\u578b":38,"\u8f93\u51fa\u5e8f\u5217\u7684\u8bcd\u8bed\u6570\u548c\u8f93\u5165\u5e8f\u5217\u4e00\u81f4":41,"\u8f93\u51fa\u6240\u643a\u5e26\u7684\u5e8f\u5217\u4fe1\u606f":19,"\u8f93\u51fa\u6570\u636e\u662f\u5728\u4e0a\u6587\u4ecb\u7ecd\u7684":19,"\u8f93\u51fa\u6570\u636e\u6709\u673a\u5730\u7ec4\u7ec7\u5728\u4e00\u8d77":20,"\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7":[18,20],"\u8f93\u51fa\u7531":19,"\u8f93\u51fa\u7684\u5e8f\u5217\u4fe1\u606f":19,"\u8f93\u51fa\u7684\u68af\u5ea6":33,"\u8f93\u51fa\u7ed3\u679c\u53ef\u80fd\u4f1a\u968f\u7740\u5bb9\u5668\u7684\u6d88\u8017\u800c\u88ab\u5220\u9664":26,"\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a":19,"\u8f93\u51fa\u88ab\u7ec4\u7ec7\u4e3a\u4e00\u4e2a":19,"\u8f93\u51fa\u90fd\u4f1a\u5bf9\u5e94\u6709\u81ea\u5df1\u7684":[19,20],"\u8fc7\u4e86\u4e00\u4e2a\u5f88\u7b80\u5355\u7684recurrent_group":39,"\u8fc7\u5b8c\u6240\u6709\u8bad\u7ec3\u6570\u636e\u5373\u4e3a\u4e00\u4e2apass":11,"\u8fd0\u884c":20,"\u8fd0\u884c\u4e00\u4e2a":0,"\u8fd0\u884c\u5931\u8d25":35,"\u8fd0\u884c\u5b8c\u6210\u540e":23,"\u8fd0\u884c\u65e5\u5fd7":23,"\u8fd0\u884c\u65f6\u4f1a\u81ea\u52a8\u627e\u5230\u7cfb\u7edf\u4e2d\u5b89\u88c5\u7684cuda\u548ccudnn\u5e93\u8fdb\u884c\u7f16\u8bd1\u548c\u6267\u884c":0,"\u8fd0\u884c\u65f6c":20,"\u8fd0\u884c\u73af\u5883":34,"\u8fd0\u884c\u7684\u4e00\u4e9b\u53c2\u6570\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u4f20\u9012\u5230\u5bb9\u5668\u5185":27,"\u8fd0\u884c\u9636\u6bb5":34,"\u8fd1":39,"\u8fd4\u56de\u7684\u662f":14,"\u8fd4\u56de\u7b2ci\u4e2a\u8f93\u5165\u77e9\u9635":6,"\u8fd8\u4f1a":39,"\u8fd8\u4f1a\u4e0b\u8f7dmkl":0,"\u8fd8\u4f1a\u8f93\u51fa\u4e00\u4e2a":4,"\u8fd8\u53ef\u4ee5\u901a\u8fc7\u51cf\u5c0f\u5b66\u4e60\u7387\u6216\u8005\u5bf9\u6570\u636e\u8fdb\u884c\u5f52\u4e00\u5316\u5904\u7406\u6765\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898":11,"\u8fd8\u662f":39,"\u8fd8\u662f\u865a\u62df\u673a":0,"\u8fd8\u6709":39,"\u8fd8\u9700\u8981\u5728\u8282\u70b9\u4e0a\u5b89\u88c5\u5bf9\u5e94\u7684gpu\u9a71\u52a8\u4ee5\u53cacuda":31,"\u8fd9":[11,39
],"\u8fd98\u79cdlearning_rate_schedule\u53ca\u5176\u5bf9\u5e94\u5b66\u4e60\u7387\u8ba1\u7b97\u65b9\u5f0f\u5982\u4e0b":13,"\u8fd9\u4e00\u4e2a\u5e93":17,"\u8fd9\u4e00\u5757\u7684\u8017\u65f6\u6bd4\u4f8b\u771f\u7684\u592a\u9ad8":37,"\u8fd9\u4e00\u8282\u5bf9\u56fe1\u4e2d\u9884\u6d4b\u4ee3\u7801\u7f16\u5199\u76845\u4e2a\u6b65\u9aa4\u8fdb\u884c\u4ecb\u7ecd\u548c\u8bf4\u660e":20,"\u8fd9\u4e00\u8ba1\u7b97\u5355\u5143":12,"\u8fd9\u4e00\u8fc7\u7a0b\u5bf9\u7528\u6237\u662f\u5b8c\u5168\u900f\u660e\u7684":41,"\u8fd9\u4e2a":[3,39],"\u8fd9\u4e2a\u4efb\u52a1\u7684\u914d\u7f6e\u4e3a":11,"\u8fd9\u4e2a\u4efb\u52a1\u7684dataprovider\u4e3a":11,"\u8fd9\u4e2a\u51fd\u6570\u7684":42,"\u8fd9\u4e2a\u51fd\u6570\u8fdb\u884c\u53d8\u6362":39,"\u8fd9\u4e2a\u51fd\u6570\u9700\u8981\u8bbe\u7f6e":42,"\u8fd9\u4e2a\u5730\u5740\u6765\u8868\u793a\u6b64\u6b65\u9aa4\u6240\u6784\u5efa\u51fa\u7684\u955c\u50cf":27,"\u8fd9\u4e2a\u57fa\u7c7b":6,"\u8fd9\u4e2a\u5e8f\u5217\u7684\u6bcf\u4e2a\u5143\u7d20\u53c8\u662f\u4e00\u4e2a\u5e8f\u5217":41,"\u8fd9\u4e2a\u60c5\u51b5\u4e0b\u6240\u6709\u7684\u6587\u4ef6\u4f1a\u5b58\u5728\u6574\u7406\u8fc7\u7684\u7684\u6587\u4ef6\u76ee\u5f55":7,"\u8fd9\u4e2a\u6570\u636e\u4e5f\u88ab\u5355\u5c42rnn\u7f51\u7edc\u76f4\u63a5\u4f7f\u7528":39,"\u8fd9\u4e2a\u662f\u76ee\u524d\u63a8\u8350\u7684\u4f7f\u7528\u65b9\u6cd5":7,"\u8fd9\u4e2a\u793a\u4f8b":20,"\u8fd9\u4e2a\u795e\u7ecf\u7f51\u7edc\u5355\u5143\u5c31\u53ebmemori":39,"\u8fd9\u4e2a\u7c7b\u7684\u53c2\u6570\u5305\u62ec":6,"\u8fd9\u4e2a\u7c7b\u9700\u8981\u7ee7\u627f":6,"\u8fd9\u4e2a\u811a\u672c\u8c03\u7528":0,"\u8fd9\u4e2a\u8fc7\u7a0b\u5bf9\u7528\u6237\u4e5f\u662f\u900f\u660e\u7684":41,"\u8fd9\u4e2a\u8fc7\u7a0b\u9664\u4e86\u7f16\u8bd1paddlepaddle\u4e3a":4,"\u8fd9\u4e2a\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u751f\u6210\u4e00\u7cfb\u5217\u6743\u91cd":42,"\u8fd9\u4e2aissu":0,"\u8fd9\u4e2ajob\u624d\u7b97\u6210\u529f\u7ed3\u675f":27,"\u8fd9\u4e2alayer\u7684\u8f93\u51fa\u4f1a\u4f5c\u4e3a\u6574\u4e2a":41,"\u8fd9\u4e5f\u4f1a\u6781\u5927\u51c
f\u5c11\u6570\u636e\u8bfb\u5165\u7684\u8017\u65f6":11,"\u8fd9\u4e9b\u53c2\u6570\u53ef\u4ee5\u901a\u8fc7":21,"\u8fd9\u4e9b\u53c2\u6570\u7684\u5177\u4f53\u63cf\u8ff0":27,"\u8fd9\u4e9b\u540d\u5b57\u5fc5\u987b\u8981\u5199\u5bf9":6,"\u8fd9\u4e9b\u6570\u636e\u4f1a\u88ab\u7528\u6765\u66f4\u65b0\u53c2\u6570":11,"\u8fd9\u4e9b\u6570\u636e\u4f7f\u7528\u7684\u5185\u5b58\u4e3b\u8981\u548c\u4e24\u4e2a\u53c2\u6570\u6709\u5173\u7cfb":11,"\u8fd9\u4e9b\u7279\u5f81\u6570\u636e\u4e4b\u95f4\u7684\u987a\u5e8f\u662f\u6709\u610f\u4e49\u7684":39,"\u8fd9\u4efd\u6559\u7a0b\u5c55\u793a\u4e86\u5982\u4f55\u5728paddlepaddle\u4e2d\u5b9e\u73b0\u4e00\u4e2a\u81ea\u5b9a\u4e49\u7684\u7f51\u7edc\u5c42":6,"\u8fd9\u4f1a\u63d0\u793a\u5f53\u524d\u76ee\u5f55\u7684\u4e00\u4e9b\u53d8\u5316":4,"\u8fd9\u4f1a\u7ed9\u8bc4\u5ba1\u4eba\u5e26\u6765\u5f88\u5927\u56f0\u6270":4,"\u8fd9\u4f1a\u81ea\u52a8\u8fdb\u884c\u7f51\u7edc\u914d\u7f6e\u4e2d\u58f0\u660e\u7684\u6fc0\u6d3b\u64cd\u4f5c":6,"\u8fd9\u4fbf\u662f\u4e00\u79cd\u53cc\u5c42rnn\u7684\u8f93\u5165\u6570\u636e":39,"\u8fd9\u51e0\u4e2a\u7f16\u8bd1\u9009\u9879\u7684\u8bbe\u7f6e":0,"\u8fd9\u53ef\u4ee5\u5e2e\u60a8\u7701\u6389\u82b1\u4e00\u5c0f\u65f6\u5b89\u88c5\u548c\u914d\u7f6e\u5404\u79cd\u5f00\u53d1\u5de5\u5177":0,"\u8fd9\u53ef\u4ee5\u8ba9\u5176\u4ed6\u4eba\u77e5\u9053\u8fd9\u6b21\u63d0\u4ea4\u505a\u4e86\u54ea\u4e9b\u6539\u53d8":4,"\u8fd9\u53ef\u4ee5\u901a\u8fc7":4,"\u8fd9\u548c\u5355\u5c42rnn\u7684\u914d\u7f6e\u662f\u7b49\u4ef7\u7684":39,"\u8fd9\u56db\u4e2a\u5e8f\u5217\u53c8\u5206\u522b\u542b\u67093":19,"\u8fd9\u56db\u6761\u6570\u636e\u540c\u65f6\u5904\u7406\u7684\u53e5\u5b50\u6570\u91cf\u4e3a":39,"\u8fd9\u5728\u6784\u9020\u975e\u5e38\u590d\u6742\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u65f6\u662f\u6709\u7528\u7684":42,"\u8fd9\u610f\u5473\u7740":42,"\u8fd9\u610f\u5473\u7740\u9664\u4e86\u6307\u5b9adevic":35,"\u8fd9\u65f6":[11,20],"\u8fd9\u65f6\u5728\u4f7f\u7528":13,"\u8fd9\u65f6\u7684":19,"\u8fd9\u65f6\u7684\u9700\u8981\u540c\u65f6\u63d0\u4f9b":19,"\u8fd9\u65f6\u884
c\u504f\u79fb\u548c\u5217\u53f7\u6307\u5b9a\u7684\u5143\u7d20\u9ed8\u8ba4\u5176\u503c\u4e3a1":19,"\u8fd9\u65f6\u8fdb\u884c\u77e9\u9635\u4e58\u6cd5\u8fd0\u7b97\u5c31\u53ef\u80fd\u5bfc\u81f4\u6d6e\u70b9\u6570\u6ea2\u51fa":11,"\u8fd9\u65f6\u9700\u8981\u8c03\u7528\u521b\u5efa\u5e8f\u5217\u4fe1\u606f\u548c\u4e3a":19,"\u8fd9\u662f\u4e00\u79cd\u6309\u5df2\u8bad\u7ec3\u6837\u672c\u6570\u5206\u6bb5\u53d6\u503c\u7684\u5b66\u4e60\u7387\u9000\u706b\u65b9\u6cd5":13,"\u8fd9\u662f\u4e00\u79cd\u6309\u5df2\u8bad\u7ec3pass\u6570\u5206\u6bb5\u53d6\u503c\u7684\u5b66\u4e60\u7387\u9000\u706b\u65b9\u6cd5":13,"\u8fd9\u662f\u4e00\u79cd\u975e\u5e38\u7075\u6d3b\u7684\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f":38,"\u8fd9\u662f\u5f00\u6e90\u793e\u533a\u7684\u57fa\u672c\u793c\u8c8c":4,"\u8fd9\u662f\u666e\u901a\u7684\u5355\u5c42\u65f6\u95f4\u5e8f\u5217\u7684dataprovider\u4ee3\u7801":39,"\u8fd9\u662f\u6700\u4fbf\u6377\u7684\u5b89\u88c5\u65b9\u5f0f":2,"\u8fd9\u662f\u76ee\u524dcmake\u5bfb\u627epython\u7684\u903b\u8f91\u5b58\u5728\u7f3a\u9677":8,"\u8fd9\u6837":22,"\u8fd9\u6837\u4fdd\u5b58\u5728\u5206\u5e03\u5f0f\u5b58\u50a8\u4e2d\u7684\u6570\u636e\u53ef\u4ee5\u88ab\u96c6\u7fa4\u4e2d\u7684\u6bcf\u4e2a\u8282\u70b9\u8bfb\u53d6\u5230":21,"\u8fd9\u6837\u4fdd\u8bc1\u8fd0\u884c\u7ed3\u675f\u4e4b\u540e\u7684":0,"\u8fd9\u6837\u505a\u53ef\u4ee5\u6781\u5927\u7684\u51cf\u5c11\u5185\u5b58\u5360\u7528":11,"\u8fd9\u6837\u53ef\u4ee5\u514d\u53bb\u5355\u72ec\u5b89\u88c5\u7f16\u8bd1\u4f9d\u8d56\u7684\u6b65\u9aa4":0,"\u8fd9\u6837\u53ef\u4ee5\u51cf\u5c0fgpu\u5185\u5b58":35,"\u8fd9\u6837\u5982\u679c\u9047\u5230\u95ee\u9898":0,"\u8fd9\u6837\u5bb9\u5668\u7684":27,"\u8fd9\u6837\u5f53\u8be5pull":4,"\u8fd9\u6837\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u8ba1\u7b97\u7684\u5e76\u884c\u6027":22,"\u8fd9\u6837\u7684\u88c5\u9970\u5668":6,"\u8fd9\u6837\u7684\u8bdd":26,"\u8fd9\u6837\u8bad\u7ec3\u6587\u4ef6\u7684\u4e2a\u6570\u4f1a\u6bd4\u8f83\u591a":21,"\u8fd9\u6b63\u662f\u5b83\u4eec\u901f\u5ea6\u5feb\u7684\u539f\u56e0":37,"\u8fd9\u7528\u4e8e\u57
28\u591a\u7ebf\u7a0b\u548c\u591a\u673a\u4e0a\u66f4\u65b0\u53c2\u6570":6,"\u8fd9\u79cd\u521d\u59cb\u5316\u65b9\u5f0f\u5728\u4e00\u822c\u60c5\u51b5\u4e0b\u4e0d\u4f1a\u4ea7\u751f\u5f88\u5dee\u7684\u7ed3\u679c":13,"\u8fd9\u79cd\u5b89\u88c5\u65b9\u5f0f\u4f1a\u6d89\u53ca\u5230\u4e00\u4e9b\u7b2c\u4e09\u65b9\u5e93\u7684\u4e0b\u8f7d":2,"\u8fd9\u79cd\u60c5\u51b5\u4e0b\u4e0d\u9700\u8981\u91cd\u5199\u8be5\u51fd\u6570":6,"\u8fd9\u79cd\u60c5\u51b5\u591a\u51fa\u73b0\u5728\u4f7f\u7528\u591a\u7ebf\u7a0b\u9884\u6d4b\u65f6":20,"\u8fd9\u79cd\u60c5\u51b5\u5e38\u5e38\u53d1\u751f\u5728":11,"\u8fd9\u79cd\u65b9\u5f0f\u5bf9\u5185\u5b58\u6d88\u8017\u8f83\u5927":12,"\u8fd9\u79cd\u65b9\u5f0f\u5fc5\u987b\u4f7f\u7528paddle\u5b58\u50a8\u7684\u6a21\u578b\u8def\u5f84\u683c\u5f0f":35,"\u8fd9\u79cd\u65b9\u5f0f\u6700\u4e3a\u7b80\u4fbf":17,"\u8fd9\u79cd\u751f\u6210\u6280\u672f\u53ea\u7528\u4e8e\u7c7b\u4f3c\u89e3\u7801\u5668\u7684\u751f\u6210\u8fc7\u7a0b":42,"\u8fd9\u79cd\u7c7b\u578b\u7684\u8f93\u5165\u5fc5\u987b\u901a\u8fc7":41,"\u8fd9\u79cd\u94fe\u63a5\u65b9\u5f0f\u4e3b\u8981\u7528\u4e8e\u79fb\u52a8\u7aef\u9884\u6d4b":17,"\u8fd9\u79cd\u96c6\u7fa4\u8282\u70b9\u7ba1\u7406\u65b9\u5f0f\u4f1a\u5728\u5c06\u6765\u4f7f\u7528":27,"\u8fd9\u7bc7":1,"\u8fd9\u7bc7\u6587\u6863":4,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u4f1a\u4f7f\u7528":20,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u4f1a\u7edf\u4e00\u4f7f\u7528":19,"\u8fd9\u7bc7\u6587\u6863\u4e4b\u540e\u90e8\u5206\u5c06\u4f1a\u7edf\u4e00\u4f7f\u7528":19,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd":20,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd\u5728\u4f7f\u7528":19,"\u8fd9\u7bc7\u6587\u6863\u4ecb\u7ecd\u57fa\u4e8e":0,"\u8fd9\u7bc7\u6587\u6863\u7684\u4e4b\u540e\u90e8\u5206\u4f1a\u4f7f\u7528":20,"\u8fd9\u7bc7\u6587\u7ae0":0,"\u8fd9\u7ec4\u8bed\u4e49\u76f8\u540c\u7684\u793a\u4f8b\u914d\u7f6e\u5982\u4e0b":39,"\u8fd9\u901a\u8fc7\u83b7\u5f97\u53cd\u5411\u5faa\u73af\u7f51\u7edc\u7684\u7b2c\u4e00\u4e2a\u5b9e\u4f8b":42,"\u8fd9\u91cc":[0,4,13,27,42],"\u8fd9\u91cc\u4ecb\
u7ecdc":20,"\u8fd9\u91cc\u4f7f\u7528\u4e86paddlepaddle\u9884\u5b9a\u4e49\u597d\u7684rnn\u5904\u7406\u51fd\u6570":39,"\u8fd9\u91cc\u4f7f\u7528\u7b80\u5355\u7684":11,"\u8fd9\u91cc\u5c06\u4ecb\u7ecdpaddlepaddle\u7684\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":14,"\u8fd9\u91cc\u6211\u4eec\u5c55\u793a\u4e00\u4efd\u7b80\u5316\u8fc7\u7684\u4ee3\u7801":6,"\u8fd9\u91cc\u6211\u4eec\u901a\u8fc7\u5728kubernetes\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2ajob\u6765\u4e0b\u8f7d\u5e76\u5207\u5272\u6570\u636e":27,"\u8fd9\u91cc\u6709\u4e24\u79cd\u6709\u6548\u7684\u89e3\u51b3\u65b9\u6cd5":11,"\u8fd9\u91cc\u68c0\u9a8c\u8fd0\u884c\u65f6\u95f4\u6a21\u578b\u7684\u6536\u655b":23,"\u8fdb\u4e3b\u4ed3\u5e93\u540e":4,"\u8fdb\u5165\u5bb9\u5668":26,"\u8fdb\u5165\u5bf9\u5e94\u7684\u76ee\u5f55":8,"\u8fdb\u7a0b\u542f\u52a8\u7684\u5fc5\u8981\u53c2\u6570":27,"\u8fdb\u7a0b\u7684":23,"\u8fdb\u7a0b\u7684\u542f\u52a8\u53c2\u6570":27,"\u8fdb\u7a0b\u7684\u8fd0\u884c\u73af\u5883":27,"\u8fdb\u7a0b\u9700\u8981\u7684":27,"\u8fdb\u884c\u4e86":39,"\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u7684\u65b9\u6848":27,"\u8fdb\u884c\u5206\u5e03\u5f0f\u8bad\u7ec3\u7684\u65b9\u6cd5":27,"\u8fdb\u884c\u524d\u5411\u8ba1\u7b97":20,"\u8fdb\u884c\u56de\u590d":4,"\u8fdb\u884c\u5e8f\u5217\u5316":20,"\u8fdb\u884c\u5f00\u53d1":4,"\u8fdb\u884c\u62c6\u89e3":39,"\u8fdb\u884c\u6fc0\u6d3b\u64cd\u4f5c":6,"\u8fdb\u884c\u8bad\u7ec3":20,"\u8fdb\u884c\u90e8\u7f72":24,"\u8fdb\u884c\u94fe\u63a5":17,"\u8fdb\u884c\u9884\u6d4b\u4f9d\u8d56\u4e8e\u5c06":17,"\u8fdb\u884c\u9884\u6d4b\u65f6":20,"\u8fdb\u9636\u4f7f\u7528":43,"\u8fdb\u9636\u6307\u5357":14,"\u8fde\u63a5":41,"\u8fde\u63a5\u5230pserver\u7684\u7aef\u53e3":21,"\u8fde\u63a5\u5230pserver\u7684\u7aef\u53e3\u4e2a\u6570":21,"\u9000\u51fa\u5bb9\u5668":26,"\u9002\u4e2d":39,"\u9009":39,"\u9009\u62e9":39,"\u9009\u62e9\u4e0b\u8f7d\u4f7f\u7528\u4e0d\u540c\u7684blas\u5e93\u7684docker\u955c\u50cf":1,"\u9009\u62e9\u76ee\u6807\u5206\u652f":4,"\u9009\u62e9\u8def\u5f84\u6765\u52a8\u6001\u52a0\u8f7dnvidia":33,"\u9009
\u9879":0,"\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":32,"\u901a\u5e38\u4f1a\u4f7f\u7528\u73af\u5883\u53d8\u91cf\u914d\u7f6ejob\u7684\u914d\u7f6e\u4fe1\u606f":27,"\u901a\u5e38\u4f1a\u4f7f\u7528mapreduce\u4efb\u52a1\u7684\u8f93\u51fa\u7ed3\u679c\u4f5c\u4e3a\u8bad\u7ec3\u7ed3\u679c":21,"\u901a\u5e38\u4f7f\u7528\u7a00\u758f\u8bad\u7ec3\u6765\u52a0\u901f\u8ba1\u7b97\u8fc7\u7a0b":35,"\u901a\u5e38\u4f7f\u7528cento":3,"\u901a\u5e38\u505a\u6cd5\u662f\u4ece\u4e00\u4e2a\u6bd4\u8f83\u5927\u7684learning_rate\u5f00\u59cb\u8bd5":13,"\u901a\u5e38\u540d\u5b57\u662f":4,"\u901a\u5e38\u60c5\u51b5\u4e0b":37,"\u901a\u5e38\u6211\u4eec\u4f1a\u5b89\u88c5ceph\u7b49\u5206\u5e03\u5f0f\u6587\u4ef6\u7cfb\u7edf\u6765\u5b58\u50a8\u8bad\u7ec3\u6570\u636e":26,"\u901a\u5e38\u7528\u4e8e\u8868\u793a\u79bb\u6563\u7684\u7c7b\u522b\u6807\u7b7e":19,"\u901a\u5e38\u7684\u505a\u6cd5\u662f\u4f7f\u7528":42,"\u901a\u5e38\u7684\u505a\u6cd5\u662f\u5c06\u914d\u7f6e\u5b58\u4e8e":6,"\u901a\u5e38\u8981\u6c42\u65f6\u95f4\u6b65\u4e4b\u95f4\u5177\u6709\u4e00\u4e9b\u4f9d\u8d56\u6027":39,"\u901a\u5e38\u90fd\u4f1a\u4f7f\u7528\u4e0b\u9762\u8fd9\u4e9b\u547d\u4ee4\u884c\u53c2\u6570":35,"\u901a\u5e38\u9700\u8981\u53bb\u6389\u7f51\u7edc\u4e2d\u7684":20,"\u901a\u7528":32,"\u901a\u77e5":39,"\u901a\u8fc7":[4,6,11,19,39],"\u901a\u8fc7\u4e24\u4e2a\u5d4c\u5957\u7684":41,"\u901a\u8fc7\u4f7f\u7528":0,"\u901a\u8fc7\u51fd\u6570":27,"\u901a\u8fc7\u547d\u4ee4\u884c\u53c2\u6570":11,"\u901a\u8fc7\u591a\u4e2a\u7ebf\u7a0b\u5171\u4eab\u540c\u4e00\u4e2a\u6a21\u578b\u6765\u51cf\u5c11\u5185\u5b58\u5f00\u9500":20,"\u901a\u8fc7\u5f15\u7528memory\u5f97\u5230\u8fd9\u4e2alayer\u4e0a\u4e00\u4e2a\u65f6\u523b\u7684\u8f93\u51fa":41,"\u901a\u8fc7\u5f15\u7528memory\u5f97\u5230\u8fd9\u4e2alayer\u4e0a\u4e00\u4e2a\u65f6\u523b\u8f93\u51fa":41,"\u901a\u8fc7\u6240\u6709\u5355\u5143\u6d4b\u8bd5":4,"\u901a\u8fc7\u7075\u6d3b\u4f7f\u7528\u4ee5\u4e0a\u4e24\u4e2a\u63a5\u53e3":20,"\u901a\u8fc7\u7ec4\u5408\u4e0d\u540c\u7684layer":14,"\u901a\u8fc7\u7f51\u7edc\u5c42\u7684\u6807\
u8bc6\u7b26\u6765\u6307\u5b9a":6,"\u901a\u8fc7\u8ba1\u7b97\u8282\u70b9\u548c\u53c2\u6570\u670d\u52a1\u5668\u7684\u5206\u5e03\u5f0f\u534f\u4f5c":22,"\u901a\u8fc7\u8c03\u7528":[19,20],"\u901a\u8fc7\u8c03\u7528\u4ee5\u4e0b\u63a5\u53e3\u521b\u5efa\u7a00\u758f\u77e9\u9635":19,"\u901a\u8fc7data":41,"\u903b\u8f91\u4e0a\u9ad8\u4e8e\u4e8c\u7ef4\u7684\u6570\u636e":19,"\u9047\u5230\u8be5\u9519\u8bef\u65f6":12,"\u9053\u6b49":39,"\u9069":39,"\u9075\u5b88\u4ee5\u4e0b\u7ea6\u5b9a":4,"\u90a3\u4e48":[6,41],"\u90a3\u4e480\u5c42\u5e8f\u5217\u5373\u4e3a\u4e00\u4e2a\u8bcd\u8bed":41,"\u90a3\u4e48\u4f1a\u643a\u5e26\u6709":19,"\u90a3\u4e48\u53ef\u4ee5\u8ba4\u4e3a\u8bad\u7ec3\u4e0d\u6536\u655b":13,"\u90a3\u4e48\u5982\u4f55\u5224\u65ad\u8bad\u7ec3\u4e0d\u6536\u655b\u5462":13,"\u90a3\u4e48\u5e38\u6570\u8f93\u51fa\u6240\u80fd\u8fbe\u5230\u7684\u6700\u5c0fcost\u662f":13,"\u90a3\u4e48\u6211\u4eec\u53ef\u4ee5\u5224\u65ad\u4e3a\u8bad\u7ec3\u4e0d\u6536\u655b":13,"\u90a3\u4e48\u63a8\u8350\u4f7f\u7528":42,"\u90a3\u4e48\u63a8\u8350\u4f7f\u7528\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u65b9\u6cd5":42,"\u90a3\u4e48\u6536\u655b\u53ef\u80fd\u5f88\u6162":13,"\u90a3\u4e48\u6700\u597d\u5c06\u6570\u636e\u6587\u4ef6\u5728\u6bcf\u6b21\u8bfb\u53d6\u4e4b\u524d\u505a\u4e00\u6b21shuffl":11,"\u90a3\u4e48\u7528\u6237\u9700\u8981\u62c9\u53d6\u6240\u6709\u7684\u8fdc\u7a0b\u5206\u652f\u5230\u672c\u673a":8,"\u90a3\u4e48\u8bad\u7ec3\u6709\u53ef\u80fd\u4e0d\u6536\u655b":13,"\u90a3\u4e48\u8be5\u4f18\u5316\u7b97\u6cd5\u81f3\u5c11\u9700\u8981":11,"\u90a3\u4e48fc1\u548cfc2\u5c42\u5c06\u4f1a\u4f7f\u7528\u7b2c1\u4e2agpu\u6765\u8ba1\u7b97":35,"\u90a3\u4e5f\u5c31\u4e0d\u9700\u8981\u6025\u7740\u4f18\u5316\u6027\u80fd\u5566":37,"\u90a3\u4f30\u8ba1\u8fd9\u91cc\u7684\u6f5c\u529b\u5c31\u6ca1\u5565\u597d\u6316\u7684\u4e86":37,"\u90a3\u51cf\u5c11\u5b66\u4e60\u738710\u500d\u7ee7\u7eed\u8bd5\u9a8c":13,"\u90a3\u6211\u4f1a\u671f\u671b\u5206\u6790\u5de5\u5177\u7edf\u8ba1\u5230\u901f\u5ea6\u662f100gb":37,"\u90a3\u7a0b\u5e8f\u5206\u6790\u5de5\
u5177\u662f\u5fc5\u4e0d\u53ef\u5c11\u7684\u5229\u5668":37,"\u90fd":39,"\u90fd\u4e0d\u9700\u8981":0,"\u90fd\u4f1a\u4ea7\u751f\u5f53\u524d\u5c42\u72b6\u6001\u7684\u6240\u6709\u7ee7\u627f\u7ed3\u679c":33,"\u90fd\u4f1a\u7ba1\u7406\u7ef4\u62a4\u4e00\u4efd\u8bad\u7ec3\u597d\u7684\u6a21\u578b":20,"\u90fd\u4f1a\u9020\u6210\u8bad\u7ec3\u4e2d\u7684\u6570\u636e\u4ecec":11,"\u90fd\u4f7f\u7528":19,"\u90fd\u53ea\u662f\u4ecb\u7ecd\u53cc\u5c42rnn\u7684api\u63a5\u53e3":39,"\u90fd\u53ef\u4ee5\u8fd0\u884c":0,"\u90fd\u53ef\u4ee5\u901a\u8fc7\u8c03\u7528\u4e0b\u9762\u7684\u63a5\u53e3\u4e3a\u539f\u6709\u7684\u6570\u636e\u8f93\u5165\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f":19,"\u90fd\u5e94\u4f7f\u7528c":19,"\u90fd\u662f\u5bf9layer1\u5143\u7d20\u7684\u62f7\u8d1d":38,"\u90fd\u662f\u5c06\u6bcf\u4e00\u53e5\u5206\u597d\u8bcd\u540e\u7684\u53e5\u5b50":39,"\u90fd\u7528":4,"\u90fd\u9700\u8981\u5199\u63d0\u4ea4\u8bf4\u660e":4,"\u90fd\u9700\u8981\u8c03\u7528\u4e00\u6b21":6,"\u914d\u5236\u7f16\u8bd1\u9009\u9879":17,"\u914d\u7f6e\u6253\u5f00":37,"\u914d\u7f6e\u6587\u4ef6\u63a5\u53e3\u662ffc_layer":6,"\u914d\u7f6e\u6587\u4ef6\u91cc\u52a0\u4e24\u884c":0,"\u914d\u7f6e\u7b80\u5355\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u4f8b\u5b50":42,"\u914d\u7f6e\u7f51\u7edc\u5c42\u7684\u8f93\u5165":6,"\u914d\u7f6eapi":38,"\u9152\u5e97":39,"\u91c7\u7528\u5747\u5300\u5206\u5e03\u6216\u8005\u9ad8\u65af\u5206\u5e03\u521d\u59cb\u5316":33,"\u91c7\u7528multi":13,"\u91cc\u53ef\u4ee5\u6807\u51c6\u5316\u7f16\u8bd1\u73af\u5883":0,"\u91cc\u7684":0,"\u91cc\u7684\u65e5\u5fd7":23,"\u91cc\u8fd0\u884c\u7684\u7f16\u8bd1\u5de5\u5177\u5b9e\u9645\u4e0a\u90fd\u662f\u5728\u672c\u673a\u7684":0,"\u91ccstep\u7684\u5185\u5bb9":11,"\u91cd\u65b0\u7f16\u8bd1paddlepaddl":37,"\u9488\u5bf9\u4efb\u52a1\u8fd0\u884c\u5b8c\u6210\u540e\u5bb9\u5668\u81ea\u52a8\u9000\u51fa\u7684\u573a\u666f":26,"\u9488\u5bf9\u5185\u5b58\u548c\u663e\u5b58":11,"\u94fe\u63a5":17,"\u94fe\u63a5\u4e2d\u627e\u5230":3,"\u94fe\u63a5\u4f55\u79cdblas\u5e93\u7b49":0,"\u94fe\u63a5\u7
6f8\u5bf9\u5bb9\u6613":17,"\u94fe\u63a5\u9009\u9879":17,"\u94fe\u63a5\u9759\u6001\u5e93":17,"\u9519\u8bef":8,"\u9519\u8bef\u7684define_py_data_sources2\u7c7b\u4f3c":13,"\u952e\u6765\u542f\u52a8\u7f16\u8bd1\u4e86":0,"\u955c\u50cf\u91cc":0,"\u955c\u50cf\u91cc\u6709paddlepaddle\u7684\u6e90\u7801\u4e0edemo":26,"\u957f\u5ea6":11,"\u95e8\u63a7\u5faa\u73af\u5355\u5143\u5355\u6b65\u51fd\u6570\u548c\u8f93\u51fa\u51fd\u6570":42,"\u95e8\u63a7\u5faa\u73af\u5355\u5143\u7684\u8f93\u51fa\u88ab\u7528\u4f5c\u8f93\u51famemori":42,"\u9644\u52a0\u4e0a\u5e8f\u5217\u4fe1\u606f":19,"\u9650\u5236\u5957\u63a5\u5b57\u53d1\u9001\u7f13\u51b2\u533a\u7684\u5927\u5c0f":33,"\u9650\u5236\u5957\u63a5\u5b57\u63a5\u6536\u7f13\u51b2\u533a\u7684\u5927\u5c0f":33,"\u9664\u4e86\u53ef\u4ee5\u81ea\u52a8\u7f16\u8bd1\u6587\u6863":7,"\u9664\u4e86boot_lay":39,"\u9664\u6b64\u4e4b\u5916":11,"\u9664\u96f6\u7b49\u95ee\u9898":11,"\u968f\u540e\u53ef\u4ee5\u7528\u8fd9\u4e2a\u5f00\u53d1\u955c\u50cf\u5f00\u59cbbuild":4,"\u968f\u673a\u6570\u7684\u79cd\u5b50":33,"\u968f\u673a\u6570seed":32,"\u9694\u5f00":21,"\u96c6\u675f\u641c\u7d22\u4f7f\u7528\u5e7f\u5ea6\u4f18\u5148\u641c\u7d22\u7684\u65b9\u5f0f\u6784\u5efa\u67e5\u627e\u6811":33,"\u96c6\u7fa4\u4e0a\u542f\u52a8\u4e00\u4e2a\u5355\u673a\u4f7f\u7528cpu\u7684paddlepaddle\u8bad\u7ec3\u4f5c\u4e1a":26,"\u96c6\u7fa4\u4e2d\u7684\u6bcf\u53f0\u8ba1\u7b97\u673a\u901a\u5e38\u88ab\u6210\u4e3a\u4e00\u4e2a":31,"\u96c6\u7fa4\u4efb\u52a1":23,"\u96c6\u7fa4\u4f5c\u4e1a\u5c06\u4f1a\u5728\u51e0\u79d2\u540e\u542f\u52a8":23,"\u96c6\u7fa4\u6d4b\u8bd5":32,"\u96c6\u7fa4\u8bad\u7ec3":32,"\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b":10,"\u96c6\u7fa4\u8fdb\u7a0b":23,"\u9700\u5728nvvp\u754c\u9762\u4e2d\u9009\u4e0a\u624d\u80fd\u5f00\u542f":37,"\u9700\u6307\u5b9a":17,"\u9700\u63d0\u4f9b\u975e\u96f6\u5143\u7684\u503c":19,"\u9700\u6ce8\u610f":17,"\u9700\u8981":[0,20],"\u9700\u8981\u4f7f\u7528":11,"\u9700\u8981\u4f7f\u7528\u5176\u5236\u5b9a\u7684\u65b9\u5f0f\u6302\u8f7d\u540e\u5e76\u5bfc\u5165\u6570\u636e":2
7,"\u9700\u8981\u4f7f\u7528\u6700\u65b0\u7684pip":3,"\u9700\u8981\u4f7f\u7528\u8005\u81ea\u5df1\u4e86\u89e3\u5e76\u5b8c\u6210\u8f6c\u5316":19,"\u9700\u8981\u4fdd\u6301\u5f53\u524d\u5206\u652f\u76ee\u5f55":4,"\u9700\u8981\u521b\u5efa\u5e76\u586b\u5199":19,"\u9700\u8981\u5347\u7ea7pip\u7248\u672c\u5230\u6700\u65b0":[3,8],"\u9700\u8981\u5355\u72ec":1,"\u9700\u8981\u540c\u65f6\u63d0\u4f9b\u6bcf\u4e00\u4e2a\u5185\u5c42\u5e8f\u5217\u5728\u6574\u4e2a":19,"\u9700\u8981\u540c\u6b65\u539f\u4ed3\u5e93":4,"\u9700\u8981\u542f\u52a8\u7684\u8282\u70b9\u4e2a\u6570\u4ee5\u53ca":27,"\u9700\u8981\u54ea\u4e9b\u5c42\u7684\u8ba1\u7b97\u7ed3\u679c\u4f5c\u4e3a\u8f93\u51fa":20,"\u9700\u8981\u5728\u521b\u5efa\u5bb9\u5668\u524d\u6302\u8f7d\u5377\u4ee5\u4fbf\u6211\u4eec\u4fdd\u5b58\u8bad\u7ec3\u7ed3\u679c":26,"\u9700\u8981\u5728\u7cfb\u7edf\u91cc\u5148\u5b89\u88c5\u597ddocker\u5de5\u5177\u5305":7,"\u9700\u8981\u5c06\u5176parameter\u8bbe\u7f6e\u6210":11,"\u9700\u8981\u5c06\u7f51\u7edc\u7ed3\u6784\u4f7f\u7528":20,"\u9700\u8981\u5c06cuda\u76f8\u5173\u7684\u5e93\u8bbe\u7f6e\u5230":17,"\u9700\u8981\u5c06paddl":17,"\u9700\u8981\u5f3a\u8c03\u7684\u662f":0,"\u9700\u8981\u601d\u8003\u5b8c\u6210\u4ee5\u4e0b\u5de5\u4f5c":[19,20],"\u9700\u8981\u624b\u52a8\u8fdb\u884c\u89e3\u538b":20,"\u9700\u8981\u6267\u884c":[0,3,16],"\u9700\u8981\u6307\u5b9a":17,"\u9700\u8981\u6307\u5b9a\u4e0e\u67d0\u4e00\u4e2a\u8f93\u5165\u7684\u5e8f\u5217\u4fe1\u606f\u662f\u4e00\u81f4\u7684":39,"\u9700\u8981\u6307\u5b9alayer\u7684\u8f93\u5165\u6765\u6e90":14,"\u9700\u8981\u63d0\u9192\u7684\u662f":2,"\u9700\u8981\u660e\u786e\u6307\u5b9a":33,"\u9700\u8981\u663e\u5f0f\u5730\u94fe\u63a5":17,"\u9700\u8981\u663e\u793a\u5730\u94fe\u63a5":17,"\u9700\u8981\u663e\u793a\u5730\u94fe\u63a5mkl\u7684\u52a8\u6001\u5e93":17,"\u9700\u8981\u6839\u636e\u4e0d\u540c\u7684\u5206\u5e03\u5f0f\u5b58\u50a8\u6765\u7ed1\u5b9a\u4e00\u4e2a":27,"\u9700\u8981\u6ce8\u610f\u7684\u662f":[11,33],"\u9700\u8981\u6ce8\u610f\u7684\u662f\u68af\u5ea6\u68c0\u67e5\u4ec5\u4ec5\u9
a8c\u8bc1\u4e86\u68af\u5ea6\u7684\u8ba1\u7b97":6,"\u9700\u8981\u6ce8\u610f\u7684\u662fpaddlepaddle\u76ee\u524d\u53ea\u652f\u6301\u5b50\u5e8f\u5217\u6570\u76ee\u4e00\u6837\u7684\u591a\u8f93\u5165\u53cc\u5c42rnn":39,"\u9700\u8981\u7528\u6237\u663e\u5f0f\u8bbe\u5b9a":12,"\u9700\u8981\u81ea\u5df1\u94fe\u63a5mkl\u94fe\u63a5\u5e93":17,"\u9700\u8981\u8bf7\u7ba1\u7406\u5458\u5b89\u88c5\u548c\u914d\u7f6e\u597d":0,"\u9700\u8981\u9075\u5faa\u4ee5\u4e0b\u7ea6\u5b9a":41,"\u9700\u9644\u52a0\u53cc\u5c42\u5e8f\u5217\u4fe1\u606f":19,"\u9700\u9644\u52a0\u5e8f\u5217\u4fe1\u606f":19,"\u975e\u5e38\u6570":6,"\u975e\u5e8f\u5217\u8f93\u5165\u4e0d\u643a\u5e26":19,"\u975e\u5e8f\u5217\u8f93\u5165\u65e0\u9700\u6784\u9020":19,"\u975e\u96f6\u5143\u4e2a\u6570":19,"\u975e\u96f6\u5143\u7d20\u7684\u503c":19,"\u975e\u96f6\u5143\u7d20\u7684\u5217\u53f7":19,"\u975e\u96f6\u6570\u5b57\u7684\u4e2a\u6570":6,"\u9879\u76ee\u5728\u52aa\u529b\u5f00\u59cb\u652f\u6301\u5176\u4ed6\u4e0d\u9700\u8981":0,"\u987a\u5e8f":39,"\u9884\u63d0\u4ea4\u94a9\u5b50":4,"\u9884\u6d4b\u4e0d\u9700\u8981\u6807\u7b7e":18,"\u9884\u6d4b\u4e0d\u9700\u8981\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u53cd\u5411\u4f20\u64ad\u548c\u53c2\u6570\u66f4\u65b0\u7684\u90e8\u5206":18,"\u9884\u6d4b\u4ee3\u7801\u66f4\u591a\u8be6\u7ec6\u793a\u4f8b\u4ee3\u7801\u8bf7\u53c2\u8003":20,"\u9884\u6d4b\u4f7f\u7528\u7684\u7f51\u7edc\u7ed3\u6784\u5f80\u5f80\u4e0d\u540c\u4e8e\u8bad\u7ec3":20,"\u9884\u6d4b\u5c31\u662f\u51c6\u5907\u8f93\u5165\u6570\u636e":18,"\u9884\u6d4b\u5f88\u591a\u65f6\u5019\u9700\u8981\u548c\u7528\u6237\u7cfb\u7edf\u6574\u5408\u5728\u4e00\u8d77":18,"\u9884\u6d4b\u65f6":20,"\u9884\u6d4b\u65f6\u53ea\u9700\u52a0\u8f7d\u4e00\u4e2a\u6587\u4ef6\u4fbf\u4e8e\u53d1\u5e03":20,"\u9884\u6d4b\u6709\u5982\u4e0b\u7279\u70b9":18,"\u9884\u6d4b\u7a0b\u5e8f\u5f00\u53d1\u4e24\u5927\u90e8\u5206":20,"\u9884\u6d4bsdk\u4e0d\u5305\u542b\u53cd\u5411\u4f20\u64ad\u548c\u53c2\u6570\u66f4\u65b0\u90e8\u5206":18,"\u9884\u6d4bsdk\u9700\u8981\u63d0\u4f9b\u4e00\u4e2a\u7b80\u6d01\u7684\u752
8\u6237\u63a5\u53e3":18,"\u9996\u5148":[6,39,42],"\u9996\u5148\u4ee5\u51e0\u4e2a\u5b9e\u9645\u573a\u666f\u4e3a\u4f8b":34,"\u9996\u5148\u5728\u7cfb\u7edf\u8def\u5f84":0,"\u9996\u5148\u5b89\u88c5\u5e76\u5728\u5f53\u524d\u76ee\u5f55\u8fd0\u884c\u5b83":4,"\u9996\u5148\u5bf9\u8f93\u5165\u505a\u4e00\u4e2a\u5c0f\u7684\u6270\u52a8":6,"\u9996\u5148\u6211\u4eec\u9700\u8981\u63a8\u5bfc\u8be5\u7f51\u7edc\u5c42\u7684":6,"\u9996\u5148\u6784\u9020\u5934\u4fe1\u606f":13,"\u9996\u5148\u901a\u8fc7":4,"\u9996\u5148\u9700\u8981\u52a0\u8f7d\u76f8\u5e94\u7684python\u5e93":14,"\u9a71\u52a8":7,"\u9ad8\u4eae\u90e8\u5206":39,"\u9ad8\u5ea6":19,"\u9ad8\u5ea6\u652f\u6301\u7075\u6d3b\u548c\u9ad8\u6548\u7684\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u914d\u7f6e":42,"\u9ad8\u65af\u5206\u5e03":13,"\u9ed8\u8ba4":33,"\u9ed8\u8ba40":21,"\u9ed8\u8ba41":21,"\u9ed8\u8ba4127":21,"\u9ed8\u8ba47164":21,"\u9ed8\u8ba4\u4e0d\u663e\u793a":33,"\u9ed8\u8ba4\u4e0d\u8bbe\u7f6e":41,"\u9ed8\u8ba4\u4e3a0":[33,35],"\u9ed8\u8ba4\u4e3a1":[19,35],"\u9ed8\u8ba4\u4e3a100":35,"\u9ed8\u8ba4\u4e3a4096mb":33,"\u9ed8\u8ba4\u4e3a\u7b2c\u4e00\u4e2a\u8f93\u5165":41,"\u9ed8\u8ba4\u4e3anull":33,"\u9ed8\u8ba4\u4f1a\u5c06a\u548cb":11,"\u9ed8\u8ba4\u4f7f\u7528concurrentremoteparameterupdat":33,"\u9ed8\u8ba4\u4f7f\u7528mkl":0,"\u9ed8\u8ba4\u503c":[0,35,38],"\u9ed8\u8ba4\u521d\u59cb\u72b6\u4e3a0":41,"\u9ed8\u8ba4\u60c5\u51b5\u4e0b":[13,23],"\u9ed8\u8ba4\u60c5\u51b5\u4e0b\u6309\u7167float\u7cbe\u5ea6\u8ba1\u7b97":13,"\u9ed8\u8ba4\u6307\u5b9a\u7b2c\u4e00\u4e2a\u8f93\u5165":39,"\u9ed8\u8ba4\u662f\u4f7f\u7528mkl\u7684\u955c\u50cf":1,"\u9ed8\u8ba4\u6ca1\u6709\u5b89\u88c5vim":1,"\u9ed8\u8ba4\u7684paddlepaddle\u751f\u4ea7\u73af\u5883\u955c\u50cf":26,"\u9ed8\u8ba4\u8bbe\u7f6e\u4e3a\u771f":35,"\u9ed8\u8ba4\u8c03\u7528":0,"\u9ed8\u8ba4fals":21,"api\u4e0d\u4f1a\u76f4\u63a5\u52a0\u8f7d":20,"api\u4e2d":19,"api\u4e2d\u7684\u4e00\u7ef4\u6570\u7ec4":19,"api\u4e2d\u7684\u77e9\u9635\u6765\u8868\u793a":19,"api\u4e2d\u795e\u7ecf\u7f51\u7edc\u7684\u4e00\u4e2a\u8f93
\u5165":20,"api\u4f7f\u7528\u4e2d\u7684\u4e00\u4e2a\u91cd\u8981\u6982\u5ff5":20,"api\u4f7f\u7528\u6d41\u7a0b":18,"api\u4f7f\u7528\u6d41\u7a0b\u793a\u610f\u56fe":20,"api\u4f7f\u7528\u793a\u4f8b":20,"api\u521b\u5efa\u7684gradientmachine\u7c7b\u7684\u5bf9\u8c61":20,"api\u53ef\u4ee5\u901a\u8fc7\u5206\u522b\u6307\u5b9a\u5e8f\u5217\u5316\u540e\u7684\u7f51\u7edc\u7ed3\u6784\u6587\u4ef6\u548c\u53c2\u6570\u76ee\u5f55\u6765\u52a0\u8f7d\u8bad\u7ec3\u597d\u7684\u6a21\u578b":20,"api\u53ef\u4ee5\u901a\u8fc7\u6307\u5b9a":20,"api\u5b8c\u6210\u5206\u5e03\u5f0f\u8bad\u7ec3":21,"api\u5bf9\u6bd4\u4ecb\u7ecd":40,"api\u5f00\u53d1\u5305\u5e76\u5b89\u88c5":3,"api\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u65f6":17,"api\u5f00\u53d1\u9884\u6d4b\u7a0b\u5e8f\u9700\u8981\u4e00\u4e2a\u8bad\u7ec3\u597d\u7684\u6a21\u578b":20,"api\u6240\u9700\u7684\u4f9d\u8d56":17,"api\u63d0\u4f9b\u7684":20,"api\u652f\u6301\u7684\u6240\u6709\u8f93\u5165\u6570\u636e\u7c7b\u578b\u548c\u4ed6\u4eec\u7684\u7ec4\u7ec7\u65b9\u5f0f":20,"api\u65f6\u4e3a\u8f93\u51fa":20,"api\u7684\u4f7f\u7528":18,"api\u76f8\u5173\u63a5\u53e3":19,"api\u8bad\u7ec3":20,"api\u9700\u8981\u521b\u5efa\u7684\u6570\u636e\u7c7b\u578b":19,"api\u9884\u6d4b\u5e93":36,"api\u9884\u6d4b\u65f6":20,"async_sgd\u8fdb\u884c\u8bad\u7ec3\u65f6":13,"avx\u662f\u4e00\u79cdcpu\u6307\u4ee4\u96c6":1,"avx\u7248\u672c":1,"avx\u7684\u955c\u50cf":1,"batch\u4e2d\u5305\u542b":11,"batch\u7684\u6743\u91cd":11,"batches\u4e2a\u6279\u6b21\u4fdd\u5b58\u4e00\u6b21\u53c2\u6570":33,"batches\u6b21":33,"book\u4e00\u5b9a\u662f\u60a8\u6700\u597d\u7684\u9009\u62e9":1,"book\u662f\u4e3a\u7528\u6237\u548c\u5f00\u53d1\u8005\u5236\u4f5c\u7684\u4e00\u4e2a\u4ea4\u4e92\u5f0f\u7684jupyt":1,"book\u7684":14,"book\u7684docker\u955c\u50cf":1,"byte":13,"case":[25,37],"cells\u7b49":12,"class":[6,13],"cmake\u4e2d\u5c06":37,"cmake\u627e\u5230\u7684python\u5e93\u548cpython\u89e3\u91ca\u5668\u7248\u672c\u53ef\u80fd\u6709\u4e0d\u4e00\u81f4\u73b0\u8c61":8,"cmake\u7f16\u8bd1\u65f6":0,"cmake\u914d\u7f6e\u4e2d\u5c06":3
7,"const":6,"container\u4e2d":26,"cost\u63a5\u6536y_predict\u4e0ey\u4f5c\u4e3a\u8f93\u5165":14,"cost\u8fd8\u5927\u4e8e\u8fd9\u4e2a\u6570":13,"count\u4e2agpu\u4e0a\u4f7f\u7528\u6570\u636e\u5e76\u884c\u6765\u8ba1\u7b97\u67d0\u4e00\u5c42":35,"count\u548cgpu":35,"csr\u5b58\u50a8\u683c\u5f0f\u901a\u8fc7":19,"cuda\u5e93":33,"cuda\u76f8\u5173\u5e93\u4f1a\u5728\u9884\u6d4b\u7a0b\u5e8f\u8fd0\u884c\u65f6\u52a8\u6001\u88c5\u8f7d":17,"cudnn\u5e93":[0,33],"data\u76ee\u5f55\u4e2d\u5b58\u653e\u5207\u5206\u597d\u7684\u6570\u636e":27,"dataprovider\u5171\u8fd4\u56de\u4e24\u4e2a\u6570\u636e":39,"dataprovider\u5171\u8fd4\u56de\u4e24\u7ec4\u6570\u636e":39,"dataprovider\u7f13\u51b2\u6c60\u5185\u5b58":11,"decoder\u5faa\u73af\u5c55\u5f00\u7684\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u4f1a\u5f15\u7528\u5168\u90e8\u7ed3\u679c":41,"decoder\u63a5\u53d7\u4e24\u4e2a\u8f93\u5165":41,"decoder\u6bcf\u6b21\u9884\u6d4b\u4ea7\u751f\u4e0b\u4e00\u4e2a\u6700\u53ef\u80fd\u7684\u8bcd\u8bed":41,"decoer\u67b6\u6784":41,"default":[25,26,27,35],"dist\u76ee\u5f55\u4e0b\u751f\u6210\u8f93\u51fa\u7684whl\u5305":0,"dnn\u6570\u5b66\u5e93":0,"docker\u5b89\u88c5\u65b9\u5f0f\u53ef\u4ee5\u8fdb\u5165docker\u5bb9\u5668\u6267\u884c":31,"docker\u5b89\u88c5\u8bf7\u53c2\u8003":7,"docker\u5b89\u88c5\u8bf7\u53c2\u8003docker\u7684\u5b98\u7f51":7,"docker\u5b98\u7f51":1,"docker\u7684\u5b98\u7f51":7,"docker\u955c\u50cf":1,"docker\u955c\u50cf\u4e3a\u4e86\u51cf\u5c0f\u4f53\u79ef":1,"docker\u955c\u50cf\u9ed8\u8ba4":1,"dockerhub\u7f51\u7ad9":1,"double\u7c7b\u578b\u65f6\u4e3a8":13,"dropout\u7684\u6bd4\u4f8b":6,"encode\u6210\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":39,"encoder\u548cdecoder\u53ef\u4ee5\u662f\u80fd\u591f\u5904\u7406\u5e8f\u5217\u7684\u4efb\u610f\u795e\u7ecf\u7f51\u7edc\u5355\u5143":41,"encoder\u8f93\u51fa":41,"entropy\u4f5c\u4e3acost":13,"export":[1,7,8,21],"f\u4ee3\u8868\u4e00\u4e2a\u6d6e\u70b9\u6570":14,"fc1\u548cfc2\u5c42\u5728gpu\u4e0a\u8ba1\u7b97":35,"fc3\u5c42\u4f7f\u7528cpu\u8ba1\u7b97":35,"float":[19,37],"float\u7b49
":35,"forward\u7684output\u7684\u503c":11,"full\u53c2\u6570\u63d0\u4ea4":9,"function":42,"function\u4f7f\u7528":12,"git\u6d41\u5206\u652f\u6a21\u578b":4,"github\u9996\u9875":4,"glibc\u81f3\u5c11\u5305\u542bglibc_2":3,"gpu\u4e8c\u8fdb\u5236\u6587\u4ef6":0,"gpu\u5219\u8fd8\u9700\u8981\u9ad8\u5e76\u884c\u6027":37,"gpu\u6027\u80fd\u8c03\u4f18":36,"gpu\u6838\u5728\u8bad\u7ec3\u914d\u7f6e\u4e2d\u6307\u5b9a":33,"gpu\u7684docker\u955c\u50cf\u7684\u65f6\u5019":8,"group\u6559\u7a0b":40,"group\u7684\u5b9e\u73b0\u65b9\u5f0f":12,"gru\u6216lstm":42,"html\u5373\u53ef\u8bbf\u95ee\u672c\u5730\u6587\u6863":7,"i\u4ee3\u8868\u4e00\u4e2a\u6574\u6570":14,"id\u6307\u5b9a\u4f7f\u7528\u54ea\u4e2agpu\u6838":33,"id\u6307\u5b9a\u7684gpu":35,"id\u65e0\u6548":33,"image\u91cc":26,"imikolov\u6570\u636e\u96c6":21,"import":[3,14,16,20,21,25],"infer\u63a5\u53e3\u7684\u8fd4\u56de\u503c\u662f\u4e00\u4e2apython":11,"int":[6,19,21,35,39],"issue\u7f16\u53f7":4,"job\u662f\u672c\u6b21\u8bad\u7ec3\u5bf9\u5e94\u7684job":27,"job\u7684\u540d\u5b57":27,"kubernetes\u4e3a\u8fd9\u6b21\u8bad\u7ec3\u521b\u5efa\u4e863\u4e2apod\u5e76\u4e14\u8c03\u5ea6\u5230\u4e863\u4e2anode\u4e0a\u8fd0\u884c":27,"kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3":24,"kubernetes\u5355\u673a\u8bad\u7ec3":24,"kubernetes\u53ef\u4ee5\u901a\u8fc7yaml\u6587\u4ef6\u6765\u521b\u5efa\u76f8\u5173\u5bf9\u8c61":27,"kubernetes\u5c31\u4f1a\u521b\u5efa3\u4e2apod\u4f5c\u4e3apaddlepaddle\u8282\u70b9\u7136\u540e\u62c9\u53d6\u955c\u50cf":27,"kubernetes\u6709job\u7c7b\u578b\u7684\u8d44\u6e90\u6765\u652f\u6301":26,"label\u662f\u539f\u59cb\u6570\u636e\u4e2d\u5bf9\u4e8e\u6bcf\u4e00\u53e5\u8bdd\u7684\u5206\u7c7b\u6807\u7b7e":39,"labels\u662f\u6bcf\u7ec4\u5185\u6bcf\u4e2a\u53e5\u5b50\u7684\u6807\u7b7e":39,"layer1\u5fc5\u987b\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"layer1\u5fc5\u987b\u662f\u4e00\u4e2a\u5355\u5c42\u5e8f\u5217":38,"layer\u4f5c\u4e3a\u4e00\u4e2a\u6574\u4f53\u6765\u5b9e\u73b0":12,"layer\u62ff\u5230\u7684\u7528\u6237\u8f93\u5165":41,"layer\u65f6":12,"layer\
u662f\u6211\u4eec\u7684\u79ef\u6728":14,"layer\u7c7b\u53ef\u4ee5\u81ea\u52a8\u8ba1\u7b97\u4e0a\u9762\u7684\u5bfc\u6570":6,"layer\u8ba1\u7b97\u7684\u8f93\u51fa":12,"linux\u4e2d":1,"list\u4e2d":20,"list\u5982\u4e0b\u6240\u793a":35,"list\u6307\u5b9a\u6d4b\u8bd5\u7684\u6a21\u578b\u5217\u8868":35,"memory\u4e0d\u80fd\u72ec\u7acb\u5b58\u5728":41,"memory\u4e5f\u53ef\u4ee5\u5177\u6709":42,"memory\u4e5f\u53ef\u4ee5\u662f\u5e8f\u5217":42,"memory\u53ea\u80fd\u5728":41,"memory\u53ef\u4ee5\u7f13\u5b58\u4e0a\u4e00\u4e2a\u65f6\u523b\u67d0\u4e00\u4e2a\u795e\u7ecf\u5143\u7684\u8f93\u51fa":39,"memory\u6307\u5411\u4e00\u4e2alay":41,"memory\u662f\u5728\u5355\u6b65\u51fd\u6570\u4e2d\u5faa\u73af\u4f7f\u7528\u7684\u72b6\u6001":42,"memory\u662fpaddlepaddle\u5b9e\u73b0rnn\u65f6\u5019\u4f7f\u7528\u7684\u4e00\u4e2a\u6982\u5ff5":39,"memory\u7684":42,"memory\u7684\u521d\u59cb\u72b6\u6001":41,"memory\u7684\u65f6\u95f4\u5e8f\u5217\u957f\u5ea6\u4e00\u81f4\u7684\u60c5\u51b5":39,"memory\u7684\u66f4\u591a\u8ba8\u8bba\u8bf7\u53c2\u8003\u8bba\u6587":41,"memory\u7684\u8f93\u51fa\u5b9a\u4e49\u5728":42,"memory\u7684i":41,"memory\u9ed8\u8ba4\u521d\u59cb\u5316\u4e3a0":41,"mnist\u624b\u5199\u6570\u5b57\u8bc6\u522b\u76ee\u5f55":20,"name\u7ec4\u5408\u53ef\u4ee5\u627e\u5230\u672c\u6b21\u8bad\u7ec3\u9700\u8981\u7684\u6587\u4ef6\u8def\u5f84":27,"new":[4,6,25],"null":[6,19,33],"num\u51b3\u5b9a\u603b\u7aef\u53e3\u4e2a\u6570":21,"num_gradient_servers\u53c2\u6570":27,"num_samples_processed\u4e3a\u5df2\u8bad\u7ec3\u6837\u672c\u6570":13,"only\u7684\u4e8c\u8fdb\u5236":0,"org\u5de5\u5177\u7684\u8be6\u7ec6\u4fe1\u606f":7,"outer_mem\u662f\u4e00\u4e2a\u5b50\u53e5\u7684\u6700\u540e\u4e00\u4e2a\u5411\u91cf":39,"output\u6587\u4ef6\u5939\u5b58\u653e\u8bad\u7ec3\u7ed3\u679c\u4e0e\u65e5\u5fd7":27,"packages\u91cc\u9762":8,"packages\u91cc\u9762\u7684python\u5305":8,"paddepaddle\u901a\u8fc7\u7f16\u8bd1\u65f6\u6307\u5b9a\u8def\u5f84\u6765\u5b9e\u73b0\u5f15\u7528\u5404\u79cdbla":0,"paddle\u4e2d\u7ecf\u5e38\u4f1a\u5c06\u65f6\u95f4\u5e8f
\u5217\u6210\u4e3a":39,"paddle\u4e8c\u8fdb\u5236\u5728\u8fd0\u884c\u65f6\u6355\u83b7\u4e86\u6d6e\u70b9\u6570\u5f02\u5e38":11,"paddlepaddle\u4e2d":[38,41],"paddlepaddle\u4e2d\u4e00\u4e2a\u8ba1\u7b97\u5c42\u7684\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f\u548c\u8f93\u5165\u6570\u636e\u7ec4\u7ec7\u65b9\u5f0f\u5b8c\u5168\u76f8\u540c":19,"paddlepaddle\u4e2d\u7684\u8bb8\u591alayer\u5e76\u4e0d\u5728\u610f\u8f93\u5165\u662f\u5426\u662f\u65f6\u95f4\u5e8f\u5217":39,"paddlepaddle\u4e2d\u795e\u7ecf\u7f51\u7edc\u8ba1\u7b97\u5c42\u8f93\u5165":19,"paddlepaddle\u4e2d\u8fd8\u5305\u542b":12,"paddlepaddle\u4e2d\u901a\u8fc7reader\u6765\u52a0\u8f7d\u6570\u636e":14,"paddlepaddle\u4e3a\u6df1\u5ea6\u5b66\u4e60\u7814\u7a76\u4eba\u5458\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684api":14,"paddlepaddle\u4e3ano":1,"paddlepaddle\u4f1a\u81ea\u52a8\u8bbe\u5b9a":12,"paddlepaddle\u4f7f\u7528\u540c\u6b65\u5c4f\u969c":22,"paddlepaddle\u4f7f\u7528\u5747\u503c0":13,"paddlepaddle\u4f7f\u7528avx":8,"paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u5185\u5bb9\u753116\u5b57\u8282\u5934\u4fe1\u606f\u548c\u7f51\u7edc\u53c2\u6570\u4e24\u90e8\u5206\u7ec4\u6210":13,"paddlepaddle\u4fdd\u5b58\u7684\u6a21\u578b\u53c2\u6570\u6587\u4ef6\u524d16\u5b57\u8282\u4e3a\u5934\u4fe1\u606f":13,"paddlepaddle\u53d1\u5e03\u7684\u5b89\u88c5\u5305\u4f1a\u5c3d\u91cf\u5bf9\u9f50":3,"paddlepaddle\u53ef\u4ee5\u4f7f\u7528\u5e38\u7528\u7684python\u5305\u7ba1\u7406\u5de5\u5177":3,"paddlepaddle\u53ef\u4ee5\u4f7f\u7528cudnn":0,"paddlepaddle\u53ef\u4ee5\u540c\u65f6\u652f\u6301\u540c\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":22,"paddlepaddle\u53ef\u4ee5\u6bd4\u8f83\u7b80\u5355\u7684\u5224\u65ad\u54ea\u4e9b\u8f93\u51fa\u662f\u5e94\u8be5\u8de8\u8d8a\u65f6\u95f4\u6b65\u7684":39,"paddlepaddle\u53ef\u4ee5\u901a\u8fc7\u8be5\u673a\u5236\u5224\u65ad\u662f\u5426\u5df2\u7ecf\u6536\u96c6\u9f50\u6240\u6709\u7684\u68af\u5ea6":6,"paddlepaddle\u5728\u5b9e\u73b0rnn\u7684\u65f6\u5019":39,"paddlepaddle\u5728\u6fc0\u6d3b\u51fd\u6570\u91cc\u5b9e\
u73b0dropout":12,"paddlepaddle\u5728\u7f16\u8bd1\u65f6":0,"paddlepaddle\u5b58\u7684\u662f\u6709\u503c\u4f4d\u7f6e\u7684\u7d22\u5f15":14,"paddlepaddle\u5b89\u88c5\u5305\u7531\u4e8e\u4e0d\u4ec5\u4ec5\u5305\u542b":3,"paddlepaddle\u63d0\u4f9b\u4e86\u57fa\u4e8edocker\u7684\u5b89\u88c5\u65b9\u5f0f":2,"paddlepaddle\u63d0\u4f9b\u4e86\u591a\u79cdpython":2,"paddlepaddle\u63d0\u4f9b\u4e86c":18,"paddlepaddle\u63d0\u4f9b\u7684":12,"paddlepaddle\u652f\u6301":0,"paddlepaddle\u652f\u6301\u4e0d\u540c\u7c7b\u578b\u7684\u8f93\u5165\u6570\u636e":14,"paddlepaddle\u652f\u6301\u4f7f\u7528pip\u5feb\u901f\u5b89\u88c5":16,"paddlepaddle\u652f\u6301\u7528\u6237\u7075\u6d3b\u5730\u8bbe\u7f6e\u5404\u79cd\u547d\u4ee4\u884c\u53c2\u6570":34,"paddlepaddle\u652f\u6301\u975e\u5e38\u591a\u7684\u4f18\u5316\u7b97\u6cd5":11,"paddlepaddle\u652f\u6301sparse\u7684\u8bad\u7ec3":11,"paddlepaddle\u6587\u6863\u4f7f\u7528":7,"paddlepaddle\u662f\u6e90\u4e8e\u767e\u5ea6\u7684\u4e00\u4e2a\u6df1\u5ea6\u5b66\u4e60\u5e73\u53f0":14,"paddlepaddle\u7684":26,"paddlepaddle\u7684\u5185\u5b58\u5360\u7528\u4e3b\u8981\u5206\u4e3a\u5982\u4e0b\u51e0\u4e2a\u65b9\u9762":11,"paddlepaddle\u7684\u53c2\u6570\u4f7f\u7528\u540d\u5b57":13,"paddlepaddle\u7684\u5404\u7248\u672c\u955c\u50cf\u53ef\u4ee5\u53c2\u8003":26,"paddlepaddle\u7684\u5b89\u88c5\u53ef\u4ee5\u53c2\u8003":31,"paddlepaddle\u7684\u6240\u6709layer\u90fd\u6709\u552f\u4e00\u7684nam":12,"paddlepaddle\u7684\u6587\u6863\u5305\u62ec\u82f1\u6587\u6587\u6863":7,"paddlepaddle\u7684\u6587\u6863\u6784\u5efa\u6709\u4e09\u79cd\u65b9\u5f0f":7,"paddlepaddle\u7684\u6e90\u7801":4,"paddlepaddle\u7684\u7f16\u8bd1\u9009\u9879":0,"paddlepaddle\u7684bas":6,"paddlepaddle\u7684dock":26,"paddlepaddle\u7684softmax\u4e0d\u80fd\u6307\u5b9a\u8ba1\u7b97\u7ef4\u5ea6":12,"paddlepaddle\u76ee\u524d\u53ea\u652f\u6301\u5728\u6bcf\u4e2a\u65f6\u95f4\u6b65\u4e2d":39,"paddlepaddle\u76ee\u524d\u63d0\u4f9b\u4e24\u79cd\u53c2\u6570\u521d\u59cb\u5316\u7684\u65b9\u5f0f":13,"paddlepaddle\u76ee\u524d\u652f\u63018\u79cdlear
ning_rate_schedul":13,"paddlepaddle\u793e\u533a":10,"paddlepaddle\u7f16\u8bd1\u9700\u8981\u4f7f\u7528\u5230\u4e0b\u9762\u7684\u4f9d\u8d56":0,"paddlepaddle\u8d1f\u8d23\u5b8c\u6210\u4fe1\u606f\u548c\u68af\u5ea6\u5728\u65f6\u95f4\u5e8f\u5217\u4e0a\u7684\u4f20\u64ad":41,"paddlepaddle\u8d1f\u8d23\u5b8c\u6210\u4fe1\u606f\u548c\u8bef\u5dee\u5728\u65f6\u95f4\u5e8f\u5217\u4e0a\u7684\u4f20\u64ad":41,"paddlepaddle\u9488\u5bf9\u4e0d\u540c\u7684\u7528\u6237\u7fa4\u4f53\u63d0\u4f9b\u4e86\u591a\u79cd\u5b89\u88c5\u65b9\u5f0f":2,"paddlepaddle\u955c\u50cf\u9700\u8981\u63d0\u4f9b":27,"paddlepaddle\u9700\u8981\u4f7f\u7528docker\u73af\u5883\u5b8c\u6210\u7f16\u8bd1":0,"pass\u4e2a\u6a21\u578b\u5230\u7b2c":33,"pass\u5c06\u4e0d\u8d77\u4f5c\u7528":33,"pass\u8f6e\u5f00\u59cb\u8bad\u7ec3":33,"pass\u8f6e\u7684\u6a21\u578b\u7528\u4e8e\u6d4b\u8bd5":33,"passes\u8f6e":33,"path\u6307\u5b9a\u6d4b\u8bd5\u7684\u6a21\u578b":35,"period\u4e2a\u6279\u6b21\u5bf9\u6240\u6709\u6d4b\u8bd5\u6570\u636e\u8fdb\u884c\u6d4b\u8bd5":33,"period\u4e2a\u6279\u6b21\u6253\u5370\u65e5\u5fd7\u8fdb\u5ea6":33,"period\u4e2a\u6279\u6b21\u8f93\u51fa\u53c2\u6570\u7edf\u8ba1":33,"period\u4e2a\u6279\u6b21\u8f93\u51fa\u7b26\u53f7":33,"period\u6574\u9664":33,"period\u8f6e\u4fdd\u5b58\u8bad\u7ec3\u53c2\u6570":33,"pserver\u5730\u5740\u7b49\u53c2\u6570\u4f7ftrainer\u53ef\u4ee5\u6b63\u786e\u8fde\u63a5\u5230pserv":21,"pserver\u76d1\u542c\u7684\u8d77\u59cb\u7aef\u53e3":21,"public":[6,25,26],"pwd\u53d8\u91cf\u4f1a\u5c55\u5f00\u4e3a\u5f53\u524d\u8def\u5f84\u7684\u7edd\u5bf9\u8def\u5f84":1,"py\u7a0b\u5e8f":3,"pydataprovider\u4f7f\u7528\u7684\u662f\u5f02\u6b65\u52a0\u8f7d":11,"pypi\u5b89\u88c5\u5305\u53ef\u4ee5\u5728":3,"python\u5b89\u88c5\u5305\u652f\u6301linux":8,"python\u5c01\u88c5\u7684\u5b9e\u73b0\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u5728\u914d\u7f6e\u6587\u4ef6\u4e2d\u4f7f\u7528\u65b0\u5b9e\u73b0\u7684\u7f51\u7edc\u5c42":6,"recommendation\u6587\u4ef6\u5939\u5185\u5b58\u653e\u8bad\u7ec3\u6587\u4ef6":27,"request\u524d":4,"request\u7684":4,"
request\u88ab\u5408\u5e76\u540e":4,"return":[6,11,13,14,25,27,39,42],"rnn\u5373\u65f6\u95f4\u9012\u5f52\u795e\u7ecf\u7f51\u7edc":39,"rnn\u5bf9\u4e8e\u6bcf\u4e00\u4e2a\u65f6\u95f4\u6b65\u901a\u8fc7\u4e86\u4e00\u4e2alstm\u7f51\u7edc":39,"rnn\u603b\u662f\u5f15\u7528\u4e0a\u4e00\u65f6\u523b\u9884\u6d4b\u51fa\u7684\u8bcd\u7684\u8bcd\u5411\u91cf":41,"rnn\u6a21\u578b":36,"rnn\u914d\u7f6e":40,"root\u66ff\u6362\u4e3apaddlepaddle\u9884\u6d4b\u5e93\u7684\u5b89\u88c5\u8def\u5f84":17,"sdk\u7684\u63a5\u53e3\u9700\u8981\u662f\u6ee1\u8db3c\u6807\u51c6\u7684\u63a5\u53e3":18,"search\u7684\u65b9\u6cd5":33,"sentences\u662f\u53cc\u5c42\u65f6\u95f4\u5e8f\u5217\u7684\u6570\u636e":39,"seq\u53c2\u6570\u5fc5\u987b\u4e3afals":41,"server\u4e2a\u6279\u6b21\u6253\u5370\u65e5\u5fd7\u8fdb\u5ea6":33,"simd\u6307\u4ee4\u63d0\u9ad8cpu\u6267\u884c\u6548\u7387":8,"size\u4e3a512":33,"size\u53ef\u80fd\u4f1a\u5bf9\u8bad\u7ec3\u7ed3\u679c\u4ea7\u751f\u5f71\u54cd":11,"size\u672c\u8eab\u662f\u795e\u7ecf\u7f51\u7edc\u7684\u8d85\u53c2\u6570":11,"softmax\u6fc0\u6d3b\u7684\u8f93\u51fa\u7684\u548c\u603b\u662f1":6,"sparse\u8bad\u7ec3\u9700\u8981\u8bad\u7ec3\u7279\u5f81\u662f":11,"static":25,"step\u51fd\u6570\u4e2d\u7684memori":41,"step\u51fd\u6570\u5185\u90e8\u53ef\u4ee5\u81ea\u7531\u7ec4\u5408paddlepaddle\u652f\u6301\u7684\u5404\u79cdlay":41,"subseq\u7684\u6bcf\u4e2a\u5143\u7d20\u662f\u4e00\u4e2a0\u5c42\u5e8f\u5217":38,"super":6,"switch":25,"tests\u7684paddlepaddl":4,"tflops\u4e86":37,"throw":25,"trainer\u542f\u52a8\u9700\u8981\u4f20\u5165\u7aef\u53e3":21,"trainer\u63a5\u6536\u4e09\u4e2a\u53c2\u6570":14,"trainer\u8282\u70b9\u4e2a\u6570":21,"trainer\u9700\u8981\u548cpserver\u4fdd\u6301\u7f51\u7edc\u8054\u901a\u4ee5\u5b8c\u6210\u8bad\u7ec3":21,"true":[6,11,13,19,20,21,25,27,35,39,42],"true\u8868\u793a\u53cd\u5411\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"try":8,"unit\u5728\u4e00\u4e2a\u65f6\u95f4\u6b65\u5185\u8ba1\u7b97\u5f97\u5230\u7684\u4e2d\u95f4\u503c":12,"update\u53c2\u6570\u65f6\u624d\u6709\u6548":33,"v1\u7248\
u672c":8,"var":7,"void":6,"wheel\u5305":2,"while":27,"wmt14\u6570\u636e\u7684\u63d0\u4f9b\u6587\u4ef6\u5728":42,"words\u5373\u4e3a\u8fd9\u4e2a\u6570\u636e\u4e2d\u7684\u5355\u5c42\u65f6\u95f4\u5e8f\u5217":39,"words\u662f\u539f\u59cb\u6570\u636e\u4e2d\u7684\u6bcf\u4e00\u53e5\u8bdd":39,"x86_64\u548cmaco":8,"x\u4e0ey\u4e3a\u4e4b\u524d\u63cf\u8ff0\u7684\u8f93\u5165\u5c42":14,"y\u8868\u793a\u8f93\u5165\u6570\u636e\u662f\u4e00\u4e2a\u7ef4\u5ea6\u4e3a1\u7684\u7a20\u5bc6\u5411\u91cf":14,"yaml\u6587\u4ef6\u4e2d\u5404\u4e2a\u5b57\u6bb5\u7684\u5177\u4f53\u542b\u4e49":27,"yaml\u6587\u4ef6\u63cf\u8ff0\u4e86\u8fd9\u6b21\u8bad\u7ec3\u4f7f\u7528\u7684docker\u955c\u50cf":27,"zero\u4e09\u79cd\u64cd\u4f5c":33,AGE:[25,26],AWS:[24,29,30],And:25,But:8,For:37,IDE:0,Into:25,Its:25,QoS:26,TLS:25,The:[6,19,25,27],Then:25,There:25,Use:25,VPS:25,Will:14,Yes:1,___embedding_0__:27,___embedding_1__:27,___fc_layer_0__:25,__init__:6,__rnn_step__:42,__square_error_cost_0__:27,_recurrent_group:42,_source_language_embed:42,_target_language_embed:42,aaaaaaaaaaaaa:25,about:25,abov:[25,37],abs:11,absolut:11,accessmod:25,act:[11,14,16,39,42],action:25,activ:[11,14,16,42],adadelta:11,adagrad:21,adam:13,add:[4,8],add_input:6,add_test:6,add_to:12,add_unittest_without_exec:6,addbia:6,added:4,address:19,addrow:6,after:[4,25],against:25,age:27,agg_level:[38,39],aggreg:25,aggregatelevel:[38,39],agre:[11,14],alexnet_pass1:35,alexnet_pass2:35,algo_hrnn_demo:39,all:[8,11,14,25,27,41],allow:25,allow_only_one_model_on_one_gpu:[32,33,35],alreadi:[8,25],also:37,alwai:[25,27],amazon:[25,26],amazonaw:25,amazonec2fullaccess:25,amazonelasticfilesystemfullaccess:25,amazonroute53domainsfullaccess:25,amazonroute53fullaccess:25,amazons3fullaccess:25,amazonvpcfullaccess:25,amd64:25,amend:4,among:25,andd:25,ani:[11,14,25],anoth:25,ans:25,answer:25,anyth:25,apach:[11,14],api:[3,17,18,19,20,21,25,27,37],api_pydataprovider2:11,api_trainer_config_helpers_lay:42,api_v2:38,apiserv:25,apivers:[25,26,27],append:[21,27,39,42],appleyard:37
,applic:[11,14,25,26,37],apt:1,archiv:17,arg:[13,27],argpars:27,args_ext:27,argument:[17,19,20,27],argumentpars:27,arn:25,around:25,arrai:[11,13,14,19],arrari:19,articl:4,artifact:25,assign:[19,25],async:32,async_count:[32,33],async_lagged_grad_discard_ratio:[21,33],async_lagged_ratio_default:[32,33],async_lagged_ratio_min:[32,33],attr:[11,12,13,42],auc:32,authent:25,author:[11,14,25],auto:[0,6,37],autom:25,automat:25,avail:25,averag:11,average_test_period:[32,33],avg:[37,38],avoid:37,avx:1,await:26,awar:25,awk:28,aws_account_id:25,awsaccountid:25,awskeymanagementservicepowerus:25,axi:11,b363:26,b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465:26,ba5f:25,backward:6,backward_first:42,backwardactiv:6,baidu:26,balanc:25,bare:26,barrier:22,barrierstatset:37,base:25,basematrix:6,bash:[0,1,4,25,26,27,31],basi:[11,14],batch:[11,14,19,25,26,27],batch_id:[11,14],batch_siz:[11,14],batchsiz:6,beam:42,beam_gen:42,beam_search:[41,42],beam_siz:[32,33,35,42],becaus:[25,39],befor:[8,11,25],beginn:42,below:25,besteffort:26,better:25,between:25,bia:6,bias_attr:[11,13,39,42],biases_:6,biasparameter_:6,biassiz:6,bidi:26,bidirectional_lstm:12,bilinearfwdbwd:37,bin:[0,1,20,21,25,26,27,31],binari:[20,25],blank:25,blob:0,book:[1,7,42],bool:[6,19,33,35],boot:[41,42],boot_lay:[39,42],bos_id:42,both:25,branch:4,broken:4,browser:25,bucket_nam:25,bug:[4,25],build:[0,4,7,8,17,25,27,29,30,31],build_doc:7,built:37,button:25,c703c041:4,c99e:25,cach:11,cache_pass_in_mem:11,cachetyp:11,caffe_poli:13,call:[25,27,37],callback:6,caller:25,calrnn:39,can:[25,37],cannot:8,capac:25,capi:17,capi_priv:17,cat:[1,27,28],categoryfil:26,caution:25,cbla:17,cento:3,certif:[8,25],cfg:26,chang:[4,25],channel:37,check:[4,6,8,13,19,25,33],check_eq:6,check_l:6,check_sparse_distribution_batch:[32,33],check_sparse_distribution_in_pserv:[32,33],check_sparse_distribution_ratio:[32,33],check_sparse_distribution_unbalance_degre:[32,33],checkgrad:33,checkgrad_ep:33,checkout:4,chmod:25,claim:25,claimnam:[25,27],
clang:4,classification_cost:[11,39],claster:25,clean:[4,8],cli:25,click:25,clip:33,clone:[0,7,17],close:4,cludform:25,cluster:[21,23,27],cluster_test_fil:21,cluster_train:[11,23],cluster_train_fil:21,cluster_train_v2:[23,24,28],cm469:25,cmake:[0,4,7,8,17,37],cmakefil:8,cmakelist:6,cname:25,cnn:26,code:[6,25,26],coded_stream:13,codedinputstream:13,colindic:19,collectbia:6,colum:19,column:19,com:[0,1,4,7,8,17,25,26],command:[0,6,25,26,27,29,30,35],commandlin:[27,37],comment:[27,39],commit:26,commun:25,compil:[0,31],complet:[25,26,27],complianc:[11,14],compress:19,comput:25,concat:42,concaten:11,condit:[11,14,26],conf:[13,23,27,39],conf_paddle_gradient_num:[25,27],conf_paddle_n:[25,27],conf_paddle_port:[25,27],conf_paddle_ports_num:[25,27],conf_paddle_ports_num_spars:[25,27],config:[6,14,17,25,26,27,32,33,35],config_:33,config_arg:[32,33,35],config_lay:6,config_pars:6,configprotostr:13,configur:[6,16],conflict:4,connect:[9,25,26],consol:25,constant:13,contain:[0,25,26,27],containerport:25,content:7,content_dir:7,context:[11,42],contin:25,control:[25,26],cool:4,copi:[11,14,25],copyright:[11,14],coreo:25,correct:25,correspond:13,cost:[11,14],could:25,count:[21,26,33,35,37],cp27:3,cp27m:3,cp27mu:3,cpp:[6,13,27,37,39],cpu:[0,26,35,37],cpu_avx_mkl:3,cpu_avx_openbla:3,cpu_noavx_openbla:3,cpuinfo:1,crash:37,creat:[4,6,7,11,13,14,19,26,27,28],create_bias_paramet:6,create_input_paramet:6,createstack:25,creation:25,creationd:25,crlf:4,cross:13,csc:6,csr:[6,19],csv:13,ctest:[0,4],ctrl:[0,23],cuda7:[3,16],cuda8:[0,1,3],cuda:[33,37],cuda_dir:[32,33],cuda_so:[1,8],cuda_visible_devic:11,cudaconfigurecal:37,cudadevicegetattribut:37,cudaeventcr:37,cudaeventcreatewithflag:37,cudafre:37,cudagetdevic:37,cudagetdevicecount:37,cudagetdeviceproperti:37,cudagetlasterror:37,cudahostalloc:37,cudalaunch:37,cudamalloc:37,cudamemcpi:37,cudaprofilerstart:37,cudaprofilerstop:37,cudaprofilestop:37,cudaruntimegetvers:37,cudasetdevic:37,cudasetupargu:37,cudastreamcr:37,cudastreamcreatewithflag:37,cudas
treamsynchron:37,cudeviceget:37,cudevicegetattribut:37,cudevicegetcount:37,cudevicegetnam:37,cudevicetotalmem:37,cudnn_conv_workspace_limit_in_mb:[32,33],cudnn_dir:[32,33],cudnnv5:0,cudrivergetvers:37,cuinit:37,curl:25,current:[7,25],current_word:[11,42],custom:25,cxxabi_1:3,d3e0:25,darwin:25,data:[11,14,16,19,22,26,27,29,32,39,42],data_batch:11,data_lay:[11,39],data_typ:[12,14,16,19,42],dataprovid:[11,13,27],dataset:[14,16,21,42],dcmake_build_typ:[7,17],dcmake_install_prefix:17,dcuda_arch_nam:0,dcudnn_root:0,deb:4,debug:7,decod:[41,42],decoder_boot:42,decoder_group_nam:42,decoder_input:[11,42],decoder_mem:42,decoder_s:[11,42],decoder_st:42,decrypt:25,deep:37,def:[6,11,13,14,27,39,42],default_decor:27,default_devic:35,default_valu:35,defin:[11,14],define_py_data_sources2:13,delet:4,deletestack:25,delimit:13,demo:[26,29],dens:25,dense_arrai:12,dense_vector:[14,16,19],dense_vector_sequ:19,dense_vector_sub_sequ:19,deploy:25,describ:[25,26],describestack:25,describestackev:25,describestackresourc:25,descript:[25,27],desir:[25,26],detail:25,detect:4,dev:[0,1,4,8],develop:[0,4],devic:[1,8,35],deviceid:35,dict:[13,27,39],dict_dim:[11,39],dict_fil:39,dictionari:11,diff:[4,11],differ:25,dig:25,dim:6,dimens:11,dimension:19,dir:[8,27],directori:[4,7,26,37],disabl:13,discard:[4,33],discexp:13,discoveri:25,dist:[0,8],distribut:[11,14,29,30,33],distribute_test:[32,33],diy_beam_search_prob_so:[32,33],dnn:8,dns:25,doc:[7,21,23,27],doc_cn:7,docker:[0,1,4,7,8,25,26,27,29,30,31],docker_clust:[23,28],dockerfil:[0,4,27],domain:25,don:25,done:[4,25,27,37],dot_period:[27,33,35],doubl:33,down:37,download:[8,26],doxygen:4,dpython_execut:8,dpython_include_dir:8,dpython_librari:8,drop_fc:12,drop_rat:12,dropout:12,dropout_r:12,drwxr:26,dtoh:37,dtype:13,dump:20,dump_config:20,dump_v2_config:20,dure:25,dwith_c_api:17,dwith_doc:7,dwith_golang:17,dwith_gpu:[0,7,17],dwith_mkl:[7,17],dwith_profil:37,dwith_python:17,dwith_swig_pi:17,dwith_test:0,dwith_tim:37,dynamic_cast:6,each:25,echo:[1,8],edit:25,
edu:[25,26],eeoi3ezpr86c:25,effect:25,efs:25,efs_dns_nam:25,efsvol:25,either:[11,14],elb:25,elbapis:25,electron:26,els:[1,6,39],emac:0,emb1:39,emb2:39,emb:[11,13,26,39],emb_group:39,emb_para:13,emb_param_fil:13,emb_sum:11,embed:[13,42],embedding_lay:[11,39],embedding_nam:42,embedding_s:42,emplace_back:6,enabl:[25,37],enable_grad_shar:[32,33],enable_parallel_vector:33,enc_proj:42,enc_vec:42,encod:39,encoded_proj:42,encoded_sequ:42,encoded_vector:42,encoder1:39,encoder1_expand:39,encoder1_last:39,encoder1_rep:39,encoder2:39,encoder2_rep:39,encoder_s:42,encrypt:25,encrypt_decrypt:25,end:[4,42],endforwardbackward:11,enditer:[11,14],endpass:14,endpoint:25,entri:25,enumer:13,env:[7,11,25,27],environ:[8,25,26,37],environmenterror:21,eol:4,eos_id:42,equal:39,error:[8,13,17,25,33],error_clipping_threshold:11,eta:26,etc:25,eth0:[25,27],evalu:[20,37],evenli:25,event:[11,14,26],event_handl:[11,14],everi:14,exactli:[11,25],exampl:25,exc_path:8,except:14,execut:25,exist:25,exit:26,exp:13,expand:39,expand_a:[38,39],expand_lay:39,expand_level:38,expandlevel:38,explicit:6,expos:25,express:[11,14,25],extern_mklml:8,extraattr:35,extract:25,extralayerattribut:[11,12],f1205:13,f120da72:26,f7e3:25,fa0wx:26,fabric:[23,24],fail:[8,13,26,33],fals:[6,11,14,16,19,21,26,35,39,42],faq:43,fast:37,fbd1f2bb71f4:26,fc1:[6,35],fc2:35,fc3:35,fc4:35,fc8a365:25,fc8a:25,fc_layer:[11,13,35,39],fclayer:6,fdata:39,featur:4,feed:14,fetch:[4,8],field:[11,25],file:[4,11,14],file_nam:[13,39],filenam:11,filesystem:25,fill:[19,25],fingerprint:25,finish:[25,26],first:[19,25],first_seq:42,firstseen:26,fix:[4,19],flatten_result:11,flist:21,float32:13,floor:13,fmt:13,fname:13,fnt03:25,folder:25,follow:[25,29,30],format:[4,6,16,19,25],forward:6,forwardactiv:6,fparam:13,from:[8,20,25,26,37,41],from_no_sequ:38,from_sequ:38,from_tar:14,fromfil:13,fromstr:13,fulfil:37,full_matrix_project:[39,42],fullyconnectedlay:6,gate_act:39,gcc:0,gcc_3:3,gen_proto_pi:7,gen_rand_param:13,gender:27,gener:[14,19,25,27,35,37],generatedinp
ut:[41,42],genr:27,get:[1,6,11,19,23,25,26,28],get_config_arg:35,get_data:26,get_grad:11,get_input_lay:6,get_sample_from_lin:11,get_support:[3,8],getbatchs:6,getenv:[21,27],gethostbynam:27,gethostnam:27,getidmap:27,getinput:6,getinputgrad:6,getinputvalu:6,getlayeroutput:11,getoutputgrad:6,getoutputvalu:6,getparameterptr:6,getpodlist:27,getsiz:6,gettempl:25,gettranspos:6,getw:6,getweight:6,getwgrad:6,gflag:17,gflags_complet:17,gflags_declar:17,git:[0,4,7,8,17],github:[0,4,7,8,17],give:25,glibc_2:3,glibcxx_3:3,global:[0,25,37],globalstat:37,globalstatinfo:37,glog:17,googl:[13,17],googleapi:25,govern:[11,14],gpg2:25,gpg:25,gpu:[1,3,8,16,19,21,35,37],gpu_id:[11,33,35],gpugpu_id:32,grad:[11,33],grad_share_block_num:[32,33],gradient:[20,21,22,27,33],gradient_clipping_threshold:11,gradient_machin:17,gradientmachin:27,grant:25,greater:19,grep:[1,28],groudtruth:42,group_input1:42,group_input2:42,group_input:[39,42],gru_decod:42,gru_decoder_with_attent:42,gru_step:42,grumemori:[12,42],gserver:6,gsizex:37,guest:3,guid:[25,26],gzip:26,half:25,hard:25,has:[19,25,37],hassubseq:39,have:25,head:28,header:13,headip:28,height:[6,13,19],help:4,here:[19,25],hidden:[12,13,25],hidden_a:13,hidden_b:13,hidden_dim:39,hierach:41,hl_get_sync_flag:6,hold:25,home:[1,25,26,27,28],hook2:39,hook:39,host:[25,26],hostfil:28,hostnam:25,hostnetwork:27,hostpath:[26,27],hostport:25,hous:16,how:25,howardjohnson:39,howev:25,howto:[21,23,27],htod:37,http:[0,1,4,7,8,11,14,17,25,26],i1116:27,i1117:37,iamfullaccess:25,iamusersshkei:25,id_rsa:28,ident:25,idmap:27,ids:[11,19],ids_arrai:19,idx:6,iil:13,imag:[0,4,25,26,27,29,30],imagepullpolici:[25,27],imgsiz:37,imgsizei:37,imgsizex:37,immedi:25,impli:[11,14],improv:25,in_arg:19,inbound:25,inc_path:8,includ:[4,17,25,37],increas:13,incupd:6,index:[25,39],indic:[19,25],individu:25,infer:[14,16],info:[6,11,14,23,27,39],inform:25,infrastructur:25,init:[6,14,16,21,25,27],init_hook:39,init_model_path:[32,33,35],initi:[16,33],initial_max:13,initial_mean:13,initial_min:1
3,initial_std:13,inlin:25,inner:[11,19,39],inner_:39,inner_mem:39,inner_pos_arrai:19,inner_rnn_output:39,inner_rnn_st:39,inner_rnn_state_:39,inner_seq_pos_arrai:19,inner_step:39,inner_step_impl:39,input:[6,11,12,13,14,16,19,27,35,38,39,41,42],input_data:6,input_data_target:6,input_hassub_sequence_data:6,input_index:6,input_label:6,input_lay:6,input_sequence_data:6,input_sequence_label:6,input_sparse_float_value_data:6,input_sparse_non_value_data:6,input_t:6,input_typ:[11,39],inputdef:6,inputlayers_:6,insert:4,insid:25,instal:[0,1,3,4,7,8,16,17,26,31],install_step:16,instance_ip:25,int32:33,integ:14,integer_sequ:11,integer_valu:[11,14,19,39],integer_value_sequ:[19,39,42],integer_value_sub_sequ:[19,39],interact:25,interfac:25,intern:25,invok:[25,37],ip_str:27,ips:[25,27],ipt:[13,39,42],is_async:21,is_inf:20,is_seq:42,is_stat:13,isbinari:19,isinst:[11,14],ispodallrun:27,isspars:6,issu:[0,10],issue_numb:4,item:[16,27],its:[25,37],ivector:[19,20],jeremi:37,job:[27,32,33,35],job_dispatch_packag:23,job_nam:[25,27],job_namespac:[25,27],job_path:[25,27],job_path_output:27,job_workspac:23,jobnam:27,jobpath:[25,27],jobport0:25,jobport1:25,jobport2:25,jobport3:25,jobselector:27,join:39,json:[25,26],just:25,jx4xr:25,k8s:27,k8s_data:[25,27],k8s_train:[25,27],kebilinearinterpbw:37,kebilinearinterpfw:37,kei:[0,4,11,27,37],key1:33,key2:33,key_pair_nam:25,keyid:25,keymetadata:25,keypair:25,keyserv:25,keystat:25,keyusag:25,keyword:27,kill:25,kind:[11,14,25,26,27],kms:25,know:25,kubeconfig:25,kubectl:[23,26,27,28],kuberent:25,kubernet:[24,27,29,30],kwarg:39,l2regular:[11,21],label:[11,14,18,26,39],label_dim:39,labelselector:27,lag:33,lan:31,languag:[11,14],larg:4,last:[38,39],last_seq:39,lastseen:26,latenc:25,later:25,latest:[1,4,7,8,26,27],launch:25,law:[11,14],layer1:[11,38],layer2:[11,38],layer:[6,11,13,14,16,19,20,38,41,42],layer_0:6,layer_att:12,layer_attr:[11,12,35,42],layer_expand:38,layer_first_seq:38,layer_last_seq:38,layer_nam:11,layer_num:35,layer_pool:38,layer_s:19,layerbas
:6,layerconfig:6,layergradutil:6,layermap:6,layers_test:8,ld_library_path:17,learn:[1,37],learning_r:[11,13,21],learning_rate_arg:13,learning_rate_decay_a:13,learning_rate_decay_b:13,learning_rate_schedul:13,leav:25,left_scor:11,len:[6,16,27,39],length:26,let02:26,let:25,level:[19,41],lib64:[1,8,33],lib:[0,17],lib_path:8,libc:3,libcuda:[1,8],libgcc_:3,libgflag:17,libglog:17,libnvidia:[1,8],libopenbla:17,libpaddl:4,libpaddle_capi_engin:17,libpaddle_capi_lay:17,libpaddle_capi_shar:17,libpaddle_capi_whol:17,libprotobuf:[13,17],librari:33,libstdc:3,libz:17,licens:[11,14],like:25,limit:[11,13,14,37],line:[4,11,13,25,35,39],line_count:13,linear:[11,13,14,16],link:[25,41],linux:[0,3,25],linux_x86_64:[3,8],list:[11,25,35],listdir:21,lite:17,load:[14,25,27],load_missing_parameter_strategi:[32,33,35],load_paramet:13,loadsave_parameters_in_pserv:[32,33],local:[0,27,32,33],localhost:[1,7],localip:27,log:[3,6,9,13,21,23,25,26,27,28,33],log_barrier_abstract:[32,33],log_barrier_lowest_nod:[32,33],log_barrier_show_log:[32,33],log_clip:[32,33],log_error_clip:[32,33],log_period:[26,27,33,35],log_period_serv:[32,33],logger:[11,39],look:25,lpaddle_capi_engin:17,lpaddle_capi_lay:17,lstm:[26,39,42],lstm_group:39,lstm_group_input:39,lstm_input:39,lstm_last:39,lstm_nest_group:39,lstm_output:39,lstmemori:[12,39,42],lstmemory_group:[12,39],lstmemory_unit:12,machin:[11,20,25,28,41],machine_transl:42,maco:[0,3],mai:[11,14,25],main:[17,25],maintain:25,make:[0,4,6,7,8,17,25,37],manag:7,manual:13,manylinux1:3,manylinux1_x86_64:[3,8],map:[14,17],mat:19,match:11,math:[6,37],matrix:[6,17,19,20],matrixptr:6,max:[13,35,37,38],max_length:42,mean:[11,25,33],mechan:25,mem:39,memcpi:37,memori:[26,37,42],memory_nam:12,memory_threshold_on_load_data:[32,33],merg:[4,20],merge_model:20,merge_v2_model:20,merge_v2_modelss:20,messag:[4,8,26],metadata:[25,26,27],mfs:27,might:25,min:[25,35,37],min_pool_s:11,mini:11,minut:25,mit:25,mix:42,mixed_lay:39,mkdir:[7,17,25,28],mkl:[0,8,17],mklml:8,mklml_lnx_2018:8,mnist:20
,mnist_v2:20,mnt:27,mobil:7,mode:[4,27],model:[7,14,16,20,25,35],model_list:[33,35],model_path:35,modifi:[4,25],modul:13,momentum:[11,14],mon:26,more:[13,37],mount:25,mountpath:[25,26,27],move:25,movie_id:27,mpi:28,mpirun:28,mul:6,multipl:[14,25],must:[6,11,21,25],my_cluster_nam:25,my_cost:13,my_external_dns_nam:25,my_lib:21,mypaddl:[26,27],name:[6,13,14,16,26,27,29,30,35,37,39,42],namespac:[6,26,27],ndarrai:11,need:[8,25,27,37],need_tran:13,nest:19,net:[0,20],network:[12,14,16,19,20,27,35,39],network_config:35,networkadministr:25,neural:[16,39,41],never:[25,26,27],next:25,nfs4:25,nfs:[25,27],nfsdir:27,nfsver:25,nic:[27,32,33],nmt_without_attent:11,nnz:[6,19],no_sequ:14,node0:27,node1ip:28,node2ip:28,node3ip:28,node:[25,26,27,28],node_0:[25,27],node_1:[25,27],node_2:[25,27],node_id:21,nodefil:23,nohup:21,non:25,none:42,normal:[26,27],note:25,notebook:1,noth:4,now:[25,41],nproc:0,nullptr:6,num:[21,27,33],num_gradient_serv:[21,32,33],num_pass:[14,26,27,32,33,35],num_samples_process:13,number:25,numdevices_:35,numlogicaldevices_:35,numpi:[0,11,13,14],numsampl:37,nvidia:[1,8],obj:13,object:37,obtain:[11,14],occup:27,oct:26,off:[0,4,7,17,31],offici:25,offset:19,ograd:6,omit:11,ompi_comm_world_rank:21,onc:25,one:[19,25],onli:[19,25,39,41],onto:25,open:[11,13,14,25,39],openbla:[0,1,17],openmpi:[24,28],oper:25,opt:[0,27],optim:[11,13,14,21],optimzi:11,order:[25,27],oregon:25,org:[11,14],origin:4,other:25,our:25,out:[11,25,39,41,42],out_dir:[25,27],out_mem:42,outer:39,outer_mem:39,outer_rnn_st:39,outer_rnn_state_:39,outer_step:39,output:[11,14,20,23,26,27,35,39,42],output_fil:20,output_lay:[11,14,16],output_mem:42,outter:19,outter_pos_arrai:19,outter_seq_pos_arrai:19,outv:6,own:25,pack:13,packag:[8,25],paddl:[0,1,3,4,6,7,11,13,14,16,17,19,20,21,23,25,26,27,28,31,35,37,42],paddle_arguments_get_sequence_start_po:19,paddle_arguments_set_id:19,paddle_arguments_set_sequence_start_po:19,paddle_arguments_set_valu:19,paddle_capi:17,paddle_doc:7,paddle_docs_cn:7,paddle_gradient_machi
ne_create_shared_param:20,paddle_gradient_machine_forward:20,paddle_gradient_machine_load_parameter_from_disk:20,paddle_init:20,paddle_init_num_gradient_serv:21,paddle_init_port:21,paddle_init_ports_num:21,paddle_init_ports_num_for_spars:21,paddle_init_pserv:21,paddle_init_trainer_count:21,paddle_init_trainer_id:21,paddle_init_use_gpu:21,paddle_ivector:19,paddle_ivector_cr:19,paddle_manylinux_devel:0,paddle_matrix:[19,20],paddle_matrix_cr:19,paddle_matrix_create_spars:19,paddle_matrix_get_row:19,paddle_matrix_sparse_copy_from:19,paddle_n:27,paddle_output:26,paddle_port:27,paddle_ports_num:27,paddle_ports_num_spars:27,paddle_process_by_paddl:27,paddle_pserver2:23,paddle_r:19,paddle_root:17,paddle_server_num:27,paddle_train:[23,27],paddlepaddl:[0,1,3,4,11,14,16,17,19,20,23,26,27,29,30,31,37,42],paddlepaddle_gpu:3,paddlepaddlebook:1,paddlepaddlehub:1,page:25,parallel:[25,26,27,35,37],parallel_nn:[32,33],param:[11,13],param_attr:[11,13,42],param_fil:[13,20],paramattr:[11,13,42],paramet:[11,13,14,16,19,22,23,27,33],parameter_block_s:[32,33],parameter_block_size_for_spars:[32,33],parameterclient2:27,parametermap:6,parameters_:6,params_pass_4:20,params_pass_90:14,params_pass_:14,paramt:25,paraspars:6,pars:[0,25],parse_known_arg:27,parsefromstr:13,parser:27,partit:25,paserv:27,pass:[4,11,14,25,26,27,33,35,37],pass_id:[11,14],pass_manu:13,passtyp:6,past:25,path:[17,25,26,27,33],path_to_paddlepaddle_working_directori:7,pattern:25,pd_api:19,peer:9,pem:25,pep425tag:[3,8],perform:[32,37],period:33,permiss:[11,14,25],persist:25,persistentvolum:25,persistentvolumeclaim:[25,27],pgp:25,pick:25,pickl:[21,28],pip:[0,3,4,7,8,16],platform:[3,25],pleas:[7,8,25,27],pnpairvalid:32,pod:[25,26,27],pod_nam:25,podip:27,podlist:27,point:[19,37],pointer:19,poli:13,polici:25,pool3:6,pooling_lay:11,pooling_typ:[11,38],port:[21,25,26,27,32,33],port_num:32,portal:7,ports_num:[21,27,33],ports_num_for_spars:[21,27,32,33,35],posit:19,post:0,potenti:37,pow:13,ppo_workspac:7,pre:25,predict:[11,16,20],pre
dict_fil:[32,33],predict_output_dir:[32,33],prefetch:6,prefix:25,pregrad:6,prepar:[21,28,29],present:4,prev_batch_st:[32,33],previou:25,price:16,print:[3,8,14,16,28],printallstatu:37,printstatu:37,privat:4,privileg:25,prob:16,proc:1,proce:25,process2:39,process:[11,13,25,27,39],processor:37,prod:4,product:[14,25],productgraph:26,profil:37,proflier:37,prog:27,program:[27,37],protect:6,protobuf:[13,17,20],provi:21,provid:[11,16,25,32,39],provis:25,pserver:[21,23,25,27,32,33],pserver_num_thread:[32,33],psize:6,pub:28,pull:1,purpos:37,push:27,push_back:6,pvc:25,pwd:[0,1,4,7],py_paddl:8,pydataprovid:11,pydataprovider2:27,python:[0,3,4,6,7,8,11,16,20,21,28,42],pythonpath:8,queri:25,question:25,quick_start:[25,26,27,29],quick_start_data:26,quickstart:26,rais:21,ran:37,rand:[13,19,33,35,37],rand_max:19,random:13,randomli:19,rang:27,rank:25,rate:[11,27],rather:25,ratio:33,rdma_tcp:[32,33],read:[13,25],read_next_from_fil:11,read_paramet:13,reader:[14,21],readi:[25,26],readm:4,readwritemani:25,reason:26,recommend:27,record:25,recurr:[39,40],recurrent_group:[11,12,39,41,42],recv:25,refine_unknown_arg:27,register_gpu_profil:37,register_lay:6,register_timer_info:37,registri:26,regular:[11,21,25],releas:[8,17,25],remot:[4,25,33,35],remov:4,removing_docker_contain:0,repositori:7,repres:25,request:[25,26],requir:[7,11,14,25],reserv:[11,14],reserveoutput:6,reset:9,reshap:13,resolv:[4,26],resourc:25,respons:[25,26],restart:[25,26],restartpolici:[25,26,27],result:[14,25,37],retran:25,rev:0,revers:[41,42],review:[4,26],reviews_electronics_5:26,right:[11,14],right_scor:11,rkt:0,rmsprop:11,rnn:[32,41,42],rnn_bias_attr:42,rnn_layer_attr:42,rnn_out:42,rnn_state:39,rnn_state_:39,rnn_use_batch:[32,33],role:25,root:[25,26,27],row:19,row_offset:19,rowoffset:19,rsize:25,rstrip:27,rule:25,run:[0,1,4,7,8,23,25,26,27,29,30,31,37],run_test:0,runinitfunct:[27,37],runserv:7,runtim:8,same:[25,39],sampl:19,satisfi:25,save:[25,26],save_dir:[26,27,33,35],save_only_on:[32,33],save_parameter_to_tar:14,savet
xt:13,saving_period:[27,32,33],saving_period_by_batch:[32,33,35],schdule:25,schedul:25,score:11,score_diff:11,scp:28,script:[0,7,23,25,28],search:42,secret:25,section:25,see:[11,13,14,25],seed:[13,33,37],select:25,selector:26,self:6,send:25,sent:26,sentanc:11,sentenc:[39,42],sentence_last_state1:39,sentence_last_state2:39,seq:39,seq_po:19,seq_pool:38,seq_pos_arrai:19,seqlastin:39,sequenc:[11,14,19,39,41],sequence_layer_group:39,sequence_nest_layer_group:39,sequence_recurr:13,sequence_start_posit:19,sequencegen:39,sequencetyp:14,seri:39,serv:25,server:[0,9,21,22,23,27,33],set:[0,7,11,13,14,19,25,26,37,39],set_active_typ:6,set_drop_r:6,set_siz:6,set_typ:6,setp:25,setq:0,settotalbyteslimit:13,setup:6,sever:25,sgd:[14,22,32],shape:14,shard:[22,25],shell:25,should:[7,25,41],should_shuffl:39,show:[0,25],show_check_sparse_distribution_log:[32,33],show_layer_stat:[32,33],show_parameter_stats_period:[26,32,33,35],shown:25,shuf:11,shuffl:11,sid:25,sig:25,sigint:23,sigmoidactiv:39,sign:25,signatur:25,similar:25,simpl:27,simple_attent:42,simple_gru:42,simple_lstm:12,simple_rnn:42,simplest:25,simultan:25,sinc:25,site:25,size:[6,11,13,14,16,19,39,42],size_t:6,sizeof:19,skip:[4,13,25],sleep:27,small_messag:[32,33],snap:26,snapshot:25,snippet:25,sock_recv_buf_s:[32,33],sock_send_buf_s:[32,33],socket:27,softmax:[6,11,42],softmax_param:13,softmaxactiv:39,softwar:[11,14],some:25,sort:[25,27],sourc:25,source_dict_dim:42,source_language_word:42,spars:[6,11,19,21,25,27,33,35],sparse_binary_vector:[11,14,19],sparse_binary_vector_sequ:19,sparse_binary_vector_sub_sequ:19,sparse_float_vector:14,sparse_upd:11,sparse_vector:[11,19],sparse_vector_sequ:19,sparse_vector_sub_sequ:19,sparseparam:6,sparseprefetchrowcpumatrix:6,spec:[25,26,27],specif:[11,14],specifi:[7,25],sphinx:7,split:[21,25,39],split_count:[21,25,27],square_error_cost:14,squash:4,srand:[19,33],src:[8,21,23,27],src_backward:42,src_dict:13,src_dict_path:13,src_embed:42,src_forward:42,src_word_id:42,ssh:[25,28],ssh_server:23,stabl:2
5,stack:25,stage:4,stamp:8,stanford:26,start:[4,8,19,26,27,33],start_mpi_train:28,start_paddl:27,start_pass:[32,33],start_pserv:[32,33],startpaddl:27,startup:25,stat:[33,37],state:[12,26,41],state_act:39,statement:25,statfulset:27,staticinput:[41,42],statset:37,statu:[4,25,26,27,37],status:26,std:[6,33],stdbuf:21,stderr:23,stdout:23,step:[25,39,41,42],stepout:39,stmt1482205552000:25,stmt1482205746000:25,storag:25,store:25,str:[27,35],strategi:33,string:[6,25,33],strip:[13,39],struct:13,structur:25,sts:25,stuff:4,sub_sequ:14,submit:25,subnet0:25,subnet:25,subobjectpath:26,subseq:[38,41],subsequenceinput:39,succeed:26,success:[25,26],successfulcr:26,sudo:[0,25],suffix:21,sumpool:11,support:[3,25,39],suppos:19,sure:25,swig:0,switch_ord:12,symlink:4,syncflag:6,synchron:25,system:11,tag:[1,4,8,31],tainer_id:27,take:25,tanh:[6,42],tanhactiv:39,tar:[8,14,20,25],tarbal:25,target_dict_dim:42,target_language_word:42,targetinlink:39,task:20,tbd:39,tcp:[25,33],tear:37,tee:26,templat:[26,27],termin:26,test:[4,6,16,19,21,28,33,35,37],test_all_data_in_one_period:26,test_compar:8,test_comparespars:8,test_comparetwonet:8,test_comparetwoopt:8,test_config_pars:8,test_data_dir:21,test_fcgrad:6,test_gpuprofil:37,test_layergrad:6,test_list:13,test_networkcompar:8,test_pass:[32,33,35],test_period:[32,33,35],test_predict:8,test_pydataprovid:8,test_pydataprovider2:8,test_pydataproviderwrapp:8,test_recurrent_machine_gener:8,test_recurrentgradientmachin:[8,39],test_sum_op:0,test_swig_api:8,test_train:8,test_traineronepass:8,test_wait:[32,33],testbilinearfwdbwd:37,testconfig:6,testfcgrad:6,testfclay:6,testlayergrad:6,testutil:6,text:25,tflop:37,tgz:[3,8],than:[13,19,25],thei:[25,37],them:[25,37],thi:[3,11,14,19,25,37],third:19,third_parti:[8,17],thread:37,thread_local_rand_use_global_se:[32,33],threadid:35,threadloc:37,threshold:33,through:7,throughput:37,thu:25,tier:26,time:[19,26,27,33,37,39],timeo:25,timer:37,titl:27,to_no_sequ:38,to_sequ:[38,39],to_your_paddle_clone_path:7,token:42,tool:[7
,25,27],toplevel:0,tostr:13,total:[26,37],touch:8,track:4,train:[1,9,11,14,21,23,26,27,28,29,30,33,35,42],train_arg:27,train_args_dict:27,train_args_list:27,train_config_dir:[25,27],train_data:21,train_data_dir:21,train_i:14,train_id:25,train_list:[13,21],train_read:14,train_x:14,trainer:[6,14,21,22,27,33,35],trainer_config:[20,25,26,27],trainer_config_help:6,trainer_count:[11,16,21,25,26,27,32,33,35],trainer_id:[21,25,27,33],trainerconfighelp:13,trainerid:27,tran:6,translat:11,travi:4,tree:27,trg_embed:42,tune:32,turn:41,tutori:[25,27,28,29,30],two:[25,37],txt:[6,7,21,25,28],type:[6,14,16,19,25,26,35,39,42],ubuntu:[3,16],uci_h:16,uid:26,uint64_t:19,under:[11,14,25],understand_senti:42,undeterminist:37,uninstal:[0,8],uniqu:25,unique_ptr:6,unittest:8,unless:[11,14],until:[25,27],untrack:4,updat:[4,35],update_equ:14,updatecallback:6,updatestack:25,upgrad:[3,8],upstream:[4,8],uri:25,usag:[23,27],use:[4,11,14,19,21,25,27,37],use_gpu:[11,14,16,21,26,27,32,33,35],use_old_updat:[32,33],used:[25,37],usegpu:[6,19],user:25,user_id:27,usernam:4,uses:25,using:[16,25],usr:[0,1,8,21,25,27,33],usual:[25,37],util:[20,27,37],valid:25,valu:[6,11,19,25,27,35],value1:33,value2:33,valueerror:11,vanilla:42,vari:25,variabl:[25,26],vec:13,vector:[17,19],version:[4,11,14,25,31,32,33,37],via:[4,25],vim:1,virtualenv:0,volum:[7,26,27],volumemount:[25,26,27],volumn:25,wait:27,warn:[8,13,27],warranti:[11,14],wbia:25,weight:6,weightlist:6,weights_:6,weights_t:6,well:25,west:25,wget:8,what:4,wheel:3,when:[25,37],whether:19,which:[14,25],whl:[0,3],whole:[17,25],why:37,wide:[23,28],width:[6,13,19],window:0,with_avx:[0,4,31],with_c_api:[0,17],with_doc:0,with_doubl:[0,6,31],with_dso:0,with_golang:[0,17],with_gpu:[0,4,17,31],with_mkl:[0,17],with_profil:37,with_python:[0,17,31],with_rdma:31,with_style_check:[0,4],with_swig_pi:[0,17],with_test:[0,4],with_tim:[31,37],without:[11,14],wmt14:42,won:39,word2vec:[11,21,23],word:[11,39,41],word_dict:[21,28,39],word_dim:[13,39],word_id:11,word_vector_dim:42,work
:[1,4,7,25,26,27,39],worker:25,workercount:25,workflow:25,workspac:[21,23],would:25,wrapper:37,write:[11,13,14,25],wsize:25,www:[11,14],xarg:[1,6,8,28],xgbe0:33,xgbe1:33,xrang:[6,14,16],xxxxxxxxx:25,xxxxxxxxxx:25,xxxxxxxxxxxxx:25,xxxxxxxxxxxxxxxxxxx:25,y_predict:[14,16],yaml:[23,25,26,27,28],yapf:4,yield:[11,14,39],you:[11,14,25],your:[8,25],your_access_key_id:25,your_param_nam:13,your_repo:27,your_secrete_access_kei:25,zaist:0,zero:[25,33],zhihu:0,zhuanlan:0,zip:27,zlib:17,zone:25,zxf:8,zxvf:25},titles:["\u4ece\u6e90\u7801\u7f16\u8bd1","\u4f7f\u7528Docker\u5b89\u88c5\u8fd0\u884c","\u5b89\u88c5\u4e0e\u7f16\u8bd1","\u4f7f\u7528pip\u5b89\u88c5","\u5982\u4f55\u8d21\u732e\u4ee3\u7801","\u5f00\u53d1\u6807\u51c6","\u5982\u4f55\u5b9e\u73b0\u65b0\u7684\u7f51\u7edc\u5c42","\u5982\u4f55\u8d21\u732e\u6587\u6863","\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5","\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b","FAQ","\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b","\u6a21\u578b\u914d\u7f6e","\u53c2\u6570\u8bbe\u7f6e","\u57fa\u672c\u4f7f\u7528\u6982\u5ff5","\u65b0\u624b\u5165\u95e8","\u5feb\u901f\u5f00\u59cb","\u5b89\u88c5\u4e0e\u7f16\u8bd1C-API\u9884\u6d4b\u5e93","C-API\u9884\u6d4b\u5e93","\u8f93\u5165/\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7","C-API\u4f7f\u7528\u6d41\u7a0b","\u542f\u52a8\u53c2\u6570\u8bf4\u660e","\u5206\u5e03\u5f0f\u8bad\u7ec3","\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3","\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c","Kubernetes on AWS","Kubernetes\u5355\u673a\u8bad\u7ec3","Kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3","\u5728OpenMPI\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3","<no title>","<no title>","\u73af\u5883\u51c6\u5907","\u53c2\u6570\u6982\u8ff0","\u7ec6\u8282\u63cf\u8ff0","\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e","\u4f7f\u7528\u6848\u4f8b","\u8fdb\u9636\u4f7f\u7528","GPU\u6027\u80fd\u8c03\u4f18","\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684Layer","\u5355\u53cc\u5c42RNN 
API\u5bf9\u6bd4\u4ecb\u7ecd","RNN\u6a21\u578b","Recurrent Group\u6559\u7a0b","RNN\u914d\u7f6e","PaddlePaddle \u6587\u6863"],titleterms:{"\u4e00\u4e9b\u7ec6\u8282\u7684\u8865\u5145":27,"\u4e0b\u8f7d\u6570\u636e":26,"\u4e0b\u8f7dmklml\u5e93\u5931\u8d25":8,"\u4e0d\u540c\u7684":12,"\u4e13\u6ce8\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5f00\u53d1":2,"\u4e24\u79cd\u4f7f\u7528":12,"\u4e3a\u4ec0\u4e48\u9700\u8981\u6027\u80fd\u5206\u6790":37,"\u4ec0\u4e48\u662f\u6027\u80fd\u5206\u6790":37,"\u4ece\u6e90\u7801\u7f16\u8bd1":0,"\u4ee3\u7801\u8981\u6c42":4,"\u4f7f\u7528":[4,26],"\u4f7f\u7528\u6848\u4f8b":35,"\u4f7f\u7528\u6a21\u578b\u521d\u59cb\u5316\u7f51\u7edc":35,"\u4f7f\u7528\u6d41\u7a0b":20,"\u4f7f\u7528\u73af\u5883\u53d8\u91cf":27,"\u4f7f\u7528docker\u542f\u52a8paddlepaddl":1,"\u4f7f\u7528docker\u5b89\u88c5\u8fd0\u884c":1,"\u4f7f\u7528docker\u6267\u884cgpu\u8bad\u7ec3":1,"\u4f7f\u7528docker\u6784\u5efa":7,"\u4f7f\u7528fabric\u542f\u52a8\u96c6\u7fa4\u8bad\u7ec3":23,"\u4f7f\u7528paddlepaddl":7,"\u4f7f\u7528pip\u5b89\u88c5":3,"\u4fdd\u6301\u672c\u5730\u4ed3\u5e93\u6700\u65b0":4,"\u4fee\u6539\u542f\u52a8\u811a\u672c":26,"\u514b\u9686":4,"\u5173\u6ce8\u5e95\u5c42\u6846\u67b6":2,"\u5185\u7f6e\u5b9a\u65f6\u5668":37,"\u5199\u68af\u5ea6\u68c0\u67e5\u5355\u5143\u6d4b\u8bd5":6,"\u51c6\u5907\u4e00\u4e2alinux\u96c6\u7fa4":23,"\u51c6\u5907\u6570\u636e\u96c6":21,"\u51c6\u5907\u8bad\u7ec3\u6570\u636e":27,"\u51c6\u5907\u8bad\u7ec3\u7a0b\u5e8f":21,"\u51c6\u5907\u9884\u6d4b\u6a21\u578b":20,"\u51c6\u5907openmpi\u96c6\u7fa4":28,"\u51cf\u5c11\u6570\u636e\u8f7d\u5165\u7684\u8017\u65f6":11,"\u51cf\u5c11dataprovider\u7f13\u51b2\u6c60\u5185\u5b58":11,"\u51fa\u73b0":12,"\u5206\u5e03\u5f0f\u8bad\u7ec3":22,"\u521b\u5efa\u672c\u5730\u5206\u652f":4,"\u521b\u5efa\u795e\u7ecf\u7f51\u7edc\u8f93\u5165":20,"\u521b\u5efajob":27,"\u521b\u5efapaddlepaddl":26,"\u521d\u59cb\u5316paddlepaddle\u8fd0\u884c\u73af\u5883":20,"\u5220\u9664\u672c\u5730\u5206\u652f":4,"\u5220\u9664\u8fdc\u7a0b\u5206\u652f":4,"\u5229\u7528\u66f
4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90":11,"\u5230\u8fdc\u7a0b\u4ed3\u5e93":4,"\u5236\u4f5c\u955c\u50cf":27,"\u5236\u4f5cdocker\u955c\u50cf":26,"\u524d\u5411\u8ba1\u7b97":20,"\u52a0\u8f7d\u6a21\u578b":20,"\u52a0\u8f7dpaddlepaddl":14,"\u52a0\u901f\u8bad\u7ec3\u901f\u5ea6":11,"\u5355\u5143\u6d4b\u8bd5":33,"\u5355\u53cc\u5c42rnn":39,"\u53c2\u6570\u5185\u5b58":11,"\u53c2\u6570\u670d\u52a1\u5668\u548c\u5206\u5e03\u5f0f\u901a\u4fe1":33,"\u53c2\u6570\u6982\u8ff0":32,"\u53c2\u6570\u8bbe\u7f6e":13,"\u53c2\u8003\u8d44\u6599":37,"\u53cc\u5c42rnn":39,"\u53cc\u5c42rnn\u4ecb\u7ecd":41,"\u53cc\u5c42rnn\u7684\u4f7f\u7528":41,"\u5404\u4e2a\u7248\u672c\u6700\u65b0\u7684whl\u5305":3,"\u5411\u91cf":33,"\u542f\u52a8\u4efb\u52a1":27,"\u542f\u52a8\u53c2\u6570\u670d\u52a1\u5668":21,"\u542f\u52a8\u53c2\u6570\u8bf4\u660e":21,"\u542f\u52a8\u8ba1\u7b97\u8282\u70b9":21,"\u542f\u52a8\u96c6\u7fa4\u4f5c\u4e1a":[23,28],"\u547d\u4ee4\u884c\u53c2\u6570\u8bbe\u7f6e":34,"\u548c":38,"\u5728\u4e0d\u540c\u8bbe\u5907\u4e0a\u6307\u5b9a\u5c42":35,"\u5728\u4e0d\u540c\u96c6\u7fa4\u4e2d\u8fd0\u884c":24,"\u5728docker\u4e2d\u6267\u884cpaddlepaddle\u8bad\u7ec3\u7a0b\u5e8f":1,"\u5728openmpi\u96c6\u7fa4\u4e2d\u542f\u52a8\u8bad\u7ec3":28,"\u57fa\u672c\u4f7f\u7528\u6982\u5ff5":[14,19],"\u57fa\u672c\u539f\u7406":41,"\u5982\u4f55\u4e66\u5199\u6587\u6863":7,"\u5982\u4f55\u4f7f\u7528":12,"\u5982\u4f55\u5171\u4eab\u53c2\u6570":13,"\u5982\u4f55\u51cf\u5c11\u5185\u5b58\u5360\u7528":11,"\u5982\u4f55\u521d\u59cb\u5316\u53c2\u6570":13,"\u5982\u4f55\u52a0\u8f7d\u9884\u8bad\u7ec3\u53c2\u6570":13,"\u5982\u4f55\u52a0\u901f\u8bad\u7ec3\u901f\u5ea6":11,"\u5982\u4f55\u548c\u660e\u6587\u8fdb\u884c\u76f8\u4e92\u8f6c\u5316":13,"\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u53c2\u6570\u7684\u6743\u91cd\u548c\u68af\u5ea6":11,"\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u5f97\u67d0\u4e00\u4e2alayer\u7684output":11,"\u5982\u4f55\u5b9e\u73b0\u65b0\u7684\u7f51\u7edc\u5c42":6,"\u5982\u4f55\u6307\u5b9agpu\u8bbe\u5907":1
1,"\u5982\u4f55\u66f4\u65b0www":7,"\u5982\u4f55\u6784\u5efa\u6587\u6863":7,"\u5982\u4f55\u8bbe\u7f6e\u5b66\u4e60\u7387\u9000\u706b":13,"\u5982\u4f55\u8c03\u7528":11,"\u5982\u4f55\u8d21\u732e\u4ee3\u7801":4,"\u5982\u4f55\u8d21\u732e\u6587\u6863":7,"\u5982\u4f55\u8fdb\u884c\u6027\u80fd\u5206\u6790":37,"\u5982\u4f55\u9009\u62e9sgd\u7b97\u6cd5\u7684\u5b66\u4e60\u7387":13,"\u5b50\u5e8f\u5217\u95f4\u65e0memori":39,"\u5b50\u5e8f\u5217\u95f4\u6709memori":39,"\u5b58\u50a8\u7684\u53c2\u6570\u683c\u5f0f\u662f\u4ec0\u4e48":13,"\u5b89\u88c5":3,"\u5b89\u88c5\u4e0e\u7f16\u8bd1":2,"\u5b89\u88c5\u4e0e\u7f16\u8bd1c":17,"\u5b9e\u73b0c":6,"\u5b9e\u73b0python\u5c01\u88c5":6,"\u5c06\u547d\u4ee4\u53c2\u6570\u4f20\u7ed9\u7f51\u7edc\u914d\u7f6e":35,"\u5de5\u5177":37,"\u5e38\u89c1\u95ee\u9898":0,"\u5e38\u89c1\u95ee\u9898\u548c\u89e3\u51b3\u65b9\u6cd5":3,"\u5e38\u89c1\u95ee\u9898\u6c47\u603b":2,"\u5e76\u5b8c\u6210":4,"\u5efa\u7acb":4,"\u5f00\u53d1\u6807\u51c6":5,"\u5f00\u59cb\u5f00\u53d1":4,"\u5f02\u6b65":21,"\u5f02\u6b65\u968f\u673a\u68af\u5ea6\u4e0b\u964d":33,"\u5feb\u901f\u4f7f\u7528":16,"\u5feb\u901f\u5b89\u88c5":16,"\u5feb\u901f\u5f00\u59cb":16,"\u6027\u80fd\u5206\u6790\u5c0f\u6280\u5de7":37,"\u6027\u80fd\u5206\u6790\u5de5\u5177\u4ecb\u7ecd":37,"\u6027\u80fd\u8c03\u4f18":33,"\u603b\u7ed3":19,"\u6216\u8005\u662f":8,"\u6267\u884c\u5355\u5143\u6d4b\u8bd5":0,"\u627e\u5230\u7684pythonlibs\u548cpythoninterp\u7248\u672c\u4e0d\u4e00\u81f4":8,"\u62a5importerror":8,"\u63a5\u53e3\u8f93\u51fa\u591a\u4e2alayer\u7684\u9884\u6d4b\u7ed3\u679c":11,"\u63a8\u5bfc\u65b9\u7a0b":6,"\u63d0\u4ea4":4,"\u63d0\u4ea4\u4ee3\u7801\u7684\u4e00\u4e9b\u7ea6\u5b9a":4,"\u63d0\u4ea4\u955c\u50cf":26,"\u642d\u5efa\u795e\u7ecf\u7f51\u7edc":14,"\u652f\u6301\u53cc\u5c42\u5e8f\u5217\u4f5c\u4e3a\u8f93\u5165\u7684layer":38,"\u6570\u636e\u652f\u6301":33,"\u6574\u4f53\u65b9\u6848":27,"\u6587\u6863":43,"\u65b0\u624b\u5165\u95e8":15,"\u65e5\u5fd7\u4e2d\u4fdd\u5b58\u5747\u4e3a\u7f51\u7edc\u901a\u4fe1\u7c7b\u9519\u8bef":9,"\u65f6\u95f4\
u5e8f\u5217":39,"\u65f6\u95f4\u6b65":39,"\u66f4\u65b0":21,"\u672c\u5730\u6d4b\u8bd5":35,"\u672c\u5730\u8bad\u7ec3":35,"\u672c\u5730\u8bad\u7ec3\u4e0e\u9884\u6d4b":11,"\u6784\u5efa\u548c\u6d4b\u8bd5":4,"\u67e5\u770b\u8bad\u7ec3\u7ed3\u679c":26,"\u67e5\u770b\u8f93\u51fa":27,"\u6848\u4f8b\u4e00":35,"\u6848\u4f8b\u4e8c":35,"\u68c0\u67e5\u6a21\u578b\u8f93\u51fa":23,"\u68c0\u67e5\u96c6\u7fa4\u8bad\u7ec3\u7ed3\u679c":23,"\u6982\u8ff0":[17,38,41],"\u6a21\u578b\u914d\u7f6e":[12,39],"\u6a21\u578b\u914d\u7f6e\u7684\u6a21\u578b\u914d\u7f6e":39,"\u6ce8\u610f\u4e8b\u9879":20,"\u6d4b\u8bd5":33,"\u6e05\u7406":20,"\u73af\u5883\u51c6\u5907":31,"\u751f\u6210\u5e8f\u5217":42,"\u751f\u6210\u6d41\u7a0b\u7684\u4f7f\u7528\u65b9\u6cd5":41,"\u7684\u533a\u522b":12,"\u7684\u53c2\u6570":12,"\u7684\u65b9\u6cd5\u6709\u4f55\u533a\u522b":12,"\u76f4\u63a5\u6784\u5efa":7,"\u76f8\u5173\u6982\u5ff5":41,"\u77e9\u9635":33,"\u793a\u4f8b1":39,"\u793a\u4f8b2":39,"\u793a\u4f8b3":39,"\u793a\u4f8b4":39,"\u795e\u7ecf\u5143\u6fc0\u6d3b\u5185\u5b58":11,"\u7a00\u758f\u8bad\u7ec3":35,"\u7aef\u6570\u636e\u7c7b\u578b\u8bf4\u660e":19,"\u7b80\u5355\u95e8\u63a7\u5faa\u73af\u795e\u7ecf\u7f51\u7edc":42,"\u7c7b":6,"\u7ebf\u6027\u56de\u5f52\u5b8c\u6574\u793a\u4f8b":14,"\u7ec4\u7ec7\u5e8f\u5217\u4fe1\u606f":19,"\u7ec4\u7ec7\u8f93\u5165\u6570\u636e":[19,20],"\u7ec6\u8282\u63cf\u8ff0":33,"\u7ec8\u6b62\u96c6\u7fa4\u4f5c\u4e1a":23,"\u7f16\u5199\u9884\u6d4b\u4ee3\u7801":20,"\u7f16\u5199yaml\u6587\u4ef6":26,"\u7f16\u8bd1\u4f9d\u8d56":0,"\u7f16\u8bd1\u5b89\u88c5\u4e0e\u5355\u5143\u6d4b\u8bd5":8,"\u7f16\u8bd1\u5b89\u88c5\u540e\u6267\u884c":8,"\u7f16\u8bd1\u65b9\u6cd5":0,"\u7f16\u8bd1\u9009\u9879":0,"\u7f16\u8bd1\u9009\u9879\u7684\u8bbe\u7f6e":0,"\u7f16\u8bd1\u9009\u9879\u8bf4\u660e":0,"\u81ea\u7136\u8bed\u8a00\u5904\u7406":33,"\u83b7\u53d6paddlepaddle\u7684docker\u955c\u50cf":1,"\u8bad\u7ec3":33,"\u8bad\u7ec3\u56e0\u6b64\u9000\u51fa\u600e\u4e48\u529e":11,"\u8bad\u7ec3\u6a21\u578b":14,"\u8bad\u7ec3\u6d41\u7a0b\u7684\u4f7f\u7528\u65b9
\u6cd5":41,"\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u51fa\u73b0":11,"\u8bcd\u6c47\u8868":39,"\u8be6\u7ec6\u6559\u7a0b":37,"\u8bfb\u53d6\u53cc\u5c42\u5e8f\u5217\u6570\u636e":39,"\u8f93\u5165":[19,41],"\u8f93\u5165\u4e0d\u7b49\u957f":39,"\u8f93\u5165\u793a\u4f8b":41,"\u8f93\u51fa":41,"\u8f93\u51fa\u6570\u636e":19,"\u8f93\u51fa\u6570\u636e\u7c7b\u578b":19,"\u8f93\u51fa\u6570\u636e\u7ec4\u7ec7":19,"\u8fd0\u884c\u5bb9\u5668":26,"\u8fd0\u884c\u73af\u5883\u4f9d\u8d56":3,"\u8fd0\u884cdocker":8,"\u8fdb\u884c\u8bad\u7ec3":26,"\u8fdb\u9636\u4f7f\u7528":36,"\u901a\u7528":33,"\u9047\u5230":8,"\u914d\u7f6e\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u67b6\u6784":42,"\u914d\u7f6e\u7f51\u7edc":14,"\u94a9\u5b50":4,"\u94fe\u63a5\u8bf4\u660e":17,"\u9519\u8bef\u600e\u4e48\u529e":12,"\u9644\u5f55":0,"\u968f\u673a\u6570":33,"\u96c6\u7fa4\u591a\u8282\u70b9\u8bad\u7ec3":9,"\u96c6\u7fa4\u8bad\u7ec3":35,"\u96c6\u7fa4\u8bad\u7ec3\u4e0e\u9884\u6d4b":9,"\u9700\u8981\u7684\u8f6f\u786c\u4ef6":0,"\u975e\u6cd5\u6307\u4ee4":8,"api\u4f7f\u7528\u6d41\u7a0b":20,"api\u5bf9\u6bd4\u4ecb\u7ecd":39,"api\u9884\u6d4b\u5e93":[17,18],"beam_search\u7684\u751f\u6210":39,"book\u6559\u7a0b":1,"cmake\u6e90\u7801\u7f16\u8bd1":8,"float":11,"gpu\u548ccpu\u6df7\u5408\u4f7f\u7528":35,"gpu\u6027\u80fd\u8c03\u4f18":37,"gpu\u955c\u50cf\u51fa\u73b0":8,"group\u6559\u7a0b":41,"import":8,"kubernetes\u5206\u5e03\u5f0f\u8bad\u7ec3":27,"kubernetes\u5355\u673a\u8bad\u7ec3":26,"org\u5de5\u5177":7,"paddle\u7248\u672c\u53f7\u4e3a0":8,"paddlepaddle\u662f\u5426\u652f\u6301\u7ef4\u6570\u53ef\u53d8\u7684\u6570\u636e\u8f93\u5165":12,"paddlepaddle\u73af\u5883\u4f9d\u8d56":3,"paddlepaddle\u7684softmax\u80fd\u5426\u6307\u5b9a\u8ba1\u7b97\u7684\u7ef4\u5ea6":12,"paddlepaddle\u7f16\u8bd1\u4f9d\u8d56":0,"pod\u95f4\u901a\u4fe1":27,"python\u76f8\u5173\u7684\u5355\u5143\u6d4b\u8bd5\u90fd\u8fc7\u4e0d\u4e86":8,"rnn\u6a21\u578b":40,"rnn\u914d\u7f6e":42,AWS:25,DNS:25,EFS:25,KMS:25,access:25,account:25,add:25,address:25,anneal:13,asset:25,associ:25,async:33,attent:42,aws
:25,becaus:13,big:13,bla:0,bucket:25,choos:25,clone:4,cloudform:25,cluster:25,commit:4,concept:25,configur:25,content:[8,9,11,12,13,25,37,38],core:25,creat:25,credenti:25,cuda:[0,8],cudnn:0,data:25,dataprovid:33,defin:25,delet:25,demo:25,destroi:25,directori:25,distribut:25,down:25,download:25,driver:8,drop_out:12,duplic:12,ec2:25,elast:25,except:11,expand:38,extern:25,faq:10,file:25,find:25,first_seq:38,fork:4,gate:42,gpu:33,group:25,gru:33,iam:25,illeg:8,infer:11,initi:25,inspect:25,instal:25,instanc:25,instruct:8,insuffici:8,integr:25,issu:4,job:[25,26],kei:25,kube:25,kubectl:25,kubernet:[25,26],last_seq:38,layer:12,learn:13,local:25,lstm:33,memori:[12,39,41],messag:13,model:42,modul:8,name:[8,12,25],network:42,neural:42,nlp:33,nvprof:37,nvvp:37,org:7,output:25,paddl:[8,12],paddlepaddl:[7,8,25,43],pair:25,parallel_nn:35,paramet:25,perform:33,platform:8,point:[11,25],pool:38,pre:4,prepar:25,privat:25,protocol:13,pull:4,push:4,python:19,rate:13,recurr:[12,41,42],region:25,reject:13,render:25,request:4,rnn:[33,39],route53:25,secur:25,sequenc:42,server:25,servic:25,setup:25,sgd:[21,33],start:25,step2:20,step:20,support:8,system:25,tear:25,templat:25,thi:8,too:13,train:25,trainer:25,tune:33,updat:25,verifi:25,version:8,volum:25,vpc:25,wheel:8,whl:8}}) \ No newline at end of file diff --git a/develop/doc_cn/survey/cluster_bootstrapping_tools.html b/develop/doc_cn/survey/cluster_bootstrapping_tools.html deleted file mode 100644 index c3356f9b53bba76c73af0645288b1ea04e3a0cec..0000000000000000000000000000000000000000 --- a/develop/doc_cn/survey/cluster_bootstrapping_tools.html +++ /dev/null @@ -1,365 +0,0 @@ - - - - - - - - - - - - - Cluster bootstrapping tool survey — PaddlePaddle 文档 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                                                  - - - - -
                                                                  - - - - - - -
                                                                  -
                                                                  - - - - - - -
                                                                  - -
                                                                  -
                                                                  -
                                                                  -
                                                                  - -
                                                                  -

                                                                  Cluster bootstrapping tool survey

                                                                  -
                                                                  -

                                                                  Abstract

                                                                  -

                                                                  To bring a cluster up from bare metal machines to a fully functional Kubernetes cluster that PaddlePaddle can run on, we need some tooling. This survey compares two such tools: Sextant and the Tectonic installer.

                                                                  -
                                                                  -
                                                                  -

                                                                  Basic assumptions

                                                                  -

                                                                  Before moving on to the details, here are some basic assumptions:

                                                                  -
                                                                    -
                                                                  1. You are an administrator of a bare metal machine cluster, which means:
                                                                    • you have full control over each of the machines.
                                                                    • you have full control over the network the machines are connected to.
                                                                  2. Machines can be booted from the network with PXE or iPXE.
                                                                  3. You understand the general procedure for bringing up a cluster.
                                                                  -

                                                                  if your cluster is able to mark above items with checkmarks, then keep reading.

                                                                  -
                                                                  -
                                                                  -

                                                                  Comparing Sextant and Tectonic installer


                                                                  Sextant

                                                                  Sextant is an end-to-end solution for turning a bare-metal cluster into a fully functional k8s cluster. It integrates DHCP, a name service, PXE, a cloud-config service, and a Docker registry.

                                                                  Pros

                                                                  1. End-to-end: basically all the admin needs to do is configure cluster.yaml and power on the cluster.
                                                                  2. Offline cluster configuration: working with Sextant has two phases, config time and deploy time. At config time, the admin’s machine needs internet connectivity to download images and other dependencies. At deploy time, it is completely fine to go offline, since all dependencies were prepared at config time.
                                                                  3. Docker registry integrated.
                                                                  4. GPU machines are taken care of.

                                                                  Cons

                                                                  1. By default, the k8s API server is not deployed with high availability in mind.
                                                                  2. No support for grouping machines.
                                                                  3. No API; it is a one-off service.

                                                                  Tectonic installer

                                                                  First of all, Tectonic is not free: installation requires a coreos.com account, and free users are limited to fewer than 10 nodes.

                                                                  Tectonic is a suite of software that wraps around k8s and provides additional DevOps utilities. Tectonic installer, as its name suggests, installs Tectonic onto a bare-metal cluster, which means it is not an exact equivalent of Sextant. For the “booting a cluster” part, it mostly relies on Matchbox, which is a general-purpose cluster bootstrapper.

                                                                  Matchbox’s approach is similar to Sextant’s.

                                                                  Pros

                                                                  1. Supports grouping machines.
                                                                  2. Supports running the provisioning service in rkt (not a big deal, though).
                                                                  3. Supports an HTTP/gRPC API.
                                                                  4. Supports multiple templates.
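To illustrate what "grouping machines" means in practice, here is a minimal Python sketch of selector-based matching, in the spirit of how a bootstrapper such as Matchbox can assign a profile to the machines whose labels match a group's selector. The `Group` class, the `match_group` function, the profile names, and the MAC addresses are all hypothetical and are not Matchbox's API. The "most specific group wins" rule is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """Hypothetical machine group: nodes whose labels satisfy `selector` get `profile`."""
    name: str
    profile: str
    selector: Dict[str, str] = field(default_factory=dict)

    def matches(self, labels: Dict[str, str]) -> bool:
        # Every key/value pair in the selector must appear in the node's labels.
        # An empty selector matches any node, so it acts as a default group.
        return all(labels.get(key) == value for key, value in self.selector.items())


def match_group(groups: List[Group], labels: Dict[str, str]) -> Optional[Group]:
    """Return the matching group with the most selector keys, or None if none match."""
    candidates = [group for group in groups if group.matches(labels)]
    if not candidates:
        return None
    # Assumption of this sketch: the most specific selector wins.
    return max(candidates, key=lambda group: len(group.selector))


# Illustrative groups; names, profiles, and MAC addresses are made up.
groups = [
    Group("default", profile="k8s-worker"),
    Group("masters", profile="k8s-master", selector={"mac": "52:54:00:a1:9c:ae"}),
    Group("gpu", profile="k8s-gpu-worker", selector={"role": "gpu"}),
]

for labels in (
    {"mac": "52:54:00:a1:9c:ae"},
    {"mac": "52:54:00:b2:2f:86", "role": "gpu"},
    {"mac": "52:54:00:c3:61:77"},
):
    print(labels, "->", match_group(groups, labels).profile)
```

With grouping, one provisioning service can hand different templates to master nodes, GPU workers, and ordinary workers, which is the capability listed as missing from Sextant above.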

                                                                  Cons

                                                                  1. Not an end-to-end solution for bringing up a cluster; it needs a lot of extra work and other software.
                                                                  2. Does not fully support CentOS deployment yet.

                                                                  Conclusion

                                                                  Overall, Sextant is the better solution for deploying Paddle Cloud to a bare-metal cluster. It would be even better if Sextant 1) deployed the k8s API server with high availability by default, and 2) were not designed as a one-off service.

                                                                  Appendix: General procedure to bring up a cluster

                                                                  It is impractical for a cluster admin to manually install the OS and applications on cluster nodes one by one. Here is what an admin in the cloud industry would typically do:

                                                                  1. Set up a bootstrap machine with a static IP in the cluster, running the following services:
                                                                     • DHCP: assigns IP addresses to the rest of the nodes.
                                                                     • Name service: maps node names to IP addresses.
                                                                     • PXE-related services: boot information is delivered to a newly booted machine when the DHCP service assigns its IP; the PXE service then provides further boot and installation information and images over TFTP and HTTP.
                                                                     • Cluster config service: provides cluster nodes with their OS configuration over HTTP.
                                                                     • Optional Docker registry: a built-in Docker registry makes the whole cluster independent of an internet connection and speeds up software distribution.
                                                                  2. When a new node powers on:
                                                                     • The node broadcasts a request for an IP address.
                                                                     • The DHCP server assigns the IP address and delivers the PXE boot information to the node.
                                                                     • The node uses the boot information delivered via DHCP to request config files from the TFTP service; in most cases, the config file points to an HTTP service for the boot image.
                                                                     • Since PXE is configured with an initrd, the node uses the cloud config service to carry out further installation, such as CoreOS or k8s.
                                                                     • The node then restarts.
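The boot sequence above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `BootstrapMachine` class, its method names, the address pool, the `node-N` naming scheme, the file names, and the URLs are all hypothetical, and no real DHCP, TFTP, or HTTP traffic is involved.

```python
import ipaddress
from typing import Dict, List


class BootstrapMachine:
    """Toy model of the bootstrap machine's services (all names are hypothetical)."""

    def __init__(self, own_ip: str, pool_start: str, pool_size: int) -> None:
        self.own_ip = own_ip
        first = ipaddress.IPv4Address(pool_start)
        self._free = [str(first + offset) for offset in range(pool_size)]
        self.leases: Dict[str, str] = {}     # MAC -> IP (DHCP)
        self.hostnames: Dict[str, str] = {}  # MAC -> node name
        self.names: Dict[str, str] = {}      # node name -> IP (name service)

    def dhcp_offer(self, mac: str) -> Dict[str, str]:
        """Assign an IP and tell the node where to fetch its PXE config."""
        if mac not in self.leases:
            if not self._free:
                raise RuntimeError("DHCP pool exhausted")
            ip = self._free.pop(0)
            name = f"node-{len(self.leases) + 1}"
            self.leases[mac] = ip
            self.hostnames[mac] = name
            self.names[name] = ip
        return {"ip": self.leases[mac], "tftp_server": self.own_ip,
                "boot_file": "pxelinux.0"}  # illustrative boot file name

    def tftp_config(self, mac: str) -> Dict[str, str]:
        """Config file served over TFTP; it points to HTTP for the boot image."""
        base = f"http://{self.own_ip}"
        return {"kernel": f"{base}/images/vmlinuz",
                "initrd": f"{base}/images/initrd.img",
                "cloud_config_url": f"{base}/cloud-config/{mac}"}

    def cloud_config(self, mac: str) -> str:
        """Per-node OS config, as the cluster config service would serve it over HTTP."""
        return f"#cloud-config\nhostname: {self.hostnames[mac]}\n"


def boot_node(bootstrap: BootstrapMachine, mac: str) -> List[str]:
    """Walk one node through the boot sequence and return a log of the steps."""
    log = []
    offer = bootstrap.dhcp_offer(mac)     # node broadcasts, DHCP answers
    log.append(f"DHCP: {mac} -> {offer['ip']}, PXE info from {offer['tftp_server']}")
    pxe = bootstrap.tftp_config(mac)      # config file requested via TFTP
    log.append(f"TFTP: boot image at {pxe['kernel']}")
    config = bootstrap.cloud_config(mac)  # further installation via cloud config
    log.append(f"HTTP: cloud config received ({len(config)} characters)")
    log.append("restart")
    return log


bootstrap = BootstrapMachine("10.0.0.1", pool_start="10.0.0.10", pool_size=2)
for line in boot_node(bootstrap, "52:54:00:00:00:01"):
    print(line)
```

Keeping the lease table keyed by MAC address means a node that reboots gets the same IP and hostname again, which is what lets the name service and the per-node cloud config stay consistent across restarts.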

                                                                  For further understanding, the following two links from Matchbox are good reading:


                                                                  © Copyright 2016, PaddlePaddle developers.
