diff --git a/doc/fluid/advanced_guide/addon_development/new_op/op_notes.md b/doc/fluid/advanced_guide/addon_development/new_op/op_notes.md
index 70b1901b9bbc1710414b2be951e039d9b1f5180c..ddae81c39873944f11f3ee227ebd51e785943df6 100644
--- a/doc/fluid/advanced_guide/addon_development/new_op/op_notes.md
+++ b/doc/fluid/advanced_guide/addon_development/new_op/op_notes.md
@@ -181,7 +181,7 @@ REGISTER_OPERATOR(
 - Fluid提供的`DefaultGradOpMaker`,默认会将前向op的所有输入(`Input`)、输出(`Output`)以及输出变量所对应的梯度(`Output@Grad`)作为反向Op的输入,将前向Op输入所对应的梯度(`Input@Grad`)作为反向Op的输出。所以在使用`DefaultGradOpMaker`时需要考虑是否有些变量在计算中不被用到。
 - 如果`DefaultGradOpMaker`不能够满足需求,需要用户自己手动构建`GradOpMaker`,具体实现请参考[相关文档](new_op.html#gradopmaker);
-- 如果有些反向Op需要依赖前向Op的输入或输出变量的的Shape或LoD,但不依赖于变量中Tensor的Buffer,且不能根据其他变量推断出该Shape和LoD,需要对该变量(以下称该变量为`X`)在反向Op中进行注册`NoNeedBufferVarsInference`。**一旦注册了`NoNeedBufferVarsIference`,反向op中就不能读写该变量对应的Tensor中的buffer,只能调用Tensor的dims()和lod()方法,同时,反向Op中的`GetExpectedKernelType()`必须要重写,并且`GetExpectedKernelType()`中不能访问`X`变量中Tensor的type()方法**。比如在`SliceOpGrad`中只会用到`Input`中变量的Shape信息,所以需要为对`Input`在`SliceOpGrad`上进行注册:
+- 如果有些反向Op需要依赖前向Op的输入或输出变量的Shape或LoD,但不依赖于变量中Tensor的Buffer,且不能根据其他变量推断出该Shape和LoD,则可以通过`DECLARE_NO_NEED_BUFFER_VARS_INFERER`接口,对该变量(以下称该变量为`X`)在反向Op中注册`NoNeedBufferVars`。**一旦注册了`NoNeedBufferVars`,反向op中就不能读写该变量对应的Tensor中的buffer,只能调用Tensor的dims()和lod()方法,同时,反向Op中的`GetExpectedKernelType()`必须要重写,并且`GetExpectedKernelType()`中不能访问`X`变量中Tensor的type()方法**。比如在`SliceOpGrad`中只会用到`Input`中变量的Shape信息,所以需要对`Input`在`SliceOpGrad`上进行注册:
 ```
 namespace paddle {
 namespace operators {
@@ -230,8 +230,8 @@ class SliceOpGradMaker : public framework::SingleGradOpMaker {
   }
 };
 
-DECLARE_NO_NEED_BUFFER_VARS_INFERENCE(SliceOpGradNoNeedBufferVarsInference,
-                                      "Input");
+DECLARE_NO_NEED_BUFFER_VARS_INFERER(SliceOpGradNoNeedBufferVarsInference,
+                                    "Input");
 
 }  // namespace operators
 }  // namespace paddle
 
 namespace ops = paddle::operators;
diff --git a/doc/fluid/advanced_guide/distributed_training/cluster_quick_start.rst b/doc/fluid/advanced_guide/distributed_training/cluster_quick_start.rst
index 5509d34403aedd8fd92dd3978fed10b723073d0a..1988aee0ae578f584b723bdf38010945b264320d 100644
--- a/doc/fluid/advanced_guide/distributed_training/cluster_quick_start.rst
+++ b/doc/fluid/advanced_guide/distributed_training/cluster_quick_start.rst
@@ -14,7 +14,7 @@
 
 *
-  [x] 成功安装Paddle Fluid,如果尚未安装,请参考 `快速开始 `_
+  [x] 成功安装Paddle Fluid,如果尚未安装,请参考 `快速开始 `_
 
 *
   [x] 学会最基本的单机训练方法,请参考 `单机训练 `_ 中描述的单卡训练,进行学习
@@ -113,7 +113,7 @@
 
       main_function(args.is_local)
 
-* 说明:示例中使用的IO方法是dataset,想了解具体的文档和用法请参考 `Dataset API `_ 。示例中使用的 ``train_from_dataset`` 接口,想了解具体的文档和使用方法请参考 `Executor API `_ 。示例中的 ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet`` 表示引入参数服务器架构进行分布式训练,如果想更进一步了解Fleet API的更多选项和示例,请参考 `Fleet API `_
+* 说明:示例中使用的IO方法是dataset,想了解具体的文档和用法请参考 `Dataset API `_ 。示例中使用的 ``train_from_dataset`` 接口,想了解具体的文档和使用方法请参考 `Executor API `_ 。示例中的 ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet`` 表示引入参数服务器架构进行分布式训练,如果想更进一步了解Fleet API的更多选项和示例,请参考 `Fleet API `_
 
 
 单机训练启动命令
diff --git a/doc/fluid/advanced_guide/distributed_training/cluster_quick_start_en.rst b/doc/fluid/advanced_guide/distributed_training/cluster_quick_start_en.rst
index ad9868e38bf4c3f4b953749db45064436c972661..ff8ea39c02200f8397c9d3bd9454fd6d01214f51 100644
--- a/doc/fluid/advanced_guide/distributed_training/cluster_quick_start_en.rst
+++ b/doc/fluid/advanced_guide/distributed_training/cluster_quick_start_en.rst
@@ -1,193 +1,159 @@
-.. 
_cluster_quick_start_en:

+Quick start for distributed training
+====================================

-Quick Start with Distributed Training
-==========================

+Distributed training with Fleet API
+-----------------------------------

-Preparation
---------------------
-In this article, we'll show you how to quickly start a PaddlePaddle distributed training task in a cluster. Before you start, do some preparatory work as follows:
-
-1. Prepare a connected training cluster. Here we use 4 training nodes with format ``*.paddlepaddle.com`` to represent the host name of the node. You can modify it according to the actual situation.
-
-2. Make sure you have read :ref:`install_steps` before you start and can run PaddlePaddle on all nodes of the cluster.
+Since Paddle Fluid `Release
+1.5.1 `__,
+it is officially recommended to use the Fleet API for distributed
+training. For an introduction to the Fleet API, please refer to `Fleet
+Design Doc `__.

-Example code
--------------
-
-Let's use a very simple linear regression model as an example to explain how to start a distributed training task with 2 pserver server nodes and 2 trainer nodes. You can save this code as ``dist_train.py`` .
+Preparation
+~~~~~~~~~~~
+
+- [x] Install Paddle Fluid. If it is not installed yet, please refer to
+  `Beginner’s
+  Guide `__.
+- [x] Master the most basic single-node training method. Please refer
+  to the single-card training described in `Single-node
+  training `__.
+
+Click-through rate prediction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here, we use a simple example, the click-through rate prediction task,
+to illustrate how to configure the Fleet API for distributed training,
+and give an example that uses a single-node environment to simulate a
+distributed environment. The source code of the example comes from `CTR
+with
+Fleet `__.
+
+To make learning easier, the example given here mixes single-node and
+multi-node code: you can start a single-node or a multi-node task
+through different startup commands. For how the data is obtained and
+for the data preprocessing logic, please refer to the source code and
+description of `CTR with
+Fleet `__.

.. 
code:: python - + from __future__ import print_function + from args import parse_args import os - import paddle import paddle.fluid as fluid - - # train reader - BATCH_SIZE = 20 - EPOCH_NUM = 30 - BATCH_SIZE = 8 - - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.uci_housing.train(), buf_size=500), - batch_size=BATCH_SIZE) - - def train(): - y = fluid.layers.data(name='y', shape=[1], dtype='float32') - x = fluid.layers.data(name='x', shape=[13], dtype='float32') - y_predict = fluid.layers.fc(input=x, size=1, act=None) - - loss = fluid.layers.square_error_cost(input=y_predict, label=y) - avg_loss = fluid.layers.mean(loss) - opt = fluid.optimizer.SGD(learning_rate=0.001) - opt.minimize(avg_loss) - - place = fluid.CPUPlace() - feeder = fluid.DataFeeder(place=place, feed_list=[x, y]) - exe = fluid.Executor(place) - - # fetch distributed training environment setting - training_role = os.getenv("PADDLE_TRAINING_ROLE", None) - port = os.getenv("PADDLE_PSERVER_PORT", "6174") - pserver_ips = os.getenv("PADDLE_PSERVER_IPS", "") - trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0")) - eplist = [] - for ip in pserver_ips.split(","): - eplist.append(':'.join([ip, port])) - pserver_endpoints = ",".join(eplist) - trainers = int(os.getenv("PADDLE_TRAINERS")) - current_endpoint = os.getenv("PADDLE_CURRENT_IP", "") + ":" + port - - t = fluid.DistributeTranspiler() - t.transpile( - trainer_id = trainer_id, - pservers = pserver_endpoints, - trainers = trainers) - - if training_role == "PSERVER": - pserver_prog = t.get_pserver_program(current_endpoint) - startup_prog = t.get_startup_program(current_endpoint, pserver_prog) - exe.run(startup_prog) - exe.run(pserver_prog) - elif training_role == "TRAINER": - trainer_prog = t.get_trainer_program() + import sys + from network_conf import ctr_dnn_model_dataset + import paddle.fluid.incubate.fleet.base.role_maker as role_maker + + from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet + from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig + + dense_feature_dim = 13 + sparse_feature_dim = 10000001 + batch_size = 100 + thread_num = 10 + embedding_size = 10 + args = parse_args() + + def main_function(is_local): + # common code for local training and distributed training + dense_input = fluid.layers.data( + name="dense_input", shape=[dense_feature_dim], dtype='float32') + + sparse_input_ids = [ + fluid.layers.data(name="C" + str(i), shape=[1], lod_level=1, + dtype="int64") for i in range(1, 27)] + + label = fluid.layers.data(name="label", shape=[1], dtype="int64") + dataset = fluid.DatasetFactory().create_dataset() + dataset.set_use_var([dense_input] + sparse_input_ids + [label]) + pipe_command = "python criteo_reader.py %d" % sparse_feature_dim + dataset.set_pipe_command(pipe_command) + dataset.set_batch_size(batch_size) + dataset.set_thread(thread_num) + + whole_filelist = ["raw_data/part-%d" % x + for x in range(len(os.listdir("raw_data")))] + + dataset.set_filelist(whole_filelist) + loss, auc_var, batch_auc_var = ctr_dnn_model_dataset( + dense_input, sparse_input_ids, label, embedding_size, + sparse_feature_dim) + + exe = fluid.Executor(fluid.CPUPlace()) + def train_loop(epoch=20): + for i in range(epoch): + exe.train_from_dataset(program=fluid.default_main_program(), + dataset=dataset, + fetch_list=[auc_var], + fetch_info=["auc"], + debug=False) + # local training + def local_train(): + optimizer = fluid.optimizer.SGD(learning_rate=1e-4) + optimizer.minimize(loss) 
exe.run(fluid.default_startup_program()) - - for epoch in range(EPOCH_NUM): - for batch_id, batch_data in enumerate(train_reader()): - avg_loss_value, = exe.run(trainer_prog, - feed=feeder.feed(batch_data), - fetch_list=[avg_loss]) - if (batch_id + 1) % 10 == 0: - print("Epoch: {0}, Batch: {1}, loss: {2}".format( - epoch, batch_id, avg_loss_value[0])) - # destory the resource of current trainer node in pserver server node - exe.close() + train_loop() + + # distributed training + def dist_train(): + role = role_maker.PaddleCloudRoleMaker() + fleet.init(role) + strategy = DistributeTranspilerConfig() + strategy.sync_mode = False + optimizer = fluid.optimizer.SGD(learning_rate=1e-4) + optimizer = fleet.distributed_optimizer(optimizer, strategy) + optimizer.minimize(loss) + + if fleet.is_server(): + fleet.init_server() + fleet.run_server() + elif fleet.is_worker(): + fleet.init_worker() + exe.run(fluid.default_startup_program()) + train_loop() + if is_local: + local_train() else: - raise AssertionError("PADDLE_TRAINING_ROLE should be one of [TRAINER, PSERVER]") - - train() - - -Environment Variables ------------------------------------- - -When starting a distributed training task, different environment variables are used to represent different node roles, details as follows: - -.. list-table:: - :header-rows: 1 - - * - Environment Variable - - Data Type - - Example - - Description - * - :code:`PADDLE_TRAINING_ROLE` - - str - - :code:`PSERVER,TRANERR` - - role of current training node - * - :code:`PADDLE_PSERVER_IPS` - - str - - :code:`ps0.paddlepaddle.com, ps1.paddlepaddle.com` - - The IP addresses or hostnames of all pserver nodes in the distributed training task, separated by "," - * - :code:`PADDLE_PSERVER_PORT` - - int - - 6174 - - port that the pserver process listens to - * - :code:`PADDLE_TRAINERS` - - int - - 2 - - Number of trainer nodes in a distributed training task - * - :code:`PADDLE_CURRENT_IP` - - str - - :code:`ps0.paddlepaddle.com` - - IP address or hostname of the current pserver node - * - :code:`PADDLE_TRAINER_ID` - - str - - 0 - - ID of the current trainer node (unique), in the range of [0, PADDLE_TRAINERS) - -**Note:** Environment variables are just a way to get runtime information. In practical tasks, you can use command line parameters to obtain runtime information. - -API related to Distributed Training ---------------------------------- - -DistributeTranspiler -~~~~~~~~~~~~~~~~~~~~~~ - -The machines in distributed training tasks based on the pserver-trainer architecture are divided into two roles: Parameter Server (pserver) and trainer. In Fluid, users only need to configure the network configuration required for single node training. The ``DistributeTranspiler`` module automatically modifies the single-node network settings into settings on which pserver and trainer needs to run based on the role of current training node: + dist_train() -.. code:: python + if __name__ == '__main__': + main_function(args.is_local) - t = fluid.DistributeTranspiler() - t.transpile( - trainer_id = trainer_id, - pservers = pserver_endpoints, - trainers = trainers) - if PADDLE_TRAINING_ROLE == "TRAINER": - # fetch the trainer program and execute it - trainer_prog = t.get_trainer_program() - ... +- Note: The IO method used in this example is dataset, please refer to + `Dataset + API `__ + for specific documents and usage. For the ``train_from_dataset`` + interface, please refer to `Executor + API `__. 
+ ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet``
+  in this example means that the parameter server architecture is used
+  for distributed training. For more options and examples of the Fleet
+  API, please refer to `Fleet
+  API `__.

-  elif PADDLE_TRAINER_ROLE == "PSERVER":
-      # fetch the pserver program and execute it
-      pserver_prog = t.get_pserver_program(current_endpoint)
-      ...
+Start command of single-node training
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+.. code:: bash

-Exe.close()
-~~~~~~~~~~~~~~
+    python train.py --is_local 1

+Start command of single-machine simulated distributed training
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The status information of all trainer nodes is saved in the pserver node. When trainer finishes training, ``exe.close()`` should be called to notify all PServer nodes to release the resources of the current Trainer nodes:
+Here we use launch\_ps, a built-in launcher of Paddle, with which users
+can specify the numbers of workers and servers to start the parameter
+server task.

-.. code:: python
+.. code:: bash
+
+    python -m paddle.distributed.launch_ps --worker_num 2 --server_num 2 train.py

-    exe = fluid.Executor(fluid.CPUPlace())
-    # training process ...
-    exe.close() # notify PServer to destory the resource
-
-Note: every trainer needs to call exe.close() when the trainer finishes.
-
-Start a Distributed Training Task
-----------------------------------
-
-.. list-table::
-   :header-rows: 1
-
-
-   * - Start Node
-     - Start Command
-     - Description
-   * - ps0.paddlepaddle.com
-     - :code:`PADDLE_TRAINING_ROLE=PSERVER PADDLE_CURRENT_IP=ps0.paddlepaddle.com PADDLE_PSERVER_IPS=ps0.paddlepaddle.com, ps1.paddlepaddle.com PADDLE_TRAINERS=2 PADDLE_PSERVER_PORT=6174 python fluid_dist.py`
-     - Start pserver node
-   * - ps1.paddlepaddle.com
-     - :code:`PADDLE_TRAINING_ROLE=PSERVER PADDLE_CURRENT_IP=ps1.paddlepaddle.com PADDLE_PSERVER_IPS=ps0.paddlepaddle.com, ps1.paddlepaddle.com PADDLE_TRAINERS=2 PADDLE_PSERVER_PORT=6174 python fluid_dist.py`
-     - Start pserver node
-   * - trainer0.paddlepaddle.com
-     - :code:`PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_IPS=ps0.paddlepaddle.com, ps1.paddlepaddle.com PADDLE_TRAINERS=2 PADDLE_TRAINER_ID=0 PADDLE_PSERVER_PORT=6174 python fluid_dist.py`
-     - Start the number 0 Trainer Node
-   * - trainer1.paddlepaddle.com
-     - :code:`PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_IPS=ps0.paddlepaddle.com, ps1.paddlepaddle.com PADDLE_TRAINERS=2 PADDLE_TRAINER_ID=1 PADDLE_PSERVER_PORT=6174 python fluid_dist.py`
-     - Start the number 1 trainer node
+The task running log can be viewed in the ``logs`` directory under the
+working directory. Once you can simulate distributed training on a
+single machine, you can move on to true multi-node distributed
+training. We recommend that users refer directly to
+`the example of running a distributed task on Baidu Cloud `__.
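For quick reference, the server/worker branching that ``dist_train`` in the example above implements can be condensed as follows. This is a sketch distilled from the example code, not a separate official sample: ``run_distributed`` and ``train_loop`` are illustrative names, and it assumes the environment variables that ``launch_ps`` sets for every process it starts.

.. code:: python

    import paddle.fluid as fluid
    import paddle.fluid.incubate.fleet.base.role_maker as role_maker
    from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
    from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig

    def run_distributed(loss, train_loop):
        # Illustrative helper, not a Fleet API; the role (server or worker)
        # is read from the environment variables set by the launcher.
        fleet.init(role_maker.PaddleCloudRoleMaker())

        strategy = DistributeTranspilerConfig()
        strategy.sync_mode = False  # asynchronous parameter server training

        optimizer = fleet.distributed_optimizer(
            fluid.optimizer.SGD(learning_rate=1e-4), strategy)
        optimizer.minimize(loss)

        if fleet.is_server():
            # Server process: initialize and host the parameters.
            fleet.init_server()
            fleet.run_server()
        elif fleet.is_worker():
            # Worker process: run startup, then the user's training loop.
            fleet.init_worker()
            exe = fluid.Executor(fluid.CPUPlace())
            exe.run(fluid.default_startup_program())
            train_loop(exe)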
diff --git a/doc/fluid/api/dygraph.rst b/doc/fluid/api/dygraph.rst index 927a767edc2adbadf65488ab10c233b6fbb207fb..3630bcb92d1d57e1ded5af72386a12445f33569c 100644 --- a/doc/fluid/api/dygraph.rst +++ b/doc/fluid/api/dygraph.rst @@ -13,9 +13,17 @@ fluid.dygraph dygraph/Conv3D.rst dygraph/Conv3DTranspose.rst dygraph/CosineDecay.rst + dygraph/DataParallel.rst + dygraph/disable_dygraph.rst + dygraph/dygraph_to_static_code.rst + dygraph/dygraph_to_static_func.rst dygraph/dygraph_to_static_output.rst + dygraph/dygraph_to_static_program.rst dygraph/Embedding.rst + dygraph/enable_dygraph.rst + dygraph/enabled.rst dygraph/ExponentialDecay.rst + dygraph/grad.rst dygraph/GroupNorm.rst dygraph/GRUUnit.rst dygraph/guard.rst diff --git a/doc/fluid/api/dygraph/DataParallel.rst b/doc/fluid/api/dygraph/DataParallel.rst new file mode 100644 index 0000000000000000000000000000000000000000..33c8a3fd80181e529c1c6728f867cc952b818eb4 --- /dev/null +++ b/doc/fluid/api/dygraph/DataParallel.rst @@ -0,0 +1,12 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_DataParallel: + +DataParallel +------------ + +.. autoclass:: paddle.fluid.dygraph.DataParallel + :members: + :noindex: + diff --git a/doc/fluid/api/dygraph/disable_dygraph.rst b/doc/fluid/api/dygraph/disable_dygraph.rst new file mode 100644 index 0000000000000000000000000000000000000000..17adf7a7559fe31260a4d37b618bafa5e0575b57 --- /dev/null +++ b/doc/fluid/api/dygraph/disable_dygraph.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_disable_dygraph: + +disable_dygraph +--------------- + +.. autofunction:: paddle.fluid.dygraph.disable_dygraph + :noindex: + diff --git a/doc/fluid/api/dygraph/dygraph_to_static_code.rst b/doc/fluid/api/dygraph/dygraph_to_static_code.rst new file mode 100644 index 0000000000000000000000000000000000000000..bd6af528d903316df1c1c03f63392ace8af7c55b --- /dev/null +++ b/doc/fluid/api/dygraph/dygraph_to_static_code.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_dygraph_to_static_code: + +dygraph_to_static_code +---------------------- + +.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_code + :noindex: + diff --git a/doc/fluid/api/dygraph/dygraph_to_static_func.rst b/doc/fluid/api/dygraph/dygraph_to_static_func.rst new file mode 100644 index 0000000000000000000000000000000000000000..d73ac96d88263e5c759b260ddd7dd45f57b9fe71 --- /dev/null +++ b/doc/fluid/api/dygraph/dygraph_to_static_func.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_dygraph_to_static_func: + +dygraph_to_static_func +---------------------- + +.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_func + :noindex: + diff --git a/doc/fluid/api/dygraph/dygraph_to_static_program.rst b/doc/fluid/api/dygraph/dygraph_to_static_program.rst new file mode 100644 index 0000000000000000000000000000000000000000..1f481533654b6db79b70a3a9b2235ec8a9696ec5 --- /dev/null +++ b/doc/fluid/api/dygraph/dygraph_to_static_program.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_dygraph_to_static_program: + +dygraph_to_static_program +------------------------- + +.. 
autofunction:: paddle.fluid.dygraph.dygraph_to_static_program + :noindex: + diff --git a/doc/fluid/api/dygraph/enable_dygraph.rst b/doc/fluid/api/dygraph/enable_dygraph.rst new file mode 100644 index 0000000000000000000000000000000000000000..02dfdcd457761c8533118bcd7b505f427ab5d849 --- /dev/null +++ b/doc/fluid/api/dygraph/enable_dygraph.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_enable_dygraph: + +enable_dygraph +-------------- + +.. autofunction:: paddle.fluid.dygraph.enable_dygraph + :noindex: + diff --git a/doc/fluid/api/dygraph/enabled.rst b/doc/fluid/api/dygraph/enabled.rst new file mode 100644 index 0000000000000000000000000000000000000000..dc2bfa7649ef185be254a6a71161b613726e7449 --- /dev/null +++ b/doc/fluid/api/dygraph/enabled.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_enabled: + +enabled +------- + +.. autofunction:: paddle.fluid.dygraph.enabled + :noindex: + diff --git a/doc/fluid/api/dygraph/grad.rst b/doc/fluid/api/dygraph/grad.rst new file mode 100644 index 0000000000000000000000000000000000000000..fb697aaf16523a5ef1014061e91b2408468448d0 --- /dev/null +++ b/doc/fluid/api/dygraph/grad.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_dygraph_grad: + +grad +---- + +.. autofunction:: paddle.fluid.dygraph.grad + :noindex: + diff --git a/doc/fluid/api/fluid.rst b/doc/fluid/api/fluid.rst index 5a52fead243c1a5418b665312bcf9ed7cb96e158..40bd74ef82f388ec6f738dbc9602a6dbf367ec9e 100644 --- a/doc/fluid/api/fluid.rst +++ b/doc/fluid/api/fluid.rst @@ -20,14 +20,15 @@ fluid fluid/DataFeeder.rst fluid/default_main_program.rst fluid/default_startup_program.rst - fluid/disable_dygraph.rst fluid/device_guard.rst + fluid/disable_dygraph.rst fluid/DistributeTranspiler.rst fluid/DistributeTranspilerConfig.rst fluid/embedding.rst fluid/enable_dygraph.rst fluid/ExecutionStrategy.rst fluid/Executor.rst + fluid/get_flags.rst fluid/global_scope.rst fluid/gradients.rst fluid/in_dygraph_mode.rst @@ -47,6 +48,7 @@ fluid fluid/require_version.rst fluid/save.rst fluid/scope_guard.rst + fluid/set_flags.rst fluid/Tensor.rst fluid/Variable.rst fluid/WeightNormParamAttr.rst diff --git a/doc/fluid/api/fluid/device_guard.rst b/doc/fluid/api/fluid/device_guard.rst index 1cfdb1f90822b44e60cd04503805eb911be80d45..d8d611168644c45322972669bdd2806f393bcf43 100644 --- a/doc/fluid/api/fluid/device_guard.rst +++ b/doc/fluid/api/fluid/device_guard.rst @@ -4,7 +4,7 @@ .. _api_fluid_device_guard: device_guard ------------------------ +------------ .. autofunction:: paddle.fluid.device_guard :noindex: diff --git a/doc/fluid/api/fluid/get_flags.rst b/doc/fluid/api/fluid/get_flags.rst new file mode 100644 index 0000000000000000000000000000000000000000..2432965408118fe7c58d2898c2871391a32750a2 --- /dev/null +++ b/doc/fluid/api/fluid/get_flags.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_get_flags: + +get_flags +--------- + +.. autofunction:: paddle.fluid.get_flags + :noindex: + diff --git a/doc/fluid/api/fluid/set_flags.rst b/doc/fluid/api/fluid/set_flags.rst new file mode 100644 index 0000000000000000000000000000000000000000..730438b200ee575912c940d616f0dbffdcf73d41 --- /dev/null +++ b/doc/fluid/api/fluid/set_flags.rst @@ -0,0 +1,11 @@ +.. 
THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_set_flags: + +set_flags +--------- + +.. autofunction:: paddle.fluid.set_flags + :noindex: + diff --git a/doc/fluid/api/gen_doc.py b/doc/fluid/api/gen_doc.py index a45f854ef38715f66d019a58edb4e17dcb1a74c5..9651f206dbe9322a8f11b91554d5ca36a867e01b 100644 --- a/doc/fluid/api/gen_doc.py +++ b/doc/fluid/api/gen_doc.py @@ -19,6 +19,9 @@ import types import os import contextlib import paddle.fluid as fluid +import paddle.tensor as tensor +import paddle.nn as nn +#import paddle.framework as framework def parse_arg(): parser = argparse.ArgumentParser() @@ -29,8 +32,13 @@ def parse_arg(): '--module_prefix', type=str, help='Generate the prefix of module') parser.add_argument( '--output', type=str, help='Output file or output directory for output rst') + parser.add_argument( + '--output_name', type=str, help='Output file or output directory for output rst') + parser.add_argument( + '--output_dir', type=str, help='Output file or output directory for output rst') parser.add_argument( '--to_multiple_files', type=bool, default=False, help='Whether to separate to multiple files') + return parser.parse_args() def print_item(self, name): @@ -140,7 +148,7 @@ class DocGenerator(object): self.stream.write(".. _api_{0}_{1}:\n\n".format("_".join( self.module_prefix.split(".")), name)) -def generate_doc(module_name, module_prefix, output, to_multiple_files): +def generate_doc(module_name, module_prefix, output, output_name, to_multiple_files, output_dir): if module_name == "": module_name = None @@ -150,24 +158,29 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files): gen = DocGenerator() if module_name is None: - gen.module = fluid - gen.module_name = 'fluid' + gen.module = eval(output_name) + gen.module_name = str(output_name) else: - gen.module = fluid + gen.module = eval(output_name) for each_module_name in module_name.split('.'): if not hasattr(gen.module, each_module_name): raise ValueError("Cannot find fluid.{0}".format(module_name)) else: gen.module = getattr(gen.module, each_module_name) - gen.module_name = "fluid." + module_name + gen.module_name = output_name + "." + module_name if module_prefix is None: gen.module_prefix = gen.module_name else: - gen.module_prefix = "fluid." + module_prefix + gen.module_prefix = output_name + "." 
+ module_prefix dirname = output if to_multiple_files else os.path.dirname(output) + + if output_dir != None: + dirname = output_dir + "/" + dirname + output = output_dir + "/" + output + if len(dirname) > 0 and (not os.path.exists(dirname) or not os.path.isdir(dirname)): os.makedirs(dirname) @@ -199,7 +212,7 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files): def main(): args = parse_arg() - generate_doc(args.module_name, args.module_prefix, args.output, args.to_multiple_files) + generate_doc(args.module_name, args.module_prefix, args.output, args.output_name, args.to_multiple_files, args.output_dir) if __name__ == '__main__': diff --git a/doc/fluid/api/gen_doc.sh b/doc/fluid/api/gen_doc.sh index 1e833161ef0e225e4725136ad3d466d945b69bec..e315fa496d57a15e2668919ea59b00f1b957ba39 100644 --- a/doc/fluid/api/gen_doc.sh +++ b/doc/fluid/api/gen_doc.sh @@ -1,23 +1,29 @@ #!/bin/bash -#for module in nn -#do -# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module} --to_multiple_files True -#done - -#for module in control_flow nn io ops tensor learning_rate_scheduler detection metric_op -#do -# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module}.rst -#done - for module in layers dataset clip metrics executor initializer io nets optimizer profiler regularizer transpiler backward profiler unique_name dygraph do - python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --to_multiple_files True + python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name fluid --to_multiple_files True python gen_module_index.py ${module} fluid.${module} done -python gen_doc.py --module_name "" --module_prefix "" --output fluid --to_multiple_files True +python gen_doc.py --module_name "" --module_prefix "" --output fluid --output_name fluid --to_multiple_files True python gen_module_index.py fluid fluid +for module in math random stat +do + python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name tensor --to_multiple_files True --output_dir tensor + python gen_module_index.py tensor.${module} ${module} +done + +python gen_module_index.py tensor paddle.tensor + +for module in loss +do + python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn + python gen_module_index.py nn.${module} ${module} +done + +python gen_module_index.py nn paddle.nn + python gen_index.py diff --git a/doc/fluid/api/index_en.rst b/doc/fluid/api/index_en.rst index dce0d7cf9679ca2b47fd2a2713c31d5538b6a61c..e4e1661a0aa0020e69294f6df4dc2efc426ac603 100644 --- a/doc/fluid/api/index_en.rst +++ b/doc/fluid/api/index_en.rst @@ -19,8 +19,10 @@ API Reference layers.rst metrics.rst nets.rst + nn.rst optimizer.rst profiler.rst regularizer.rst + tensor.rst transpiler.rst unique_name.rst diff --git a/doc/fluid/api/io/ComposeNotAligned.rst b/doc/fluid/api/io/ComposeNotAligned.rst index c2f3f465b7dd287e22c5bdcf2401c40d5494d4f1..3968d80ce3cc5c174763c7b6161c80e3c3840042 100644 --- a/doc/fluid/api/io/ComposeNotAligned.rst +++ b/doc/fluid/api/io/ComposeNotAligned.rst @@ -11,4 +11,3 @@ ComposeNotAligned :inherited-members: :noindex: -This indicates an error state of compose API, which will raise when outputs of readers are not aligned. 
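The ``generate_doc`` entry point extended in ``gen_doc.py`` above can also be driven directly from Python rather than through ``gen_doc.sh``. The following is a minimal sketch mirroring the ``tensor`` loop of the script; it assumes ``gen_doc.py`` is importable from the working directory and that Paddle is installed.

.. code-block:: python

    # Equivalent to:
    #   python gen_doc.py --module_name math --module_prefix math --output math \
    #       --output_name tensor --to_multiple_files True --output_dir tensor
    from gen_doc import generate_doc

    for module in ["math", "random", "stat"]:
        generate_doc(
            module_name=module,      # submodule of paddle.tensor to document
            module_prefix=module,    # prefix used in the generated labels
            output=module,           # base path of the generated rst files
            output_name="tensor",    # top-level package: fluid, tensor or nn
            to_multiple_files=True,  # one rst file per API
            output_dir="tensor")     # directory prepended to the output path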
diff --git a/doc/fluid/api/layers.rst b/doc/fluid/api/layers.rst index 066c1d5c744f40df2a8d9c092bc68e28d2786c7e..3ad2eeaa2941b7a0b6df18eb51aaf7b3691504b5 100644 --- a/doc/fluid/api/layers.rst +++ b/doc/fluid/api/layers.rst @@ -25,6 +25,7 @@ fluid.layers layers/atan.rst layers/auc.rst layers/autoincreased_step_counter.rst + layers/BasicDecoder.rst layers/batch_norm.rst layers/beam_search.rst layers/beam_search_decode.rst @@ -68,6 +69,7 @@ fluid.layers layers/cumsum.rst layers/data.rst layers/data_norm.rst + layers/DecodeHelper.rst layers/Decoder.rst layers/deformable_conv.rst layers/deformable_roi_pooling.rst @@ -104,6 +106,7 @@ fluid.layers layers/eye.rst layers/fc.rst layers/fill_constant.rst + layers/fill_constant_batch_size_like.rst layers/filter_by_instag.rst layers/flatten.rst layers/floor.rst @@ -112,6 +115,7 @@ fluid.layers layers/gather_nd.rst layers/gather_tree.rst layers/gaussian_random.rst + layers/gaussian_random_batch_size_like.rst layers/gelu.rst layers/generate_mask_labels.rst layers/generate_proposal_labels.rst @@ -119,6 +123,7 @@ fluid.layers layers/get_tensor_from_selected_rows.rst layers/greater_equal.rst layers/greater_than.rst + layers/GreedyEmbeddingHelper.rst layers/grid_sampler.rst layers/group_norm.rst layers/gru_unit.rst @@ -136,6 +141,7 @@ fluid.layers layers/image_resize.rst layers/image_resize_short.rst layers/increment.rst + layers/inplace_abn.rst layers/instance_norm.rst layers/inverse_time_decay.rst layers/iou_similarity.rst @@ -237,6 +243,7 @@ fluid.layers layers/rpn_target_assign.rst layers/rsqrt.rst layers/sampled_softmax_with_cross_entropy.rst + layers/SampleEmbeddingHelper.rst layers/sampling_id.rst layers/scale.rst layers/scatter.rst @@ -302,10 +309,12 @@ fluid.layers layers/tensor_array_to_tensor.rst layers/thresholded_relu.rst layers/topk.rst + layers/TrainingHelper.rst layers/transpose.rst layers/unfold.rst layers/Uniform.rst layers/uniform_random.rst + layers/uniform_random_batch_size_like.rst layers/unique.rst layers/unique_with_counts.rst layers/unsqueeze.rst diff --git a/doc/fluid/api/layers/BasicDecoder.rst b/doc/fluid/api/layers/BasicDecoder.rst new file mode 100644 index 0000000000000000000000000000000000000000..8eb0f78dc8621d42061dfe944106c64007a0d0c1 --- /dev/null +++ b/doc/fluid/api/layers/BasicDecoder.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_BasicDecoder: + +BasicDecoder +------------ + +.. autoclass:: paddle.fluid.layers.BasicDecoder + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/layers/DecodeHelper.rst b/doc/fluid/api/layers/DecodeHelper.rst new file mode 100644 index 0000000000000000000000000000000000000000..ba475f2a1daea0e842d07d0372de2a828de2931a --- /dev/null +++ b/doc/fluid/api/layers/DecodeHelper.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_DecodeHelper: + +DecodeHelper +------------ + +.. autoclass:: paddle.fluid.layers.DecodeHelper + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/layers/GreedyEmbeddingHelper.rst b/doc/fluid/api/layers/GreedyEmbeddingHelper.rst new file mode 100644 index 0000000000000000000000000000000000000000..fb7741ec499fcff9404d384b760c03613c63a327 --- /dev/null +++ b/doc/fluid/api/layers/GreedyEmbeddingHelper.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. 
_api_fluid_layers_GreedyEmbeddingHelper: + +GreedyEmbeddingHelper +--------------------- + +.. autoclass:: paddle.fluid.layers.GreedyEmbeddingHelper + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/layers/SampleEmbeddingHelper.rst b/doc/fluid/api/layers/SampleEmbeddingHelper.rst new file mode 100644 index 0000000000000000000000000000000000000000..99b9ca39900643e5c5c5806106c66a151fdf9a27 --- /dev/null +++ b/doc/fluid/api/layers/SampleEmbeddingHelper.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_SampleEmbeddingHelper: + +SampleEmbeddingHelper +--------------------- + +.. autoclass:: paddle.fluid.layers.SampleEmbeddingHelper + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/layers/TrainingHelper.rst b/doc/fluid/api/layers/TrainingHelper.rst new file mode 100644 index 0000000000000000000000000000000000000000..247ac73d1f15cc9413ec60e0dd4d7d9047e308de --- /dev/null +++ b/doc/fluid/api/layers/TrainingHelper.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_TrainingHelper: + +TrainingHelper +-------------- + +.. autoclass:: paddle.fluid.layers.TrainingHelper + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/layers/fill_constant_batch_size_like.rst b/doc/fluid/api/layers/fill_constant_batch_size_like.rst new file mode 100644 index 0000000000000000000000000000000000000000..ae7abaea85f6382382eaf18bde3e045ee7d222f2 --- /dev/null +++ b/doc/fluid/api/layers/fill_constant_batch_size_like.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_fill_constant_batch_size_like: + +fill_constant_batch_size_like +----------------------------- + +.. autofunction:: paddle.fluid.layers.fill_constant_batch_size_like + :noindex: + diff --git a/doc/fluid/api/layers/gaussian_random_batch_size_like.rst b/doc/fluid/api/layers/gaussian_random_batch_size_like.rst new file mode 100644 index 0000000000000000000000000000000000000000..f57989de190118d2e0deb05726e43f1958d68fce --- /dev/null +++ b/doc/fluid/api/layers/gaussian_random_batch_size_like.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_gaussian_random_batch_size_like: + +gaussian_random_batch_size_like +------------------------------- + +.. autofunction:: paddle.fluid.layers.gaussian_random_batch_size_like + :noindex: + diff --git a/doc/fluid/api/layers/inplace_abn.rst b/doc/fluid/api/layers/inplace_abn.rst new file mode 100644 index 0000000000000000000000000000000000000000..b3b31942f37c7cb43a5d95f1d6965acaf08efdca --- /dev/null +++ b/doc/fluid/api/layers/inplace_abn.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_fluid_layers_inplace_abn: + +inplace_abn +----------- + +.. autofunction:: paddle.fluid.layers.inplace_abn + :noindex: + diff --git a/doc/fluid/api/layers/uniform_random_batch_size_like.rst b/doc/fluid/api/layers/uniform_random_batch_size_like.rst new file mode 100644 index 0000000000000000000000000000000000000000..8d2ea2d33afb508decec030d1edfd0556d01ad56 --- /dev/null +++ b/doc/fluid/api/layers/uniform_random_batch_size_like.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. 
_api_fluid_layers_uniform_random_batch_size_like: + +uniform_random_batch_size_like +------------------------------ + +.. autofunction:: paddle.fluid.layers.uniform_random_batch_size_like + :noindex: + diff --git a/doc/fluid/api/nn.rst b/doc/fluid/api/nn.rst new file mode 100644 index 0000000000000000000000000000000000000000..362b033a8d8d7bfd58bc4a8bb035c66db3db2d2c --- /dev/null +++ b/doc/fluid/api/nn.rst @@ -0,0 +1,8 @@ +========= +paddle.nn +========= + +.. toctree:: + :maxdepth: 1 + + nn/loss.rst diff --git a/doc/fluid/api/nn/loss.rst b/doc/fluid/api/nn/loss.rst new file mode 100644 index 0000000000000000000000000000000000000000..c9b096802c039606f09c6373081e402b5302b068 --- /dev/null +++ b/doc/fluid/api/nn/loss.rst @@ -0,0 +1,8 @@ +==== +loss +==== + +.. toctree:: + :maxdepth: 1 + + loss/L1Loss.rst diff --git a/doc/fluid/api/nn/loss/L1Loss.rst b/doc/fluid/api/nn/loss/L1Loss.rst new file mode 100644 index 0000000000000000000000000000000000000000..161cb38c80a87c4ba38ed685133a705811ec9103 --- /dev/null +++ b/doc/fluid/api/nn/loss/L1Loss.rst @@ -0,0 +1,13 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_nn_loss_L1Loss: + +L1Loss +------ + +.. autoclass:: paddle.nn.loss.L1Loss + :members: + :inherited-members: + :noindex: + diff --git a/doc/fluid/api/tensor.rst b/doc/fluid/api/tensor.rst new file mode 100644 index 0000000000000000000000000000000000000000..1255d761e1a18b1d0cc42a369afb01080f4cbce5 --- /dev/null +++ b/doc/fluid/api/tensor.rst @@ -0,0 +1,9 @@ +============= +paddle.tensor +============= + +.. toctree:: + :maxdepth: 1 + + tensor/math.rst + tensor/random.rst diff --git a/doc/fluid/api/tensor/math.rst b/doc/fluid/api/tensor/math.rst new file mode 100644 index 0000000000000000000000000000000000000000..1b28cfecb50c699967f74c9788e120644dab688e --- /dev/null +++ b/doc/fluid/api/tensor/math.rst @@ -0,0 +1,18 @@ +==== +math +==== + +.. toctree:: + :maxdepth: 1 + + math/add.rst + math/atan.rst + math/div.rst + math/elementwise_sum.rst + math/mm.rst + math/mul.rst + math/pow.rst + math/sin.rst + math/sqrt.rst + math/sum.rst + math/tanh.rst diff --git a/doc/fluid/api/tensor/math/add.rst b/doc/fluid/api/tensor/math/add.rst new file mode 100644 index 0000000000000000000000000000000000000000..0b604ac2d1805066bfa24f8a12c8fbed36b14d0d --- /dev/null +++ b/doc/fluid/api/tensor/math/add.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_add: + +add +--- + +.. autofunction:: paddle.tensor.math.add + :noindex: + diff --git a/doc/fluid/api/tensor/math/atan.rst b/doc/fluid/api/tensor/math/atan.rst new file mode 100644 index 0000000000000000000000000000000000000000..31b11dbbe4fbc39d7f5c472478cea1544edfefe7 --- /dev/null +++ b/doc/fluid/api/tensor/math/atan.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_atan: + +atan +---- + +.. autofunction:: paddle.tensor.math.atan + :noindex: + diff --git a/doc/fluid/api/tensor/math/div.rst b/doc/fluid/api/tensor/math/div.rst new file mode 100644 index 0000000000000000000000000000000000000000..cf8397dbffc36f895319dced427487e7f3851d40 --- /dev/null +++ b/doc/fluid/api/tensor/math/div.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_div: + +div +--- + +.. 
autofunction:: paddle.tensor.math.div + :noindex: + diff --git a/doc/fluid/api/tensor/math/elementwise_sum.rst b/doc/fluid/api/tensor/math/elementwise_sum.rst new file mode 100644 index 0000000000000000000000000000000000000000..05acb3f78f66192b5eea938cdd528b56da247a22 --- /dev/null +++ b/doc/fluid/api/tensor/math/elementwise_sum.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_elementwise_sum: + +elementwise_sum +--------------- + +.. autofunction:: paddle.tensor.math.elementwise_sum + :noindex: + diff --git a/doc/fluid/api/tensor/math/mm.rst b/doc/fluid/api/tensor/math/mm.rst new file mode 100644 index 0000000000000000000000000000000000000000..8668c44055f25025ae080b3a08cce39919cd888b --- /dev/null +++ b/doc/fluid/api/tensor/math/mm.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_mm: + +mm +-- + +.. autofunction:: paddle.tensor.math.mm + :noindex: + diff --git a/doc/fluid/api/tensor/math/mul.rst b/doc/fluid/api/tensor/math/mul.rst new file mode 100644 index 0000000000000000000000000000000000000000..9b14559a4a35c701b8169cf425c1e8787db3561c --- /dev/null +++ b/doc/fluid/api/tensor/math/mul.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_mul: + +mul +--- + +.. autofunction:: paddle.tensor.math.mul + :noindex: + diff --git a/doc/fluid/api/tensor/math/pow.rst b/doc/fluid/api/tensor/math/pow.rst new file mode 100644 index 0000000000000000000000000000000000000000..5d0da558dd738aee845ef01c14d62ba4f023e921 --- /dev/null +++ b/doc/fluid/api/tensor/math/pow.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_pow: + +pow +--- + +.. autofunction:: paddle.tensor.math.pow + :noindex: + diff --git a/doc/fluid/api/tensor/math/sin.rst b/doc/fluid/api/tensor/math/sin.rst new file mode 100644 index 0000000000000000000000000000000000000000..862334131da6f38a853fcff1ed5860db625561ae --- /dev/null +++ b/doc/fluid/api/tensor/math/sin.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_sin: + +sin +--- + +.. autofunction:: paddle.tensor.math.sin + :noindex: + diff --git a/doc/fluid/api/tensor/math/sqrt.rst b/doc/fluid/api/tensor/math/sqrt.rst new file mode 100644 index 0000000000000000000000000000000000000000..c0ad257993458844e18e514731f988b289a8e73f --- /dev/null +++ b/doc/fluid/api/tensor/math/sqrt.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_sqrt: + +sqrt +---- + +.. autofunction:: paddle.tensor.math.sqrt + :noindex: + diff --git a/doc/fluid/api/tensor/math/sum.rst b/doc/fluid/api/tensor/math/sum.rst new file mode 100644 index 0000000000000000000000000000000000000000..8946b3aa5056818046793ac856962c6ee4e0d175 --- /dev/null +++ b/doc/fluid/api/tensor/math/sum.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_sum: + +sum +--- + +.. autofunction:: paddle.tensor.math.sum + :noindex: + diff --git a/doc/fluid/api/tensor/math/tanh.rst b/doc/fluid/api/tensor/math/tanh.rst new file mode 100644 index 0000000000000000000000000000000000000000..ceb5971e0c3e51565f9fedec4df881eab5f86396 --- /dev/null +++ b/doc/fluid/api/tensor/math/tanh.rst @@ -0,0 +1,11 @@ +.. 
THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_math_tanh: + +tanh +---- + +.. autofunction:: paddle.tensor.math.tanh + :noindex: + diff --git a/doc/fluid/api/tensor/random.rst b/doc/fluid/api/tensor/random.rst new file mode 100644 index 0000000000000000000000000000000000000000..c83d597f154c61ba799b00cbb49872febcde787a --- /dev/null +++ b/doc/fluid/api/tensor/random.rst @@ -0,0 +1,9 @@ +====== +random +====== + +.. toctree:: + :maxdepth: 1 + + random/randint.rst + random/randperm.rst diff --git a/doc/fluid/api/tensor/random/randint.rst b/doc/fluid/api/tensor/random/randint.rst new file mode 100644 index 0000000000000000000000000000000000000000..e5a9d6139f425536ef05ab9ef28f3fba625ce7ad --- /dev/null +++ b/doc/fluid/api/tensor/random/randint.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_random_randint: + +randint +------- + +.. autofunction:: paddle.tensor.random.randint + :noindex: + diff --git a/doc/fluid/api/tensor/random/randperm.rst b/doc/fluid/api/tensor/random/randperm.rst new file mode 100644 index 0000000000000000000000000000000000000000..0aa4cc612db88a78abc8b3bb91fc347bed5b6f5a --- /dev/null +++ b/doc/fluid/api/tensor/random/randperm.rst @@ -0,0 +1,11 @@ +.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}` + !DO NOT EDIT THIS FILE MANUALLY! + +.. _api_tensor_random_randperm: + +randperm +-------- + +.. autofunction:: paddle.tensor.random.randperm + :noindex: + diff --git a/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst b/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst index 959c75d56bd36cbead37c58fc0a7cf4f344ac9e7..dae28cd5448d568f30db1fb1d83b8fb2c87c075f 100644 --- a/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst +++ b/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst @@ -3,13 +3,19 @@ GradientClipByGlobalNorm ------------------------------- -.. py:class:: paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group') +.. py:class:: paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group', need_clip=None) -通过多个 Tensor 的范数之和的比率,来剪切(clip)多个 Tensor ( Tensor 不是从该类传入, 通过 ``fluid.program_guard`` 的 ``main_program`` 参数传入,即公式中的 :math:`t\_list` 见代码实例)。 +将一个 Tensor列表 :math:`t\_list` 中所有Tensor的L2范数之和,限定在 ``clip_norm`` 范围内。 -给定一个 Tensor 列表 :math:`t\_list` 和一个剪切比率 ``clip_norm`` ,返回该类的实例作为 ``set_gradient_clip`` 方法的第一个参数, ``set_gradient_clip`` 第二个参数是用来计算被剪切的 Tensor 列表(该值默认为 ``None`` 会基于所有 Tensor 列表来计算全局范数 ``global_norm`` 。 +- 如果范数之和大于 ``clip_norm`` ,则所有 Tensor 会乘以一个系数进行压缩 -剪切过程如下: +- 如果范数之和小于或等于 ``clip_norm`` ,则不会进行任何操作。 + +输入的 Tensor列表 不是从该类里传入, 而是默认会选择 ``Program`` 中全部的梯度,如果 ``need_clip`` 不为None,则可以只选择部分参数进行梯度裁剪。 + +该类需要在 ``optimizer.minimize(grad_clip)`` 进行设置后才能生效,可参看 ``optimizer`` 文档(例如: :ref:`cn_api_fluid_optimizer_SGDOptimizer` )。 + +裁剪公式如下: .. math:: \\t\_list[i]=t\_list[i]∗\frac{clip\_norm}{max(global\_norm,clip\_norm)}\\ @@ -21,67 +27,73 @@ GradientClipByGlobalNorm 参数: - - **clip_norm** (float) - 范数最大值 + - **clip_norm** (float) - 所允许的范数最大值 - **group_name** (str, optional) - 剪切的组名 + - **need_clip** (function, optional) - 类型: 函数。用于指定需要梯度裁剪的参数,该函数接收一个 ``Parameter`` ,返回一个 ``bool`` (True表示需要裁剪,False不需要裁剪)。默认为None,此时会裁剪网络中全部参数。 -**代码示例** +**代码示例1:静态图** .. 
code-block:: python

-
-    import paddle.fluid as fluid
-    import paddle.fluid.core as core
+    import paddle
-
-    place = core.CPUPlace()
-    prog = fluid.framework.Program()
-    startup_program = fluid.framework.Program()
+    import paddle.fluid as fluid
+    import numpy as np
+
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
     with fluid.program_guard(
-            main_program=prog, startup_program=startup_program):
-        image = fluid.layers.data(name='x', shape=[784], dtype='float32')
-        label = fluid.layers.data(name='y', shape=[1], dtype='int64')
-        hidden1 = fluid.layers.fc(input=image, size=128, act='relu')
-        hidden2 = fluid.layers.fc(input=hidden1, size=64, act='relu')
-        predict = fluid.layers.fc(input=hidden2, size=10, act='softmax')
-        cost = fluid.layers.cross_entropy(input=predict, label=label)
-        avg_cost = fluid.layers.mean(cost)
-
-    prog_clip = prog.clone()
-    avg_cost_clip = prog_clip.block(0).var(avg_cost.name)
-
-    p_g = fluid.backward.append_backward(loss=avg_cost)
-    p_g_clip = fluid.backward.append_backward(loss=avg_cost_clip)
-
-    with fluid.program_guard(main_program=prog_clip, startup_program=startup_program):
-        fluid.clip.set_gradient_clip(
-            fluid.clip.GradientClipByGlobalNorm(clip_norm=2.0))
-        p_g_clip = fluid.clip.append_gradient_clip_ops(p_g_clip)
-
-    grad_list = [elem[1] for elem in p_g]
-    grad_clip_list = [elem[1] for elem in p_g_clip]
-
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=8192),
-        batch_size=128)
-
-    exe = fluid.Executor(place)
-    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)
-    exe.run(startup_program)
-
-    count = 0
-    for data in train_reader():
-        count += 1
-        print("count:%s" % count)
-        if count > 5:
-            break
-        out = exe.run(prog, feed=feeder.feed(data), fetch_list=grad_list)
-        out_clip = exe.run(prog_clip,
-                           feed=feeder.feed(data),
-                           fetch_list=grad_clip_list)
-
+        main_program=main_prog, startup_program=startup_prog):
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
+        predict = fluid.layers.fc(input=image, size=3, act='relu') #Trainable parameters: fc_0.w.0, fc_0.b.0
+        loss = fluid.layers.mean(predict)
+
+        # 裁剪网络中全部参数:
+        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
+
+        # 仅裁剪参数fc_0.w_0时:
+        # 为need_clip参数传入一个函数filter_func,filter_func接收参数的类型为Parameter,返回类型为bool
+        # def filter_func(Parameter):
+        #     # 可以较为方便地通过Parameter.name判断(name可以在fluid.ParamAttr中设置,默认为fc_0.w_0、fc_0.b_0)
+        #     return Parameter.name=="fc_0.w_0"
+        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
+
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
+        sgd_optimizer.minimize(loss, grad_clip=clip)
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+
+**代码示例2:动态图**
+
+.. code-block:: python
+
+    import paddle
+    import paddle.fluid as fluid
+
+    with fluid.dygraph.guard():
+        linear = fluid.dygraph.Linear(10, 10) #可训练参数: linear_0.w.0, linear_0.b.0
+        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
+        out = linear(fluid.dygraph.to_variable(inputs))
+        loss = fluid.layers.reduce_mean(out)
+        loss.backward()
+
+        # 裁剪网络中全部参数:
+        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
+
+        # 仅裁剪参数linear_0.w_0时:
+        # 为need_clip参数传入一个函数filter_func,filter_func接收参数的类型为ParamBase,返回类型为bool
+        # def filter_func(ParamBase):
+        #     # 可以通过ParamBase.name判断(name可以在fluid.ParamAttr中设置,默认为linear_0.w_0、linear_0.b_0)
+        #     return ParamBase.name == "linear_0.w_0"
+        #     # 注:linear.weight、linear.bias能分别返回dygraph.Linear层的权重与偏差,也可以此来判断
+        #     return ParamBase.name == linear.weight.name
+        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
+
+        sgd_optimizer = fluid.optimizer.SGD(
+            learning_rate=0.1, parameter_list=linear.parameters())
+        sgd_optimizer.minimize(loss, grad_clip=clip)
\ No newline at end of file
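作为对上文 GradientClipByGlobalNorm 裁剪公式的补充,下面用 NumPy 做一个数值演算。注意:这只是用于说明公式本身的示意代码,并非 Paddle API 的一部分,其中的变量名均为示例假设。

.. code-block:: python

    import numpy as np

    # 两个梯度的L2范数分别为5和12,因此 global_norm = sqrt(5**2 + 12**2) = 13
    t_list = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in t_list))

    # 对应公式 t_list[i] = t_list[i] * clip_norm / max(global_norm, clip_norm)
    clip_norm = 1.0
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [t * scale for t in t_list]

    # 由于 global_norm > clip_norm,裁剪后的全局范数被精确缩放到 clip_norm
    assert np.isclose(np.sqrt(sum(np.sum(t ** 2) for t in clipped)), clip_norm)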
diff --git a/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst b/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst
index d3eeb280eb89bdc30dcb681bee30316c9c6975a0..d4de77526d4f15964313368b117aff1903786f04 100644
--- a/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst
@@ -3,11 +3,19 @@ GradientClipByNorm
 -------------------------------
 
-.. py:class:: paddle.fluid.clip.GradientClipByNorm(clip_norm)
+.. py:class:: paddle.fluid.clip.GradientClipByNorm(clip_norm, need_clip=None)
 
-将输入多维Tensor :math:`X` 转换为L2范数不超过给定的二范数最大值( ``clip_norm`` )的多维Tensor。(多维Tensor不是从该类传入, 而是通过 ``fluid.program_guard`` 的 ``main_program`` 参数传入)。
+将输入的多维Tensor :math:`X` 的L2范数限制在 ``clip_norm`` 范围之内。
 
-该类限制了输入多维Tensor :math:`X` 的L2范数不会超过 ``clip_norm`` 。
+- 如果L2范数大于 ``clip_norm`` ,则该 Tensor 会乘以一个系数进行压缩
+
+- 如果L2范数小于或等于 ``clip_norm`` ,则不会进行任何操作。
+
+输入的 Tensor 不是从该类里传入, 而是默认会选择 ``Program`` 中全部的梯度,如果 ``need_clip`` 不为None,则可以只选择部分参数进行梯度裁剪。
+
+该类需要在 ``optimizer.minimize(grad_clip)`` 进行设置后才能生效,可参看 ``optimizer`` 文档(例如: :ref:`cn_api_fluid_optimizer_SGDOptimizer` )。
+
+裁剪公式如下:
 
 .. math::
 
@@ -26,54 +34,72 @@ GradientClipByNorm
          \\norm(X) = (\sum_{i=1}^{n}|x_i|^2)^{\frac{1}{2}}\\
 
 参数:
-  - **clip_norm** (float) - 二范数最大值
-
+  - **clip_norm** (float) - 所允许的二范数最大值。
+  - **need_clip** (function, optional) - 类型: 函数。用于指定需要梯度裁剪的参数,该函数接收一个 ``Parameter`` ,返回一个 ``bool`` (True表示需要裁剪,False不需要裁剪)。默认为None,此时会裁剪网络中全部参数。
 
-**代码示例**
+**代码示例1:静态图**
+
+.. code-block:: python
+
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    with fluid.program_guard(
+            main_program=main_prog, startup_program=startup_prog):
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
+        predict = fluid.layers.fc(input=image, size=3, act='relu') #可训练参数: fc_0.w.0, fc_0.b.0
+        loss = fluid.layers.mean(predict)
+
+        # 裁剪网络中全部参数:
+        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
+
+        # 仅裁剪参数fc_0.w_0时:
+        # 为need_clip参数传入一个函数filter_func,filter_func接收参数的类型为Parameter,返回类型为bool
+        # def filter_func(Parameter):
+        #     # 可以较为方便地通过Parameter.name判断(name可以在fluid.ParamAttr中设置,默认为fc_0.w_0、fc_0.b_0)
+        #     return Parameter.name=="fc_0.w_0"
+        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
+
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
+        sgd_optimizer.minimize(loss, grad_clip=clip)
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+
+
+**代码示例2:动态图**
 
 .. code-block:: python
 
-    import paddle.fluid as fluid
-    import paddle.fluid.core as core
-    import paddle
-    place = core.CPUPlace()
-    prog = fluid.framework.Program()
-    startup_program = fluid.framework.Program()
-    with fluid.program_guard(
-            main_program=prog, startup_program=startup_program):
-        image = fluid.layers.data(name='x', shape=[784], dtype='float32')
-        label = fluid.layers.data(name='y', shape=[1], dtype='int64')
-        hidden1 = fluid.layers.fc(input=image, size=128, act='relu')
-        hidden2 = fluid.layers.fc(input=hidden1, size=64, act='relu')
-        predict = fluid.layers.fc(input=hidden2, size=10, act='softmax')
-        cost = fluid.layers.cross_entropy(input=predict, label=label)
-        avg_cost = fluid.layers.mean(cost)
-    prog_clip = prog.clone()
-    avg_cost_clip = prog_clip.block(0).var(avg_cost.name)
-    p_g = fluid.backward.append_backward(loss=avg_cost)
-    p_g_clip = fluid.backward.append_backward(loss=avg_cost_clip)
-    with fluid.program_guard(main_program=prog_clip, startup_program=startup_program):
-        fluid.clip.set_gradient_clip(
-            fluid.clip.GradientClipByNorm(clip_norm=2.0))
-        p_g_clip = fluid.clip.append_gradient_clip_ops(p_g_clip)
-    grad_list = [elem[1] for elem in p_g]
-    grad_clip_list = [elem[1] for elem in p_g_clip]
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=8192),
-        batch_size=128)
-
-    exe = fluid.Executor(place)
-    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)
-    exe.run(startup_program)
-
-    count = 0
-    for data in train_reader():
-        count += 1
-        print("count:%s" % count)
-        if count > 5:
-            break
-        out = exe.run(prog, feed=feeder.feed(data), fetch_list=grad_list)
-        out_clip = exe.run(prog_clip,
-                           feed=feeder.feed(data),
-                           fetch_list=grad_clip_list)
+    import paddle
+    import paddle.fluid as fluid
+
+    with fluid.dygraph.guard():
+        linear = fluid.dygraph.Linear(10, 10) #可训练参数: linear_0.w.0, linear_0.b.0
+        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
+        out = linear(fluid.dygraph.to_variable(inputs))
+        loss = fluid.layers.reduce_mean(out)
+        loss.backward()
+
+        # 裁剪网络中全部参数:
+        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
+
+        # 仅裁剪参数linear_0.w_0时:
+        # 为need_clip参数传入一个函数filter_func,filter_func接收参数的类型为ParamBase,返回类型为bool
+        # def filter_func(ParamBase):
+        #     # 可以通过ParamBase.name判断(name可以在fluid.ParamAttr中设置,默认为linear_0.w_0、linear_0.b_0)
+        #     return ParamBase.name == "linear_0.w_0"
+        #     # 注:linear.weight、linear.bias能分别返回dygraph.Linear层的权重与偏差,也可以此来判断
+        #     return ParamBase.name == linear.weight.name
+        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
+
+        sgd_optimizer = fluid.optimizer.SGD(
+            learning_rate=0.1, parameter_list=linear.parameters())
+        sgd_optimizer.minimize(loss, grad_clip=clip)
\ No newline at end of file
diff --git a/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst b/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst
index 236459f71277f9ea4fe80ba2a4bd9e3e98a33e0a..338058cb71628499804d7defdfe10fb1268b6505 100644
--- a/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst
@@ -3,10 +3,14 @@ GradientClipByValue
 -------------------------------
 
-.. py:class:: paddle.fluid.clip.GradientClipByValue(max, min=None)
+.. py:class:: paddle.fluid.clip.GradientClipByValue(max, min=None, need_clip=None)
 
-将梯度值(gradient values)的范围压缩到 [min, max]。
+将输入的多维Tensor :math:`X` 的值限制在 [min, max] 范围。
+
+输入的 Tensor 不是从该类里传入, 而是默认会选择 ``Program`` 中全部的梯度,如果 ``need_clip`` 不为None,则可以只选择部分参数进行梯度裁剪。
+
+该类需要在 ``optimizer.minimize(grad_clip)`` 进行设置后才能生效,可参看 ``optimizer`` 文档(例如: :ref:`cn_api_fluid_optimizer_SGDOptimizer` )。
 
 给定一个 Tensor ``t`` ,该操作将它的值压缩到 ``min`` 和 ``max`` 之间
 
@@ -16,25 +20,75 @@ GradientClipByValue
 
 参数:
  - **max** (float) - 要修剪的最大值。
- - **min** (float,optional) - 要修剪的最小值。如果用户没有设置,将被 ``framework`` 设置为 ``-max`` 。
+ - **min** (float,optional) - 要修剪的最小值。如果用户没有设置,将被自动设置为 ``-max`` (此时 ``max`` 必须大于0)。
+ - **need_clip** (function, optional) - 类型: 函数。用于指定需要梯度裁剪的参数,该函数接收一个 ``Parameter`` ,返回一个 ``bool`` (True表示需要裁剪,False不需要裁剪)。默认为None,此时会裁剪网络中全部参数。
 
-**代码示例**
+**代码示例1:静态图**
 
 .. code-block:: python
+
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    with fluid.program_guard(
+            main_program=main_prog, startup_program=startup_prog):
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
+        predict = fluid.layers.fc(input=image, size=3, act='relu') #可训练参数: fc_0.w.0, fc_0.b.0
+        loss = fluid.layers.mean(predict)
+
+        # 裁剪网络中全部参数:
+        clip = fluid.clip.GradientClipByValue(min=-1, max=1)
 
-    import paddle.fluid as fluid
-    w_param_attrs = fluid.ParamAttr(name=None,
-        initializer=fluid.initializer.UniformInitializer(low=-1.0, high=1.0, seed=0),
-        learning_rate=1.0,
-        regularizer=fluid.regularizer.L1Decay(1.0),
-        trainable=True,
-        gradient_clip=fluid.clip.GradientClipByValue(-1.0, 1.0))
-    x = fluid.layers.data(name='x', shape=[10], dtype='float32')
-    y_predict = fluid.layers.fc(input=x, size=1, param_attr=w_param_attrs)
-
+        # 仅裁剪参数fc_0.w_0时:
+        # 为need_clip参数传入一个函数filter_func,filter_func接收参数的类型为Parameter,返回类型为bool
+        # def filter_func(Parameter):
+        #     # 可以较为方便地通过Parameter.name判断(name可以在fluid.ParamAttr中设置,默认为fc_0.w_0、fc_0.b_0)
+        #     return Parameter.name=="fc_0.w_0"
+        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)
+
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
+        sgd_optimizer.minimize(loss, grad_clip=clip)
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+
+
+**代码示例2:动态图**
code-block:: python + + import paddle + import paddle.fluid as fluid + + with fluid.dygraph.guard(): + linear = fluid.dygraph.Linear(10, 10) #可训练参数: linear_0.w.0, linear_0.b.0 + inputs = fluid.layers.uniform_random([32, 10]).astype('float32') + out = linear(fluid.dygraph.to_variable(inputs)) + loss = fluid.layers.reduce_mean(out) + loss.backward() + # 裁剪网络中全部参数: + clip = fluid.clip.GradientClipByValue(min=-1, max=1) + # 仅裁剪参数linear_0.w_0时: + # 为need_clip参数传入一个函数fileter_func,fileter_func接收参数的类型为ParamBase,返回类型为bool + # def fileter_func(ParamBase): + # # 可以通过ParamBase.name判断(name可以在fluid.ParamAttr中设置,默认为linear_0.w_0、linear_0.b_0) + # return ParamBase.name == "linear_0.w_0" + # # 注:linear.weight、linear.bias能分别返回dygraph.Linear层的权重与偏差,可以此来判断 + # return ParamBase.name == linear.weight.name + # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=fileter_func) + sgd_optimizer = fluid.optimizer.SGD( + learning_rate=0.1, parameter_list=linear.parameters()) + sgd_optimizer.minimize(loss, grad_clip=clip) diff --git a/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst b/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst index f27853f781f873d188be77ac57e90a9f6352e297..013d0cce9d6638abdd2924243d35ab82ad90c89f 100644 --- a/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst +++ b/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst @@ -7,12 +7,17 @@ set_gradient_clip .. py:function:: paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None) +.. warning:: + 此API对位置使用的要求较高,其必须位于组建网络之后, ``minimize`` 之前,因此在未来版本中可能被删除,故不推荐使用。推荐使用 ``minimize(loss, grad_clip=clip)`` 做梯度裁剪。 + 有三种裁剪策略: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` 、 :ref:`cn_api_fluid_clip_GradientClipByNorm` 、 :ref:`cn_api_fluid_clip_GradientClipByValue` 。 + 如果 ``set_gradient_clip(clip)`` 与 ``minimize(loss, grad_clip=clip)`` 被同时使用,``set_gradient_clip`` 将不会生效。 + 给指定参数做梯度裁剪。 参数: - - **clip** (BaseGradientClipAttr) - BaseGradientClipAttr子类的实例,如 :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` 等,用于描述具体的裁剪方法和属性。 + - **clip** (GradientClipBase) - 梯度裁剪的策略,如 :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` 等,用于描述具体的裁剪方法和属性。 - **param_list** (list(Variable),可选) - 需要裁剪的参数列表,可以是参数或参数名称列表。默认值为None,表示裁剪 ``program`` 中的所有参数。 - - **program** (Program,可选) - 参数所在的Program。默认值为None,表示使用 :ref:`cn_api_fluid_default_main_program`。 + - **program** (Program,可选) - 参数所在的Program。默认值为None,表示使用 :ref:`cn_api_fluid_default_main_program` 。 返回: 无。 @@ -59,3 +64,17 @@ set_gradient_clip param_list=[param_var1, param_var2]) sgd = fluid.optimizer.SGD(learning_rate=1e-3) sgd.minimize(loss) + + # network 4: use set_gradient_clip and minimize(grad_clip=clip) together + with fluid.program_guard(fluid.Program(), fluid.Program()): + loss = network() + param_var1 = fluid.default_main_program().global_block().var("fc1_param") + param_var2 = fluid.default_main_program().global_block().var("fc2_param") + clip1 = fluid.clip.GradientClipByValue(min=-1.0, max=1.0), param_list=[param_var1, param_var2]) + clip2 = fluid.clip.GradientClipByNorm(clip_norm=1.0), param_list=[param_var1, param_var2]) + # 设置梯度裁剪策略:clip1 + fluid.clip.set_gradient_clip(clip1) + sgd = fluid.optimizer.SGD(learning_rate=1e-3) + # 设置梯度裁剪策略:clip2 + sgd.minimize(loss, grad_clip=clip2) + # 有设置冲突时,set_gradient_clip将不会生效,将以clip2的策略进行梯度裁剪 diff --git a/doc/fluid/api_cn/dygraph_cn.rst b/doc/fluid/api_cn/dygraph_cn.rst index 19f286b5a83f021a910bcf8bf9bf6ce932b3703a..4725a68523038a0158d7aeea4788398ab18f1d19 100644 --- a/doc/fluid/api_cn/dygraph_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn.rst @@ -19,6 +19,7 @@ 
fluid.dygraph dygraph_cn/Embedding_cn.rst dygraph_cn/ExponentialDecay_cn.rst dygraph_cn/FC_cn.rst + dygraph_cn/grad_cn.rst dygraph_cn/GroupNorm_cn.rst dygraph_cn/GRUUnit_cn.rst dygraph_cn/guard_cn.rst diff --git a/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst b/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst index 96ef964915564c591d08dc217a7fd588f883d4a3..7078bf066b4318877f1656f8565a0d7e405135e7 100644 --- a/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst @@ -47,7 +47,7 @@ LayerNorm x = numpy.random.random((3, 32, 32)).astype('float32') with fluid.dygraph.guard(): x = to_variable(x) - layernorm = fluid.LayerNorm('LayerNorm', begin_norm_axis=1) - ret = layernorm(x) + layerNorm = fluid.LayerNorm([32, 32]) + ret = layerNorm(x) diff --git a/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst b/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst index fcc1bf6fc03b41cb7bbd1095fb754d9b4c115944..b13278ad10331eb088f370ac9f5acf6ae1de77d6 100644 --- a/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst @@ -21,6 +21,100 @@ Layer的全名。组成方式为: ``name_scope`` + “/” + MyLayer.__class__ 返回类型:str +.. py:method:: register_forward_pre_hook(hook) + +为Layer注册一个 ``forward pre-hook`` 函数,该 ``hook`` 函数将会在 ``forward`` 函数调用之前被调用。 + +``hook`` 函数具有以下形式:它的 ``input`` 是 ``Layer`` 的 ``input`` ,并且可以返回一个元组或者单个修改值;如果返回单个修改值,则将值包装到一个元组中。用户可以使用该函数来查看或修改 ``Layer`` ``forward`` 函数的输入。 + +hook(Layer, input) -> None or modified input + +参数: + - **hook** (function) - 被注册为 ``forward pre-hook`` 的函数 + +返回:一个 ``HookRemoveHelper`` 类对象,可通过调用 ``hook_remove_helper.remove()`` 来删除注册的hook函数。 + +返回类型: ``HookRemoveHelper`` 类对象 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # forward_pre_hook函数修改了layer的输入:input = input * 2 + def forward_pre_hook(layer, input): + # 改变输入值 + input_return = (input[0] * 2) + return input_return + + with fluid.dygraph.guard(): + linear = fluid.Linear(13, 5, dtype="float32") + + # 注册hook + forward_pre_hook_handle = linear.register_forward_pre_hook(forward_pre_hook) + + value0 = np.arange(26).reshape(2, 13).astype("float32") + in0 = fluid.dygraph.to_variable(value0) + out0 = linear(in0) + + # 移除hook + forward_pre_hook_handle.remove() + + value1 = value0 * 2 + in1 = fluid.dygraph.to_variable(value1) + out1 = linear(in1) + + # hook改变了layer的输入(input = input * 2),所以out0等于out1 + assert (out0.numpy() == out1.numpy()).any() + +.. py:method:: register_forward_post_hook(hook) + +为Layer注册一个 ``forward post-hook`` 函数,该 ``hook`` 函数将会在 ``forward`` 函数调用之后被调用。 + +``hook`` 函数具有以下形式,它的 ``input`` 和 ``output`` 是 ``Layer`` 的 ``input`` 和 ``output`` 。用户可以用该函数来查看和修改 ``Layer`` ``forward`` 函数的输出。 + +hook(Layer, input, output) -> None or modified output + +参数: + - **hook** (function) - 被注册为 ``forward post-hook`` 的函数 + +返回:一个 ``HookRemoveHelper`` 类对象,可通过调用 ``hook_remove_helper.remove()`` 来删除注册的hook函数。 + +返回类型: ``HookRemoveHelper`` 类对象 + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # forward_post_hook函数改变了layer的输出:output = output * 2 + def forward_post_hook(layer, input, output): + # 改变输出值 + return output * 2 + + with fluid.dygraph.guard(): + linear = fluid.Linear(13, 5, dtype="float32") + + # 注册hook + forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook) + + value1 = np.arange(26).reshape(2, 13).astype("float32") + in1 = fluid.dygraph.to_variable(value1) + + out0 = linear(in1) + + # remove the hook + forward_post_hook_handle.remove() + + out1 = linear(in1) + + # hook改变了layer的输出(output = output * 2),所以out0等于out1 * 2 + assert (out0.numpy() == (out1.numpy()) * 2).any() + .. py:method:: create_parameter(shape, attr=None, dtype="float32", is_bias=False, default_initializer=None) 为Layer创建参数。 diff --git a/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst b/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst index 1b58aefb9ce6788bdeca195cc3479647a56da846..9f94a3f5188939dd1cf281bb83032fffbd610092 100644 --- a/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst @@ -5,7 +5,7 @@ NoamDecay **注意:该API仅支持【动态图】模式** -.. py:class:: paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32') +.. py:class:: paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32', learning_rate=1.0) 该接口提供Noam衰减学习率的功能。 @@ -13,7 +13,7 @@ Noam衰减的计算方式如下。 .. math:: - decayed\_learning\_rate = d_{model}^{-0.5} * min(global\_steps^{-0.5}, global\_steps * warmup\_steps^{-1.5}) + decayed\_learning\_rate = learning\_rate * d_{model}^{-0.5} * min(global\_steps^{-0.5}, global\_steps * warmup\_steps^{-1.5}) 关于Noam衰减的更多细节请参考 `attention is all you need `_ @@ -28,6 +28,7 @@ Noam衰减的计算方式如下。 - **begin** (int,可选) – 起始步。即以上运算式子中global_steps的初始值。默认值为0。 - **step** (int,可选) – 步大小。即以上运算式子中global_steps的递增值。默认值为1。 - **dtype** (str,可选) – 学习率值的数据类型,可以为"float32", "float64"。默认值为"float32"。 + - **learning_rate** (Variable|float|int,可选) - 初始学习率。如果类型为Variable,则为shape为[1]的Tensor,数据类型为float32或float64;也可以是python的int类型。默认值为1.0。 返回: 无 @@ -39,7 +40,9 @@ Noam衰减的计算方式如下。 warmup_steps = 100 learning_rate = 0.01 with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) optimizer = fluid.optimizer.SGD( learning_rate = fluid.dygraph.NoamDecay( 1/(warmup_steps *(learning_rate ** 2)), - warmup_steps) ) + warmup_steps), + parameter_list = emb.parameters()) diff --git a/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst b/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst index 74478aabded2640d36e461cfe7075bc4435dc137..c692e62da369dec38393368cdf5a6449f9cb7c0e 100644 --- a/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst @@ -35,8 +35,10 @@ PiecewiseDecay boundaries = [10000, 20000] values = [1.0, 0.5, 0.1] with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding( [10, 10] ) optimizer = fluid.optimizer.SGD( - learning_rate=fluid.dygraph.PiecewiseDecay(boundaries, values, 0) ) + learning_rate=fluid.dygraph.PiecewiseDecay(boundaries, values, 0), + parameter_list = emb.parameters() ) diff --git a/doc/fluid/api_cn/dygraph_cn/grad_cn.rst b/doc/fluid/api_cn/dygraph_cn/grad_cn.rst new file mode 100644 index 0000000000000000000000000000000000000000..bc73f2e14a68bf2be3a68d43c24e39bdd0ad1337 --- /dev/null +++ b/doc/fluid/api_cn/dygraph_cn/grad_cn.rst @@ -0,0 +1,106 @@ +.. _cn_api_fluid_dygraph_grad: + +grad +------------------------------- + +**注意:该API仅支持【动态图】模式** + +.. 
py:function:: paddle.fluid.dygraph.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None, backward_strategy=None)
+
+对于每个 `inputs` ,计算所有 `outputs` 相对于其的梯度和。
+
+参数:
+    - **outputs** (Variable|list(Variable)|tuple(Variable)) – 用于计算梯度的图的输出变量,或多个输出变量构成的list/tuple。
+    - **inputs** (Variable|list(Variable)|tuple(Variable)) - 用于计算梯度的图的输入变量,或多个输入变量构成的list/tuple。该API的每个返回值对应每个 `inputs` 的梯度。
+    - **grad_outputs** (Variable|list(Variable|None)|tuple(Variable|None), 可选) - `outputs` 变量梯度的初始值。若 `grad_outputs` 为None,则 `outputs` 梯度的初始值均为全1的Tensor。若 `grad_outputs` 不为None,它必须与 `outputs` 的长度相等,此时,若 `grad_outputs` 的第i个元素为None,则第i个 `outputs` 的梯度初始值为全1的Tensor;若 `grad_outputs` 的第i个元素为Variable,则第i个 `outputs` 的梯度初始值为 `grad_outputs` 的第i个元素。默认值为None。
+    - **retain_graph** (bool, 可选) - 是否保留计算梯度的前向图。若值为True,则前向图会保留,用户可对同一张图求两次反向。若值为False,则前向图会释放。默认值为None,表示值与 `create_graph` 相等。
+    - **create_graph** (bool, 可选) - 是否创建计算过程中的反向图。若值为True,则可支持计算高阶导数。若值为False,则计算过程中的反向图会释放。默认值为False。
+    - **only_inputs** (bool, 可选) - 是否只计算 `inputs` 的梯度。若值为False,则图中所有叶节点变量的梯度均会计算,并进行累加。若值为True,则只会计算 `inputs` 的梯度。默认值为True。only_inputs=False功能正在开发中,目前尚不支持。
+    - **allow_unused** (bool, 可选) - 决定当某些 `inputs` 变量不在计算图中时抛出错误还是返回None。若某些 `inputs` 变量不在计算图中(即它们的梯度为None),则当allow_unused=False时会抛出错误,当allow_unused=True时会返回None作为这些变量的梯度。默认值为False。
+    - **no_grad_vars** (Variable|list(Variable)|tuple(Variable)|set(Variable), 可选) - 指明不需要计算梯度的变量。默认值为None。
+    - **backward_strategy** (BackwardStrategy, 可选) - 计算梯度的策略。详见 :ref:`cn_api_fluid_dygraph_BackwardStrategy` 。默认值为None。
+
+返回: 变量构成的tuple,其长度等于 `inputs` 中的变量个数,且第i个返回的变量是所有 `outputs` 相对于第i个 `inputs` 的梯度之和。
+
+返回类型: tuple
+
+**示例代码 1**
+    .. code-block:: python
+
+        import paddle.fluid as fluid
+
+        def test_dygraph_grad(create_graph):
+            with fluid.dygraph.guard():
+                x = fluid.layers.ones(shape=[1], dtype='float32')
+                x.stop_gradient = False
+                y = x * x
+
+                # Since y = x * x, dx = 2 * x
+                dx = fluid.dygraph.grad(
+                        outputs=[y],
+                        inputs=[x],
+                        create_graph=create_graph,
+                        retain_graph=True)[0]
+
+                z = y + dx
+
+                # If create_graph = False, the gradient of dx
+                # would not be backpropagated. Therefore,
+                # z = x * x + dx, and x.gradient() = 2 * x = 2.0
+
+                # If create_graph = True, the gradient of dx
+                # would be backpropagated. Therefore,
+                # z = x * x + dx = x * x + 2 * x, and
+                # x.gradient() = 2 * x + 2 = 4.0
+
+                z.backward()
+                return x.gradient()
+
+        print(test_dygraph_grad(create_graph=False)) # [2.]
+        print(test_dygraph_grad(create_graph=True)) # [4.]
+
+**示例代码 2**
+    .. code-block:: python
+
+        import paddle.fluid as fluid
+
+        fluid.enable_dygraph()
+
+        def test_dygraph_grad(grad_outputs=None):
+            x = fluid.layers.fill_constant(shape=[1], value=2.0, dtype='float32')
+            x.stop_gradient = False
+
+            y1 = x * x
+            y2 = x * 3
+
+            # If grad_outputs=None, dy1 = [1], dy2 = [1].
+            # If grad_outputs=[g1, g2], then:
+            #    - dy1 = [1] if g1 is None else g1
+            #    - dy2 = [1] if g2 is None else g2
+
+            # Since y1 = x * x, dx = 2 * x * dy1.
+            # Since y2 = x * 3, dx = 3 * dy2.
+            # Therefore, the final result would be:
+            # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
+
+            dx = fluid.dygraph.grad(
+                outputs=[y1, y2],
+                inputs=[x],
+                grad_outputs=grad_outputs)[0]
+
+            return dx.numpy()
+
+        THREE = fluid.layers.fill_constant(shape=[1], value=3.0, dtype='float32')
+        FOUR = fluid.layers.fill_constant(shape=[1], value=4.0, dtype='float32')
+
+        # dy1 = [1], dy2 = [1]
+        print(test_dygraph_grad(None)) # [7.]
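+        # 上一行的算术验证:x = 2 且 dy1 = dy2 = [1] 时,
+        # dx = 2 * x * dy1 + 3 * dy2 = 4 * 1 + 3 * 1 = 7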
+ + # dy1 = [1], dy2 = [4] + print(test_dygraph_grad([None, FOUR])) # [16.] + + # dy1 = [4], dy2 = [1] + print(test_dygraph_grad([FOUR, None])) # [19.] + + # dy1 = [3], dy2 = [4] + print(test_dygraph_grad([THREE, FOUR])) # [24.] \ No newline at end of file diff --git a/doc/fluid/api_cn/executor_cn/Executor_cn.rst b/doc/fluid/api_cn/executor_cn/Executor_cn.rst index f879d28d0aac0a6a14c3cda9494771d15198b5b1..1ba0417749cec8d89c8d5f77bf3e2717e9ee9f9a 100644 --- a/doc/fluid/api_cn/executor_cn/Executor_cn.rst +++ b/doc/fluid/api_cn/executor_cn/Executor_cn.rst @@ -96,6 +96,7 @@ Executor支持单GPU、多GPU以及CPU运行。在Executor构造时,需要传 - **scope** (Scope) – 该参数表示执行当前program所使用的作用域,用户可以为不同的program指定不同的作用域。默认值:fluid.global_scope()。 - **return_numpy** (bool) – 该参数表示是否将返回返回的计算结果(fetch list中指定的变量)转化为numpy;如果为False,则每个变量返回的类型为LoDTensor,否则返回变量的类型为numpy.ndarray。默认为:True。 - **use_program_cache** (bool) – 该参数表示是否对输入的Program进行缓存。如果该参数为True,在以下情况时,模型运行速度可能会更快:输入的program为 ``fluid.Program`` ,并且模型运行过程中,调用该接口的参数(program、 feed变量名和fetch_list变量)名始终不变。默认为:False。 + - **use_prune** (bool) – 该参数表示是否对输入的Program进行剪枝。如果该参数为True,输入的Program会在run之前根据 ``feed`` 和 ``fetch_list`` 进行剪枝,剪枝的逻辑是将产生 ``feed`` 的 ``Variable`` 和 ``Operator`` 以及不产生 ``fetch_list`` 的 ``Variable`` 和 ``Operator`` 进行裁剪。默认为:False,表示不进行剪枝。请注意,如果将 ``Optimizer.minimize()`` 方法返回的 ``tuple`` 传入 ``fetch_list`` 中,则 ``use_prune`` 会被重写为True,并且会开启剪枝。 返回:返回fetch_list中指定的变量值 diff --git a/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst b/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst index 98cb0e800f79a1c25f3d92248d3a26de2191de8d..bb8ae4d2a53bb93e96513f2fe618250a4a07fc5c 100644 --- a/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst @@ -7,7 +7,7 @@ BuildStrategy .. py:class:: paddle.fluid.BuildStrategy -``BuildStrategy`` 使用户更方便地控制[ ``ParallelExecutor`` ](../fluid_cn.html\#parallelexecutor)中计算图的建造方法,可通过设置 ``ParallelExecutor`` 中的 ``BuildStrategy`` 成员来实现此功能。 +``BuildStrategy`` 使用户更方便地控制 :ref:`cn_api_fluid_ParallelExecutor` 中计算图的建造方法,可通过设置 ``ParallelExecutor`` 中的 ``BuildStrategy`` 成员来实现此功能。 **代码示例** @@ -68,6 +68,7 @@ bool类型。表明是否融合(fuse) broadcast ops。该选项指在Reduce模 **代码示例** .. code-block:: python + import paddle.fluid as fluid build_strategy = fluid.BuildStrategy() build_strategy.fuse_broadcast_ops = True @@ -108,6 +109,7 @@ bool类型。表明是否融合(fuse) relu和depthwise_conv2d,节省GPU内存 import os import numpy as np + import paddle.fluid as fluid import paddle.fluid.compiler as compiler use_cuda = True diff --git a/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst b/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst index f9ec0995503393891741794489233c88df3f4d24..81e2aa6965a17bcdf678edb7e996a28d0ce67668 100644 --- a/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst @@ -22,34 +22,29 @@ CompiledProgram根据 `build_strategy` 的配置将输入的Program或Graph进 .. 
code-block:: python import paddle.fluid as fluid - import paddle.fluid.compiler as compiler import numpy - import os - + place = fluid.CUDAPlace(0) # fluid.CPUPlace() exe = fluid.Executor(place) - - data = fluid.layers.data(name='X', shape=[1], dtype='float32') + + data = fluid.data(name='X', shape=[None, 1], dtype='float32') hidden = fluid.layers.fc(input=data, size=10) loss = fluid.layers.mean(hidden) fluid.optimizer.SGD(learning_rate=0.01).minimize(loss) exe.run(fluid.default_startup_program()) - build_strategy = fluid.BuildStrategy() - build_strategy.fuse_all_optimizer_ops = True - compiled_prog = compiler.CompiledProgram( - fluid.default_main_program(), - build_strategy=build_strategy) - + compiled_prog = fluid.CompiledProgram( + fluid.default_main_program()) + x = numpy.random.random(size=(10, 1)).astype('float32') loss_data, = exe.run(compiled_prog, - feed={"X": x}, - fetch_list=[loss.name]) + feed={"X": x}, + fetch_list=[loss.name]) .. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None) -该接口用于将输入的Program或Graph进行转换,以便通过数据并行模式运行该模型。用户可以通过 `build_strategy` 和 `exec_strategy` 设置计算图构建和计算图执行过程中可以进行的一些优化,例如:将梯度聚合的AllReduce操作进行融合、指定计算图运行过程中使用的线程池大小等。**注意:如果在构建CompiledProgram和调用with_data_parallel时都指定了build_strategy,在CompiledProgram中的build_strategy会被复写,因此,如果是数据并行训练,建议在调用with_data_parallel接口是设置build_strategy**。 +该接口用于将输入的Program或Graph进行转换,以便通过数据并行模式运行该模型。用户可以通过 `build_strategy` 和 `exec_strategy` 设置计算图构建和计算图执行过程中可以进行的一些优化,例如:将梯度聚合的AllReduce操作进行融合、指定计算图运行过程中使用的线程池大小等。**注意:如果在构建CompiledProgram和调用with_data_parallel时都指定了build_strategy,在CompiledProgram中的build_strategy会被复写,因此,如果是数据并行训练,建议在调用with_data_parallel接口时设置build_strategy**。 参数: - **loss_name** (str) - 该参数为模型最后得到的损失变量的名字,**注意:如果是模型训练,必须设置loss_name,否则计算结果可能会有问题。** 默认为:None。 @@ -70,45 +65,47 @@ CompiledProgram根据 `build_strategy` 的配置将输入的Program或Graph进 **代码示例** .. 
code-block:: python - + import paddle.fluid as fluid - import paddle.fluid.compiler as compiler import numpy import os - + use_cuda = True place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace() + parallel_places = [fluid.CUDAPlace(0), fluid.CUDAPlace(1)] if use_cuda else [fluid.CPUPlace()] * 2 + # 注意:如果你使用CPU运行程序,需要具体设置CPU_NUM, # 否则fluid会把逻辑核的所有数目设为CPU_NUM, # 在这种情况下,输入的batch size应大于CPU_NUM, # 否则程序会异常中断。 if not use_cuda: os.environ['CPU_NUM'] = str(2) - + exe = fluid.Executor(place) - - data = fluid.layers.data(name='X', shape=[1], dtype='float32') + + data = fluid.data(name='X', shape=[None, 1], dtype='float32') hidden = fluid.layers.fc(input=data, size=10) loss = fluid.layers.mean(hidden) + test_program = fluid.default_main_program().clone(for_test=True) fluid.optimizer.SGD(learning_rate=0.01).minimize(loss) - + exe.run(fluid.default_startup_program()) - build_strategy = fluid.BuildStrategy() - build_strategy.fuse_all_reduce_ops = True - compiled_train_prog = compiler.CompiledProgram( - fluid.default_main_program()).with_data_parallel( - loss_name=loss.name, build_strategy=build_strategy) - # 注意:如果此处不设置share_vars_from=compiled_train_prog,测试过程中用的参数与训练使用的参数是不一致 - compiled_test_prog = compiler.CompiledProgram( - test_program).with_data_parallel( - share_vars_from=compiled_train_prog) + compiled_train_prog = fluid.CompiledProgram( + fluid.default_main_program()).with_data_parallel( + loss_name=loss.name, places=parallel_places) + # 注意:如果此处不设置share_vars_from=compiled_train_prog, + # 测试过程中用的参数与训练使用的参数是不一致 + compiled_test_prog = fluid.CompiledProgram( + test_program).with_data_parallel( + share_vars_from=compiled_train_prog, + places=parallel_places) train_data = numpy.random.random(size=(10, 1)).astype('float32') loss_data, = exe.run(compiled_train_prog, - feed={"X": train_data}, - fetch_list=[loss.name]) + feed={"X": train_data}, + fetch_list=[loss.name]) test_data = numpy.random.random(size=(10, 1)).astype('float32') loss_data, = exe.run(compiled_test_prog, - feed={"X": test_data}, - fetch_list=[loss.name]) \ No newline at end of file + feed={"X": test_data}, + fetch_list=[loss.name]) \ No newline at end of file diff --git a/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst b/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst index 4d6cc28fa05d6a14f74650d6078a98ba06fb9d5c..4e26c1e237aaf90c74cb8c6fed6d7a5bf4ff3ce0 100644 --- a/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst @@ -33,7 +33,7 @@ ExecutionStrategy train_exe = fluid.ParallelExecutor(use_cuda=False, loss_name=avg_loss.name, - exec_strategy=exec_strategy) + exec_strategy=exec_strategy) .. py:attribute:: num_iteration_per_drop_scope diff --git a/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst b/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst index 43975efd6a3b0723ad7ce51d3399b3de99d8dc5e..3c8537e59e771306a820e3ef647503b2cd5c338e 100644 --- a/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst @@ -5,7 +5,11 @@ ParamAttr ------------------------------- -.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False) +.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False) + +.. 
note:: + 该类中的 ``gradient_clip`` 属性在2.0版本会废弃,推荐使用 ``minimize(loss, grad_clip=clip)`` 做梯度裁剪。共有三种裁剪策略: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` 、 + :ref:`cn_api_fluid_clip_GradientClipByNorm` 、 :ref:`cn_api_fluid_clip_GradientClipByValue` 。 创建一个参数属性对象,用户可设置参数的名称、初始化方式、学习率、正则化规则、是否需要训练、梯度裁剪方式、是否做模型平均等属性。 @@ -13,9 +17,10 @@ ParamAttr - **name** (str,可选) - 参数的名称。默认值为None,表示框架自动创建参数的名称。 - **initializer** (Initializer,可选) - 参数的初始化方式。默认值为None,表示权重参数采用Xavier初始化方式,偏置参数采用全0初始化方式。 - **learning_rate** (float) - 参数的学习率。实际参数的学习率等于全局学习率乘以参数的学习率,再乘以learning rate schedule的系数。 - - **regularizer** (WeightDecayRegularizer,可选) - 正则化因子。默认值为None,表示没有正则化因子。 + - **regularizer** (WeightDecayRegularizer,可选) - 正则化方法。支持两种正则化策略: :ref:`cn_api_fluid_regularizer_L1Decay` 、 + :ref:`cn_api_fluid_regularizer_L2Decay` ,如果在 ``optimizer`` (例如 :ref:`cn_api_fluid_optimizer_SGDOptimizer` ) 中也 + 设置了正则化,``optimizer`` 中的正则化将被忽略。默认值为None,表示没有正则化。 - **trainable** (bool) - 参数是否需要训练。默认值为True,表示需要训练。 - - **gradient_clip** (BaseGradientClipAttr,可选) - 梯度裁剪方式。默认值为None,表示不需要梯度裁剪。 - **do_model_average** (bool) - 是否做模型平均。默认值为False,表示不做模型平均。 返回: 表示参数属性的对象。 diff --git a/doc/fluid/api_cn/fluid_cn/Program_cn.rst b/doc/fluid/api_cn/fluid_cn/Program_cn.rst index 3fc5c40939c615a2b15db8144ed28ff6c05e0e52..211dc91dcd72a069834d455b9972138ab48bb8f0 100644 --- a/doc/fluid/api_cn/fluid_cn/Program_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/Program_cn.rst @@ -57,13 +57,12 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 import paddle.fluid as fluid prog = fluid.default_main_program() - a = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False) - c = fluid.layers.fc(a, size=3) + x = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False) + pred = fluid.layers.fc(x, size=3) prog_string = prog.to_string(throw_on_error=True, with_details=False) prog_string_with_details = prog.to_string(throw_on_error=False, with_details=True) - print(prog_string) - print("\n =============== with_details =============== \n") - print(prog_string_with_details) + print("program string without detail: {}".format(prog_string)) + print("program string with detail: {}".format(prog_string_with_details)) .. py:method:: clone(for_test=False) @@ -82,16 +81,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 **代码示例** - .. 
code-block:: python + :: - import paddle.fluid as fluid - ## 我们推荐在使用 Optimizer前使用clone()接口 - test_program = fluid.default_main_program().clone(for_test=True) - optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9) - optimizer.minimize() + import paddle.fluid as fluid + img = fluid.layers.data(name='image', shape=[784]) + pred = fluid.layers.fc(input=img, size=10, act='relu') + loss = fluid.layers.mean(pred) + ## 我们推荐在使用 Optimizer前使用clone()接口 + test_program = fluid.default_main_program().clone(for_test=True) + optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9) + optimizer.minimize(loss) 参数: - - **for_test** (bool) – 取值为True时,clone方法内部会把operator的属性 ``is_test`` 设置为 True, 并裁剪反向OP和参数优化OP + - **for_test** (bool) – 取值为True时,clone方法内部会把operator的属性 ``is_test`` 设置为 True, 并裁剪反向OP和参数优化OP,默认值为False 返回:当 ``for_test=True`` 时返回一个新的、仅包含当前Program前向内容的Program。否则返回一个新的,和当前Program完全相同的Program @@ -150,7 +152,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 input=fluid.layers.fc(hidden, size=10, act='softmax'), label=fluid.layers.data(name='label', shape=[1], dtype='int64')) avg_loss = fluid.layers.mean(loss) - test_program = train_program.clone(for_test=False) + test_program = train_program.clone(for_test=True) print_prog(test_program) # 由于需要使训练和测试参数共享,我们需要使用训练的 ``startup_program`` @@ -182,7 +184,8 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 for key, value in sorted(six.iteritems(op.all_attrs())): if key not in ['op_callstack', 'op_role_var']: print(" [ attrs: {}: {} ]".format(key, value)) - def network(is_test): + + def network(): img = fluid.layers.data(name='image', shape=[784]) hidden = fluid.layers.fc(input=img, size=200, act='relu') hidden = fluid.layers.dropout(hidden, dropout_prob=0.5) @@ -192,19 +195,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 avg_loss = fluid.layers.mean(loss) return avg_loss - train_program_2 = fluid.Program() startup_program_2 = fluid.Program() test_program_2 = fluid.Program() with fluid.program_guard(train_program_2, startup_program_2): with fluid.unique_name.guard(): - sgd = fluid.optimizer.SGD(learning_rate=1e-3) - sgd.minimize(avg_loss) + avg_loss = network() + sgd = fluid.optimizer.SGD(learning_rate=1e-3) + sgd.minimize(avg_loss) # 不使用测试阶段的启动程序 - with fluid.program_guard(test_program_2, fluid.Program()): + with fluid.program_guard(test_program_2, startup_program_2): with fluid.unique_name.guard(): - loss = network(is_test=True) - print(test_program_2) + avg_loss = network() + print_prog(test_program_2) 上边两个代码片段生成和打印的Program是一样的。 @@ -268,24 +271,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 .. py:attribute:: random_seed -**注意:必须在相关OP被添加之前设置。例如** - -**代码示例** - -.. 
code-block:: python - - import paddle.fluid as fluid - - prog = fluid.default_main_program() - random_seed = prog.random_seed - x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False) - - # 这里我们必须要在fluid.layers.dropout之前设置random_seed - print(random_seed) - prog.random_seed = 1 - z_var = fluid.layers.dropout(x_var, 0.7) - - print(prog.random_seed) +**注意:必须在相关OP被添加之前设置。** 程序中随机运算符的默认随机种子。0意味着随机生成随机种子。 @@ -301,12 +287,16 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构 prog = fluid.default_main_program() random_seed = prog.random_seed + x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False) print(random_seed) - prog.random_seed = 1 - print(prog.random_seed) - ## 0 ## 默认的random seed是 0 + + # 这里我们必须要在fluid.layers.dropout之前设置random_seed + prog.random_seed = 1 + z_var = fluid.layers.dropout(x_var, 0.7) + + print(prog.random_seed) ## 1 ## 修改后random seed变成了 1 diff --git a/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst b/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst index 88946bb7ae93ddbd01da63fc16769f53d16a8023..6249b87849a837b03d0e21d1f5b0a5299f745873 100644 --- a/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst @@ -5,8 +5,11 @@ WeightNormParamAttr **注意:该API仅支持【静态图】模式** -.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False) +.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False) +.. note:: + 该类中的 ``gradient_clip`` 属性在2.0版本会废弃,推荐使用 ``minimize(loss, grad_clip=clip)`` 做梯度裁剪。共有三种裁剪策略: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` 、 + :ref:`cn_api_fluid_clip_GradientClipByNorm` 、 :ref:`cn_api_fluid_clip_GradientClipByValue` 。 该类定义了权重归一化(Weight Normalization)的参数。权重归一化可以将神经网络中权重向量的长度与其方向解耦,详细的定义与实现可以参考论文:`Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks `_ @@ -15,9 +18,10 @@ WeightNormParamAttr - **name** (None|str) - 该参数供开发人员打印调试信息时使用,具体用法请参见 :ref:`api_guide_Name` ,默认为None。 - **initializer** (Initializer) - 初始化参数方法,例如 ``initializer = fluid.initializer.ConstantInitializer(1.0)`` 。默认为None,如果为None则使用默认初始化函数 `Xavier()` 。 - **learning_rate** (float32) - 学习率,优化过程 :math:`global\_lr∗parameter\_lr∗scheduler\_factor` 的学习速率,默认为1.0。 - - **regularizer** (WeightDecayRegularizer) - 正则化方法,例如 ``regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1)`` 。默认为None,如果为None则对权重不做正则化。 + - **regularizer** (WeightDecayRegularizer,可选) - 正则化方法。支持两种正则化策略: :ref:`cn_api_fluid_regularizer_L1Decay` 、 + :ref:`cn_api_fluid_regularizer_L2Decay` ,如果在 ``optimizer`` (例如 :ref:`cn_api_fluid_optimizer_SGDOptimizer` ) 中也 + 设置了正则化,``optimizer`` 中的正则化将被忽略。默认值为None,表示没有正则化。 - **trainable** (bool) - 可选,指明参数是否可训练,默认为True。 - - **gradient_clip** - 梯度裁剪(Gradient Clipping)的方法,例如 ``gradient_clip = fluid.clip.GradientClipByNorm(clip_norm=2.0))`` 。默认为None,如果为None则对权重不做裁剪。 - **do_model_average** (bool) - 可选,指明参数是否需要模型平均化操作(Model Average),默认为False。 @@ -36,7 +40,6 @@ WeightNormParamAttr learning_rate=1.0, regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1), trainable=True, - gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0), do_model_average=False)) diff --git a/doc/fluid/api_cn/fluid_cn/gradients_cn.rst b/doc/fluid/api_cn/fluid_cn/gradients_cn.rst index 
7e2e7d4fd635de3332aea4f293951567dd66c79c..b38f2747aafb9d258a49eaca315afa03cc747d56 100644 --- a/doc/fluid/api_cn/fluid_cn/gradients_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/gradients_cn.rst @@ -26,7 +26,7 @@ gradients import paddle.fluid as fluid - x = fluid.layers.data(name='x', shape=[2,8,8], dtype='float32') + x = fluid.data(name='x', shape=[None,2,8,8], dtype='float32') x.stop_gradient=False y = fluid.layers.conv2d(x, 4, 1, bias_attr=False) y = fluid.layers.relu(y) diff --git a/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst b/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst index 83c53aeab3e1791ca5f99fa1db623a24e465acdd..a7f6f9dba999214aa956bf2c3e6847837179e6e2 100644 --- a/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst +++ b/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst @@ -23,7 +23,7 @@ program_guard main_program = fluid.Program() startup_program = fluid.Program() with fluid.program_guard(main_program, startup_program): - data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32') + data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32') hidden = fluid.layers.fc(input=data, size=10, act='relu') 例如,当组的网不需要startup_program初始化各变量时,可以传入一个临时的program。 @@ -36,5 +36,5 @@ program_guard main_program = fluid.Program() # 如果您不需要关心startup program,传入一个临时值即可 with fluid.program_guard(main_program, fluid.Program()): - data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32') + data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32') diff --git a/doc/fluid/api_cn/framework_cn.rst b/doc/fluid/api_cn/framework_cn.rst new file mode 100644 index 0000000000000000000000000000000000000000..9fe6f3fc8da3edfb79a7a4359477d3c8db3f1aa1 --- /dev/null +++ b/doc/fluid/api_cn/framework_cn.rst @@ -0,0 +1,13 @@ +======================= +paddle.framework +======================= + + + + +.. toctree:: + :maxdepth: 1 + + framework_cn/get_default_dtype.rst + framework_cn/manual_seed.rst + framework_cn/set_default_dtype.rst diff --git a/doc/fluid/api_cn/framework_cn/get_default_dtype.rst b/doc/fluid/api_cn/framework_cn/get_default_dtype.rst new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/doc/fluid/api_cn/framework_cn/manual_seed.rst b/doc/fluid/api_cn/framework_cn/manual_seed.rst new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/doc/fluid/api_cn/framework_cn/set_default_dtype.rst b/doc/fluid/api_cn/framework_cn/set_default_dtype.rst new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/doc/fluid/api_cn/index_cn.rst b/doc/fluid/api_cn/index_cn.rst index 4e7245c7e01b34b2c53fb3d3e692e07952febc9c..8599fb58c60845168918446808735b2cec68951d 100644 --- a/doc/fluid/api_cn/index_cn.rst +++ b/doc/fluid/api_cn/index_cn.rst @@ -7,6 +7,7 @@ API Reference ../api_guides/index_cn.rst fluid_cn.rst + api_tree_cn.rst backward_cn.rst clip_cn.rst dataset_cn.rst diff --git a/doc/fluid/api_cn/io_cn/DataLoader_cn.rst b/doc/fluid/api_cn/io_cn/DataLoader_cn.rst index 53ff250e10939aaa1bde0066ab0d0b8b8a507c57..7c5a9c6181d1cc6bb47edf97981cc9b2ddce62c5 100755 --- a/doc/fluid/api_cn/io_cn/DataLoader_cn.rst +++ b/doc/fluid/api_cn/io_cn/DataLoader_cn.rst @@ -138,7 +138,10 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本 # ------------------------------------------------------- -.. 
py:method:: from_generator(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False, use_multiprocess=False) +.. py:method:: from_generator(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False, use_multiprocess=False, drop_last=True) + +.. note:: + 框架保证DataLoader的数据加载顺序与用户提供的数据源读取顺序一致。 创建一个DataLoader对象用于加载Python生成器产生的数据。数据会由Python线程预先读取,并异步送入一个队列中。 @@ -158,12 +161,13 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本 - **iterable** (bool) - 所创建的DataLoader对象是否可迭代。 - **return_list** (bool) - 每个设备上的数据是否以list形式返回。仅在iterable = True模式下有效。若return_list = False,每个设备上的返回数据均是str -> LoDTensor的映射表,其中映射表的key是每个输入变量的名称。若return_list = True,则每个设备上的返回数据均是list(LoDTensor)。推荐在静态图模式下使用return_list = False,在动态图模式下使用return_list = True。 - **use_multiprocess** (bool) - 设置是否是用多进程加速动态图的数据载入过程。注意:该参数的设置仅在动态图模式下有效, 在静态图模式下,该参数设置与否均无任何影响。默认值为False。 + - **drop_last** (bool): 是否丢弃最后的不足CPU/GPU设备数的批次。默认值为True。在网络训练时,用户不能设置drop_last=False,此时所有CPU/GPU设备均应从DataLoader中读取到数据。在网络预测时,用户可以设置drop_last=False,此时最后不足CPU/GPU设备数的批次可以进行预测。 返回: 被创建的DataLoader对象 返回类型: loader (DataLoader) -**代码示例** +**代码示例 1** .. code-block:: python @@ -297,6 +301,50 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本 assert relu.shape == [BATCH_SIZE, 784] +**代码示例 2** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + import os + + # We use 2 CPU cores to run inference network + os.environ['CPU_NUM'] = '2' + + # The data source has only 3 batches, which can not be + # divided evenly to each CPU core + def batch_generator(): + for i in range(3): + yield np.array([i+1]).astype('float32'), + + x = fluid.data(name='x', shape=[None], dtype='float32') + y = x * x + + def run_inference(drop_last): + loader = fluid.io.DataLoader.from_generator(feed_list=[x], + capacity=8, drop_last=drop_last) + loader.set_batch_generator(batch_generator, fluid.cpu_places()) + + exe = fluid.Executor(fluid.CPUPlace()) + prog = fluid.CompiledProgram(fluid.default_main_program()) + prog = prog.with_data_parallel() + + result = [] + for data in loader(): + each_ret, = exe.run(prog, feed=data, fetch_list=[y]) + result.extend(each_ret) + return result + + # Set drop_last to True, so that the last batch whose + # number is less than CPU core number would be discarded. + print(run_inference(drop_last=True)) # [1.0, 4.0] + + # Set drop_last to False, so that the last batch whose + # number is less than CPU core number can be tested. + print(run_inference(drop_last=False)) # [1.0, 4.0, 9.0] + + .. py:method:: from_dataset(dataset, places, drop_last=True) 创建一个DataLoader对象用于加载Dataset产生的数据。目前,Dataset仅支持Linux系统下使用。 diff --git a/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst b/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst index 09d21492757c836fe1e4799e586e6b6d7001a204..e91f9b78febbb47dc334e01b4516e0ce9dcc4b9e 100644 --- a/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst +++ b/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst @@ -38,7 +38,7 @@ LSTMCell .. code-block:: python import paddle.fluid.layers as layers - cell = layers.rnn.LSTMCell(hidden_size=256) + cell = layers.LSTMCell(hidden_size=256) .. 
py:method:: call(inputs, states) @@ -61,4 +61,4 @@ LSTMCell的 :code:`state_shape` 是一个具有两个形状的列表::math:`[[ 返回:LSTMCell的 :code:`state_shape` -返回类型:list \ No newline at end of file +返回类型:list diff --git a/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst b/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst index 855acbfb1695606c7cd75156d9806316b69e918f..dc5bfc5610ca293bcbb134769786108c8da126cb 100644 --- a/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst +++ b/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst @@ -34,14 +34,13 @@ add_position_encoding .. code-block:: python - import paddle.fluid as fluid - - tensor = fluid.layers.data( + import paddle.fluid as fluid + + tensor = fluid.data( name='tensor', - shape=[32, 64, 512], - dtype='float32', - append_batch_size=False) - position_tensor = fluid.layers.add_position_encoding( + shape=[None, 64, 512], + dtype='float32') + position_tensor = fluid.layers.add_position_encoding( input=tensor, alpha=1.0, beta=1.0) @@ -53,4 +52,3 @@ add_position_encoding - diff --git a/doc/fluid/api_cn/layers_cn/argsort_cn.rst b/doc/fluid/api_cn/layers_cn/argsort_cn.rst index cc35109750567a42d5069f3d6d120ce8410b3066..b7a2c937c4285395146d0c0b42a4bbb6f1a77f24 100644 --- a/doc/fluid/api_cn/layers_cn/argsort_cn.rst +++ b/doc/fluid/api_cn/layers_cn/argsort_cn.rst @@ -9,7 +9,7 @@ argsort 参数: - - **input** (Variable) - 输入的多维 ``Tensor`` ,支持的数据类型:float32、float64。 + - **input** (Variable) - 输入的多维 ``Tensor`` ,支持的数据类型:float32、float64、int16、int32、int64、uint8。 - **axis** (int,可选) - 指定对输入Tensor进行运算的轴, ``axis`` 的有效范围是[-R, R),R是输入 ``x`` 的Rank, ``axis`` 为负时与 ``axis`` +R 等价。默认值为0。 - **descending** (bool,可选) - 指定算法排序的方向。如果设置为True,算法按照降序排序。如果设置为False或者不设置,按照升序排序。默认值为False。 - **name** (str,可选) – 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为None。 diff --git a/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst b/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst index 89d53c61fd91cc22e63cf5267fcb088bafd57414..46d2c5b9bd5b69b189eaab47fd071297ee486c04 100644 --- a/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst +++ b/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst @@ -26,7 +26,7 @@ create_global_var import paddle.fluid as fluid import paddle.fluid.layers as layers var = layers.create_global_var(shape=[2,3], value=1.0, dtype='float32', - persistable=True, force_cpu=True, name='new_var') + persistable=True, force_cpu=True, name='new_var') diff --git a/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst b/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst index 7df10ca633746d75d105b2c9692b083d200018be..1d9eed48056e697ef762c43069f8eae9f12118b5 100644 --- a/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst +++ b/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst @@ -5,17 +5,18 @@ ctc_greedy_decoder .. py:function:: paddle.fluid.layers.ctc_greedy_decoder(input, blank, name=None) -**注意:该OP的输入input必须是2维LoDTensor, lod_level为1** 该OP用于贪婪策略解码序列,步骤如下: 1. 获取输入中的每一行的最大值索引,也就是numpy.argmax(input, axis=0)。 2. 
对于step1结果中的每个序列,合并两个空格之间的重复部分并删除所有空格。 +该API支持两种输入,LoDTensor和Tensor输入,不同输入的代码样例如下: **样例**: :: + # for lod tensor input 已知: input.data = [[0.6, 0.1, 0.3, 0.1], @@ -45,13 +46,38 @@ ctc_greedy_decoder output.lod = [[2, 1]] + # for tensor input + input.data = [[[0.6, 0.1, 0.3, 0.1], + [0.3, 0.2, 0.4, 0.1], + [0.1, 0.5, 0.1, 0.3], + [0.5, 0.1, 0.3, 0.1]], + + [[0.5, 0.1, 0.3, 0.1], + [0.2, 0.2, 0.2, 0.4], + [0.2, 0.2, 0.1, 0.5], + [0.5, 0.1, 0.3, 0.1]]] + + input_length.data = [[4], [4]] + input.shape = [2, 4, 4] + + step1: Apply argmax to first input sequence which is input.data[0:4]. Then we get: + [[0], [2], [1], [0]], for input.data[4:8] is [[0], [3], [3], [0]], shape is [2,4,1] + step2: Change the argmax result to use padding mode, then argmax result is + [[0, 2, 1, 0], [0, 3, 3, 0]], shape is [2, 4], lod is [], input_length is [[4], [4]] + step3: Apply ctc_align to padding argmax result, padding_value is 0 + + Finally: + output.data = [[2, 1, 0, 0], + [3, 0, 0, 0]] + output_length.data = [[2], [1]] + 参数: - - **input** (Variable) — 变长序列的概率,2维LoDTensor, lod_level为1。它的形状是[Lp, num_classes + 1],其中Lp是所有输入序列长度的和,num_classes是类别数目(不包括空白标签)。数据类型是float32或者float64 + - **input** (Variable) — 变长序列的概率, 在输入为LoDTensor情况下,它是具有LoD信息的二维LoDTensor。 形状为[Lp,num_classes +1],其中Lp是所有输入序列的长度之和,num_classes是真实的类数。 在输入为Tensor情况下,它是带有填充的3-D张量,其形状为[batch_size,N,num_classes +1]。 (不包括空白标签)。 数据类型可以是float32或float64。 - **blank** (int) — Connectionist Temporal Classification (CTC) loss空白标签索引, 其数值属于半开区间[0,num_classes + 1) - **name** (str) — (str|None,可选) – 该参数供开发人员打印调试信息时使用,具体用法请参见 :ref:`api_guide_Name` ,默认值为None -返回: CTC贪婪解码结果是一个形为(Lp,1)的2维LoDTensor,lod_level为1,其中Lp是所有输出序列的长度之和。如果结果中的所有序列都为空,则输出LoDTensor为[-1],其lod信息为空。 +返回:对于输入为LoDTensor的情况,返回CTC贪婪解码器的结果,即2-D LoDTensor,形状为[Lp,1],数据类型为int64。 “ Lp”是所有输出序列长度的总和。 如果结果中的所有序列均为空,则结果LoDTensor将为[-1],其中LoD为[[]]。对于输入为Tensor的情况,返回一个元组,(output, output_length), 其中,output是一个形状为 [batch_size, N],类型为int64的Tensor。output_length是一个形状为[batch_size, 1],类型为int64的Tensor,表示Tensor输入下,每个输出序列的长度。 返回类型: Variable @@ -60,9 +86,15 @@ ctc_greedy_decoder .. code-block:: python + # for lod mode import paddle.fluid as fluid - x = fluid.layers.data(name='x', shape=[8], dtype='float32') + x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1) cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0) + # for padding mode + x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32') + x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64') + out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0, + input_length=x_pad_len) diff --git a/doc/fluid/api_cn/layers_cn/inplace_abn_cn.rst b/doc/fluid/api_cn/layers_cn/inplace_abn_cn.rst new file mode 100755 index 0000000000000000000000000000000000000000..11077c5b78fe34fb0387a9d2dcb7bfd3f73c20b0 --- /dev/null +++ b/doc/fluid/api_cn/layers_cn/inplace_abn_cn.rst @@ -0,0 +1,43 @@ +.. _cn_api_fluid_layers_inplace_abn: + +inplace_abn +------------------------------- + +**注意:该API仅支持【静态图】模式** + +.. 
py:function:: paddle.fluid.layers.inplace_abn(input, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, data_layout='NCHW', name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False, use_global_stats=False, act_alpha=1.0)
+
+就地批正则化激活层(Inplace Activation Batch Normalization Layer)
+
+此层通过就地内存计算批正则化与激活来节省内存。有关批正则化的计算,请参见 ``fluid.layers.batch_norm`` ;有关就地激活批正则化的计算,请参考 `In-Place Activated BatchNorm for Memory-Optimized Training of DNNs `_。
+
+参数:
+    - **input** (Variable) - inplace_abn算子的输入特征,是一个Variable类型,输入维度可以是 2, 3, 4, 5。数据类型:float16, float32, float64。
+    - **act** (string)- 激活函数类型,可以是leaky_relu、relu、prelu等。默认:None。
+    - **is_test** (bool) - 指示它是否在测试阶段,非训练阶段使用训练过程中统计到的全局均值和全局方差。默认:False。
+    - **momentum** (float|Variable)- 此值用于计算 moving_mean 和 moving_var,是一个float类型或者一个shape为[1],数据类型为float32的Variable类型。更新公式为: :math:`moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)` , :math:`moving\_var = moving\_var * momentum + new\_var * (1. - momentum)` , 默认:0.9。
+    - **epsilon** (float)- 加在分母上为了数值稳定的值。默认:1e-5。
+    - **param_attr** (ParamAttr|None) :指定权重参数属性的对象。默认值为None,表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。inplace_abn算子默认的权重初始化是1.0。
+    - **bias_attr** (ParamAttr|None)- 指定偏置参数属性的对象。默认值为None,表示使用默认的偏置参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。inplace_abn算子默认的偏置初始化是0.0。
+    - **data_layout** (string) - 指定输入的数据格式,输出的数据格式将与输入保持一致,可以是"NCHW"和"NHWC"。N是批尺寸,C是通道数,H是特征高度,W是特征宽度。默认值:"NCHW"。
+    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ,一般无需设置,默认值为None。
+    - **moving_mean_name** (string)- moving_mean的名称,存储全局均值。如果将其设置为None, ``inplace_abn`` 将随机命名全局均值;否则, ``inplace_abn`` 将命名全局均值为 ``moving_mean_name`` 。默认:None。
+    - **moving_variance_name** (string)- moving_variance的名称,存储全局方差。如果将其设置为None, ``inplace_abn`` 将随机命名全局方差;否则, ``inplace_abn`` 将命名全局方差为 ``moving_variance_name`` 。默认:None。
+    - **do_model_average_for_mean_and_var** (bool,默认False)- 是否为mean和variance做模型均值。
+    - **use_global_stats** (bool) – 是否使用全局均值和方差。在预测或测试模式下,将use_global_stats设置为True与将is_test设置为True是等效的。在训练模式中,当设置use_global_stats为True时,在训练期间也使用全局均值和方差。默认:False。
+    - **act_alpha** (float) – 当 ``act`` 参数为None、leaky-relu、elu时,会使用就地批正则化激活算法,可通过此参数给定leaky-relu、elu的 ``alpha`` 值。默认:1.0。
+
+
+返回: 维度和输入相同的Tensor,在输入中运用批正则后的结果。
+
+返回类型:Variable
+
+**代码示例**:
+
+.. 
code-block:: python + + import paddle.fluid as fluid + x = fluid.data(name='x', shape=[3, 7, 3, 7], dtype='float32') + hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w') + hidden2 = fluid.layers.inplace_abn(input=hidden1) + hidden3 = fluid.layers.inplace_abn(input=hidden2, act='leaky_relu', act_alpha=0.2) diff --git a/doc/fluid/api_cn/layers_cn/lstm_cn.rst b/doc/fluid/api_cn/layers_cn/lstm_cn.rst index 206d8227ff84ee97ad1c6f914b4f71e69dca2e68..87ff8fdce4fc62b31465fab912df840a6f9a78de 100644 --- a/doc/fluid/api_cn/layers_cn/lstm_cn.rst +++ b/doc/fluid/api_cn/layers_cn/lstm_cn.rst @@ -57,7 +57,7 @@ lstm 返回: 经过lstm运算输出的三个Tensor的tuple,包括 -- rnn_out:LSTM hidden的输出结果的Tensor,数据类型与input一致,维度为 :math:`[seq\_len, batch\_size, hidden\_size]` 。如果 ``is_bidirec`` 设置为True,则维度为 :math:`[seq\_len, batch\_size, hidden\_size*2]` +- rnn_out:LSTM hidden的输出结果的Tensor,数据类型与input一致,维度为 :math:`[batch\_size, seq\_len, hidden\_size]` 。如果 ``is_bidirec`` 设置为True,则维度为 :math:`[batch\_size, seq\_len, hidden\_size*2]` - last_h:LSTM最后一步的hidden状态的Tensor,数据类型与input一致,维度为 :math:`[num\_layers, batch\_size, hidden\_size]` 。如果 ``is_bidirec`` 设置为True,则维度为 :math:`[num\_layers*2, batch\_size, hidden\_size]` - last_c:LSTM最后一步的cell状态的Tensor,数据类型与input一致,维度为 :math:`[num\_layers, batch\_size, hidden\_size]` 。如果 ``is_bidirec`` 设置为True,则维度为 :math:`[num\_layers*2, batch\_size, hidden\_size]` @@ -73,12 +73,11 @@ lstm emb_dim = 256 vocab_size = 10000 data = fluid.layers.data(name='x', shape=[-1, 100, 1], - dtype='int32') + dtype='int64') emb = fluid.layers.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True) batch_size = 20 max_len = 100 dropout_prob = 0.2 - seq_len = 100 hidden_size = 150 num_layers = 1 init_h = layers.fill_constant( [num_layers, batch_size, hidden_size], 'float32', 0.0 ) @@ -87,7 +86,7 @@ lstm rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob) rnn_out.shape # (-1, 100, 150) last_h.shape # (1, 20, 150) - layt_c.shape # (1, 20, 150) + last_c.shape # (1, 20, 150) diff --git a/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst b/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst index 4769b6dd7192b523fad7528a6bc0fe30773a2991..9c5717bddd27a1022e3e6715e9e7258ef70f8db7 100644 --- a/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst +++ b/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst @@ -3,7 +3,7 @@ noam_decay ------------------------------- -.. py:function:: paddle.fluid.layers.noam_decay(d_model,warmup_steps) +.. 
py:function:: paddle.fluid.layers.noam_decay(d_model, warmup_steps, learning_rate=1.0)
 
 Noam衰减方法
 
@@ -14,11 +14,12 @@ noam衰减的numpy实现如下:
     import paddle.fluid as fluid
     import numpy as np
     # 设置超参数
+    base_lr = 0.01
     d_model = 2
     current_steps = 20
     warmup_steps = 200
     # 计算
-    lr_value = np.power(d_model, -0.5) * np.min([
+    lr_value = base_lr * np.power(d_model, -0.5) * np.min([
                np.power(current_steps, -0.5),
                np.power(warmup_steps, -1.5) * current_steps])
 
@@ -27,6 +28,7 @@ noam衰减的numpy实现如下:
 参数:
     - **d_model** (Variable|int) - 模型的输入、输出向量特征维度。类型可设置为标量Tensor,或int值。
     - **warmup_steps** (Variable|int) - 预热步数,类型可设置为标量Tensor,或int值。
+    - **learning_rate** (Variable|float|int,可选) - 初始学习率。如果类型为Variable,则为shape为[1]的Tensor,数据类型为float32或float64;也可以是python的int类型。默认值为1.0。
 
 返回:衰减的学习率
 
@@ -41,7 +43,8 @@ noam衰减的numpy实现如下:
     learning_rate = 0.01
     lr = fluid.layers.learning_rate_scheduler.noam_decay(
            1/(warmup_steps *(learning_rate ** 2)),
-           warmup_steps)
+           warmup_steps,
+           learning_rate)
diff --git a/doc/fluid/api_cn/layers_cn/pad2d_cn.rst b/doc/fluid/api_cn/layers_cn/pad2d_cn.rst
index 9f2cb0673b10e867239dc68763eb396a0318ebbd..364d3a11bf9fd6fa360831588d62fc14f141fe3d 100644
--- a/doc/fluid/api_cn/layers_cn/pad2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pad2d_cn.rst
@@ -19,36 +19,34 @@ pad2d
 
 返回类型:Variable
 
-示例:
+**示例**:
 
 .. code-block:: text
 
-      假设X是输入图像:
+    Input = [[[[1., 2., 3.],
+               [4., 5., 6.]]]]
 
-      X = [[1, 2, 3],
-           [4, 5, 6]]
+    Case 0:
+        paddings = [0, 1, 2, 3],
+        mode = 'constant'
+        pad_value = 0
+        Out = [[[[0., 0., 1., 2., 3., 0., 0., 0.],
+                 [0., 0., 4., 5., 6., 0., 0., 0.],
+                 [0., 0., 0., 0., 0., 0., 0., 0.]]]]
 
-      Case 0:
-          paddings = [0, 1, 2, 3],
-          mode = 'constant'
-          pad_value = 0
-          Out = [[0, 0, 1, 2, 3, 0, 0, 0]
-                 [0, 0, 4, 5, 6, 0, 0, 0]
-                 [0, 0, 0, 0, 0, 0, 0, 0]]
+    Case 1:
+        paddings = [0, 1, 2, 1],
+        mode = 'reflect'
+        Out = [[[[3., 2., 1., 2., 3., 2.],
+                 [6., 5., 4., 5., 6., 5.],
+                 [3., 2., 1., 2., 3., 2.]]]]
 
-      Case 1:
-          paddings = [0, 1, 2, 1],
-          mode = 'reflect'
-          Out = [[3, 2, 1, 2, 3, 2]
-                 [6, 5, 4, 5, 6, 5]
-                 [3, 2, 1, 2, 3, 2]]
-
-      Case 2:
-          paddings = [0, 1, 2, 1],
-          mode = 'edge'
-          Out = [[1, 1, 1, 2, 3, 3]
-                 [4, 4, 4, 5, 6, 6]
-                 [4, 4, 4, 5, 6, 6]]
+    Case 2:
+        paddings = [0, 1, 2, 1],
+        mode = 'edge'
+        Out = [[[[1., 1., 1., 2., 3., 3.],
+                 [4., 4., 4., 5., 6., 6.],
+                 [4., 4., 4., 5., 6., 6.]]]]
 
 
@@ -56,8 +54,6 @@ pad2d
 
 .. code-block:: python
 
-      import paddle.fluid as fluid
-      data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
-      result = fluid.layers.pad2d(input=data, paddings=[1,2,3,4], mode='reflect')
-
-
+    import paddle.fluid as fluid
+    data = fluid.data(name='data', shape=[None, 3, 32, 32], dtype='float32')
+    result = fluid.layers.pad2d(input=data, paddings=[0, 1, 2, 3], mode='reflect')
diff --git a/doc/fluid/api_cn/layers_cn/pad_cn.rst b/doc/fluid/api_cn/layers_cn/pad_cn.rst
index 04ff8cd0fc6b0e103dcabeeed99336e552ca5f9f..38acffa6479cc27ac082d616f75e867d7c81bb8d 100644
--- a/doc/fluid/api_cn/layers_cn/pad_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pad_cn.rst
@@ -8,23 +8,21 @@ pad
 该OP在Tensor上填充一个由 ``pad_value`` 给出的常数值,填充宽度由 ``paddings`` 指定。 其中,维度 ``i`` 中 ``x`` 内容前填充的值个数用 ``paddings[2*i]`` 表示,维度 ``i`` 中 ``x`` 内容后填充的值个数用 ``paddings[2*i+1]`` 表示。
 
-**样例**:
+**示例**:
 
-::
+.. 
code-block:: text Given: + x = [[1, 2], [3, 4]] - x = [[1, 2], [3, 4]] + paddings = [0, 1, 1, 2] - paddings = [0, 1, 1, 2] - - pad_value = 0 + pad_value = 0 Return: - - out = [[0, 1, 2, 0, 0] - [0, 3, 4, 0, 0] - [0, 0, 0, 0, 0]] + out = [[0, 1, 2, 0, 0] + [0, 3, 4, 0, 0] + [0, 0, 0, 0, 0]] 参数: @@ -44,15 +42,7 @@ pad # x 为一个秩为2的张量 import paddle.fluid as fluid - x = fluid.layers.data(name='data', shape=[224], dtype='float32') + x = fluid.data(name='data', shape=[300, 300], dtype='float32') out = fluid.layers.pad(x=x, paddings=[0, 1, 1, 2], pad_value=0.) - - - - - - - - diff --git a/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst b/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst index 3172afab76e7bd0343978a7f999a97fd9c89009a..de0b701ad6c00e0f2c7f485a8c0a5c17609c8a73 100644 --- a/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst +++ b/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst @@ -7,9 +7,9 @@ pad_constant_like 该OP使用 ``pad_value`` 填充 ``y`` ,填充到每个维度值的数量由x和y的形状而指定,((0,x.shape[0] - y.shape[0]), ..., (0, x.shape[i] - y.shape[i]), ..., (0, x.shape[n] - y.shape[n]))是每个维度填充的宽度,对于维度i,填充宽度 ``(0, x.shape[i] - y.shape[i])`` ,表示在y的第i维开头不填充,而在末尾填充 ``x.shape[i] - y.shape[i]`` 个位置。该OP要求y与x具有相同的秩,并且对每个维度i, ``y.shape[i] <= x.shape[i]`` 。 -**样例** +**示例**: -:: +.. code-block:: text Given: X = [[[[ 0, 1, 2], @@ -24,30 +24,34 @@ pad_constant_like [27, 28, 29]], [[30, 31, 32], [33, 34, 35]]]] + X.shape = (2, 3, 2, 3) Y = [[[[35, 36, 37]], [[38, 39, 40]], [[41, 42, 43]]]] + Y.shape = (1, 3, 1, 3) - and + And pad_value = 0. - Output is: - out = [[[[35, 36, 37], - [0, 0, 0]], + Return: + Out = [[[[35, 36, 37], + [ 0, 0, 0]], [[38, 39, 40], - [0, 0, 0]], + [ 0, 0, 0]], [[41, 42, 43], - [0, 0, 0]]], - [[[0, 0, 0], - [0, 0, 0]], - [[0, 0, 0], - [0, 0, 0]], - [[0, 0, 0], - [0, 0, 0]]]] - out.shape = [2, 3, 2, 3] + [ 0, 0, 0]]], + [[[ 0, 0, 0], + [ 0, 0, 0]], + [[ 0, 0, 0], + [ 0, 0, 0]], + [[ 0, 0, 0], + [ 0, 0, 0]]]] + + Out.shape = [2, 3, 2, 3] + 参数: - **x** (Variable)- 多维Tensor @@ -66,8 +70,8 @@ pad_constant_like # x是秩为4的tensor, x.shape = (2, 3, 2, 3) # y是秩为4的tensor, y.shape = (1, 3, 1, 3) import paddle.fluid as fluid - x = fluid.layers.data(name='x', shape=[2,3,2,3], dtype='float32') - y = fluid.layers.data(name='y', shape=[1,3,1,3], dtype='float32') + x = fluid.data(name='x', shape=[2,3,2,3], dtype='float32') + y = fluid.data(name='y', shape=[1,3,1,3], dtype='float32') out = fluid.layers.pad_constant_like(x=x, y=y, pad_value=0.) # out是秩为4的tensor, out.shape = [2, 3 ,2 , 3] diff --git a/doc/fluid/api_cn/layers_cn/reshape_cn.rst b/doc/fluid/api_cn/layers_cn/reshape_cn.rst index c0c8f256e2c76a2a4ea550dcb884bbc1e3832a1e..58e204e30aef694cfa6f98d843fcb516083ca2eb 100644 --- a/doc/fluid/api_cn/layers_cn/reshape_cn.rst +++ b/doc/fluid/api_cn/layers_cn/reshape_cn.rst @@ -54,10 +54,10 @@ reshape # example 1: # attr shape is a list which doesn't contain tensor Variable. - data_1 = fluid.layers.data( - name='data_1', shape=[2, 4, 6], dtype='float32') + data_1 = fluid.data( + name='data_1', shape=[2, 4, 6], dtype='float32') reshaped_1 = fluid.layers.reshape( - x=data_1, shape=[-1, 0, 3, 2], inplace=True) + x=data_1, shape=[-1, 0, 3, 2], inplace=True) # the shape of reshaped_1 is [2,4,3,2]. # example 2: @@ -69,7 +69,7 @@ reshape # example 3: data_3 = fluid.data( - name="data_3", shape=[2,4,6], dtype='float32') + name="data_3", shape=[2,4,6], dtype='float32') reshaped_3 = fluid.layers.reshape(x=data_3, shape=[6,8]) # the shape of reshaped_3 is [6,8]. 
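
以下为 reshape 中 0 与 -1 取值规则的一个补充演示(沿用上文 example 1 的输入与输出,变量名 ``img`` 、 ``out`` 为示意用途,仅作说明,非官方示例):

.. code-block:: python

    import paddle.fluid as fluid

    # 输入 shape 为 [2, 4, 6],共 48 个元素
    img = fluid.data(name='img', shape=[2, 4, 6], dtype='float32')
    # shape 中的 0 表示沿用输入对应维度的大小(此处为第 1 维的 4);
    # -1 表示该维大小由剩余元素个数推断:48 / (4 * 3 * 2) = 2
    out = fluid.layers.reshape(x=img, shape=[-1, 0, 3, 2])
    # 此时 out 的 shape 为 [2, 4, 3, 2],与上文 example 1 的结果一致
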
diff --git a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst index badd3bc07451b013f7130178ef69b680a5e4763a..205f7ccdb18be18f886c04922af083516c203f76 100644 --- a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst +++ b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst @@ -34,27 +34,27 @@ retinanet_detection_output import paddle.fluid as fluid - bboxes_low = fluid.data(name='bboxes_low', shape=[1, 44, 4], - dtype='float32') - bboxes_high = fluid.data(name='bboxes_high', shape=[1, 11, 4], - dtype='float32') - scores_low = fluid.data(name='scores_low', shape=[1, 44, 10], - dtype='float32') - scores_high = fluid.data(name='scores_high', shape=[1, 11, 10], - dtype='float32') - anchors_low = fluid.data(name='anchors_low', shape=[44, 4], - dtype='float32') - anchors_high = fluid.data(name='anchors_high', shape=[11, 4], - dtype='float32') - im_info = fluid.data(name="im_info", shape=[1, 3], - dtype='float32') + bboxes_low = fluid.data( + name='bboxes_low', shape=[1, 44, 4], dtype='float32') + bboxes_high = fluid.data( + name='bboxes_high', shape=[1, 11, 4], dtype='float32') + scores_low = fluid.data( + name='scores_low', shape=[1, 44, 10], dtype='float32') + scores_high = fluid.data( + name='scores_high', shape=[1, 11, 10], dtype='float32') + anchors_low = fluid.data( + name='anchors_low', shape=[44, 4], dtype='float32') + anchors_high = fluid.data( + name='anchors_high', shape=[11, 4], dtype='float32') + im_info = fluid.data( + name="im_info", shape=[1, 3], dtype='float32') nmsed_outs = fluid.layers.retinanet_detection_output( - bboxes=[bboxes_low, bboxes_high], - scores=[scores_low, scores_high], - anchors=[anchors_low, anchors_high], - im_info=im_info, - score_threshold=0.05, - nms_top_k=1000, - keep_top_k=100, - nms_threshold=0.45, - nms_eta=1.) + bboxes=[bboxes_low, bboxes_high], + scores=[scores_low, scores_high], + anchors=[anchors_low, anchors_high], + im_info=im_info, + score_threshold=0.05, + nms_top_k=1000, + keep_top_k=100, + nms_threshold=0.45, + nms_eta=1.0) diff --git a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst index cf098627d078fa244f96d6ee2a060e7d241c9fcd..a3c136fefdd18796f8fa1b9797c5b7b78230c843 100644 --- a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst +++ b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst @@ -50,7 +50,6 @@ retinanet_target_assign .. code-block:: python import paddle.fluid as fluid - import numpy as np bbox_pred = fluid.data(name='bbox_pred', shape=[1, 100, 4], dtype='float32') diff --git a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst index b5e33e3b7385808093a1359c9e36b6ed8453c65e..0dd282715846728bfb7d30cb2caccf8d92c186a8 100644 --- a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst +++ b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst @@ -3,7 +3,7 @@ sigmoid_focal_loss ------------------------------- -.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25) +.. 
diff --git a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
index b5e33e3b7385808093a1359c9e36b6ed8453c65e..0dd282715846728bfb7d30cb2caccf8d92c186a8 100644
--- a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
@@ -3,7 +3,7 @@
 sigmoid_focal_loss
 -------------------------------

-.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25)
+.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2.0, alpha=0.25)

 `Focal Loss `_ was proposed to address the foreground-background imbalance problem in computer vision tasks. This OP first computes the sigmoid of each element of the input x, and then computes the Focal Loss between those sigmoid values and the class target label.

@@ -49,5 +49,5 @@ Focal Loss is computed as follows:
     loss = fluid.layers.sigmoid_focal_loss(x=input,
                                            label=label,
                                            fg_num=fg_num,
-                                           gamma=2.,
+                                           gamma=2.0,
                                            alpha=0.25)
diff --git a/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst b/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst
index 33d3ac3610e99ce186f448726730fb91181c3a39..070bddfa516aca22c71c50c35cdee357cad70449 100644
--- a/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst
@@ -43,31 +43,12 @@ scaled_dot_product_attention

 .. code-block:: python

-    import paddle.fluid as fluid
-
-    queries = fluid.layers.data(name="queries",
-                                shape=[3, 5, 9],
-                                dtype="float32",
-                                append_batch_size=False)
-    queries.stop_gradient = False
-    keys = fluid.layers.data(name="keys",
-                             shape=[3, 6, 9],
-                             dtype="float32",
-                             append_batch_size=False)
-    keys.stop_gradient = False
-    values = fluid.layers.data(name="values",
-                               shape=[3, 6, 10],
-                               dtype="float32",
-                               append_batch_size=False)
-    values.stop_gradient = False
-    contexts = fluid.nets.scaled_dot_product_attention(queries, keys, values)
-    contexts.shape  # [3, 5, 10]
-
-
-
-
-
-
+    import paddle.fluid as fluid
+    queries = fluid.data(name="queries", shape=[3, 5, 9], dtype="float32")
+    keys = fluid.data(name="keys", shape=[3, 6, 9], dtype="float32")
+    values = fluid.data(name="values", shape=[3, 6, 10], dtype="float32")
+    contexts = fluid.nets.scaled_dot_product_attention(queries, keys, values)
+    contexts.shape  # [3, 5, 10]
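For reference, the computation behind ``scaled_dot_product_attention`` is the standard formula softmax(QK^T / sqrt(d_k))V. The self-contained NumPy sketch below (shapes match the example above; it is an illustration only, since the fluid API also supports multi-head splitting) shows why the output shape is [3, 5, 10]:

.. code-block:: python

    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        # q: [batch, len_q, d_k], k: [batch, len_k, d_k], v: [batch, len_k, d_v]
        d_k = q.shape[-1]
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)    # [batch, len_q, len_k]
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
        return weights @ v                                  # [batch, len_q, d_v]

    q = np.random.rand(3, 5, 9).astype('float32')
    k = np.random.rand(3, 6, 9).astype('float32')
    v = np.random.rand(3, 6, 10).astype('float32')
    print(scaled_dot_product_attention(q, k, v).shape)  # (3, 5, 10)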
diff --git a/doc/fluid/api_cn/nn_cn.rst b/doc/fluid/api_cn/nn_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..a17a6f6bb15065233095e3aeb44d8d4f246e894f
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn.rst
@@ -0,0 +1,20 @@
+=======================
+paddle.nn
+=======================
+
+
+
+.. toctree::
+    :maxdepth: 1
+
+    nn_cn/Conv1D.rst
+    nn_cn/Conv2D.rst
+    nn_cn/diag_embed.rst
+    nn_cn/interpolate.rst
+    nn_cn/Linear.rst
+    nn_cn/log_softmax.rst
+    nn_cn/ReLU.rst
+    nn_cn/Upsample.rst
+    nn_cn/activation_cn.rst
+    nn_cn/loss_cn.rst
+
diff --git a/doc/fluid/api_cn/nn_cn/Conv1D.rst b/doc/fluid/api_cn/nn_cn/Conv1D.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/Conv2D.rst b/doc/fluid/api_cn/nn_cn/Conv2D.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/Linear.rst b/doc/fluid/api_cn/nn_cn/Linear.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/ReLU.rst b/doc/fluid/api_cn/nn_cn/ReLU.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/Upsample.rst b/doc/fluid/api_cn/nn_cn/Upsample.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..dadd26afbd650666203b019aaf4e62d60b23bd3e
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn.rst
@@ -0,0 +1,11 @@
+=======================
+activation
+=======================
+
+
+
+.. toctree::
+    :maxdepth: 1
+
+    activation_cn/Sigmoid.rst
+
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/Sigmoid.rst b/doc/fluid/api_cn/nn_cn/activation_cn/Sigmoid.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/diag_embed.rst b/doc/fluid/api_cn/nn_cn/diag_embed.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/interpolate.rst b/doc/fluid/api_cn/nn_cn/interpolate.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/log_softmax.rst b/doc/fluid/api_cn/nn_cn/log_softmax.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn.rst b/doc/fluid/api_cn/nn_cn/loss_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..074f66312a2c3dcfeee6d3c16c5f052837fb5018
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/loss_cn.rst
@@ -0,0 +1,15 @@
+=======================
+loss
+=======================
+
+
+
+.. toctree::
+    :maxdepth: 1
+
+    loss_cn/BCELoss.rst
+    loss_cn/CrossEntropyLoss.rst
+    loss_cn/L1Loss.rst
+    loss_cn/MSELoss.rst
+    loss_cn/NLLLoss.rst
+
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/BCELoss.rst b/doc/fluid/api_cn/nn_cn/loss_cn/BCELoss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/CrossEntropyLoss.rst b/doc/fluid/api_cn/nn_cn/loss_cn/CrossEntropyLoss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss.rst b/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/MSELoss.rst b/doc/fluid/api_cn/nn_cn/loss_cn/MSELoss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/NLLLoss.rst b/doc/fluid/api_cn/nn_cn/loss_cn/NLLLoss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
index dcd4fa67e18c2ac5ddc56af1db1d5a752ba850fd..8915f15d1133b74dc18342d49f66cdd24f663b15 100644
--- a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
@@ -23,7 +23,9 @@ Adadelta optimizer; for details see the paper `ADADELTA: AN ADAPTIVE LEARNING
     - **epsilon** (float) - A float value to maintain numerical stability. Default is 1.0e-6.
     - **rho** (float) - The decay rate in the algorithm. Default is 0.95.
     - **parameter_list** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode it defaults to None, in which case all parameters are optimized.
-    - **regularization** (WeightDecayRegularizer, optional) - The regularization method, e.g. fluid.regularizer.L2DecayRegularizer. Default is None, meaning no regularization.
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization strategy. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay`. If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr`, the regularization set here is ignored for that parameter;
+      the setting here only takes effect for parameters with no regularizer in :ref:`cn_api_fluid_ParamAttr`. Default is None, meaning no regularization.
     - **name** (str, optional) - See :ref:`api_guide_Name` for usage; usually there is no need to set it. Default is None.

 **Code example**
@@ -49,9 +51,10 @@ Adadelta optimizer; for details see the paper `ADADELTA: AN ADAPTIVE LEARNING
     - **startup_program** (Program, optional) - The startup program the parameters belong to. Default is None, meaning :ref:`cn_api_fluid_default_startup_program`.
     - **parameter_list** (list, optional) - A list of Parameters or Parameter.name to be updated. Default is None, meaning all parameters are updated.
     - **no_grad_set** (set, optional) - A set of Parameters or Parameter.name that should not be updated. Default is None.
-    - **grad_clip** (GradClipBase, optional) - The gradient clipping strategy; currently only effective in dygraph mode.
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
+      Default is None, in which case no gradient clipping is performed.

-Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient.
+Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient. The return value can be added to the ``fetch_list`` of ``Executor.run()``; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``; see the ``Executor`` documentation for details.

 Return type: tuple
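The precedence rule spelled out in the new ``regularization`` text (a regularizer set in ``ParamAttr`` wins over the optimizer-level one) can be exercised with a short fluid sketch; the layer and variable names below are illustrative, not from the patch:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 13], dtype='float32')
    # This layer's weight carries its own regularizer, so the
    # optimizer-level L2Decay below is ignored for that weight.
    y = fluid.layers.fc(
        input=x, size=1,
        param_attr=fluid.ParamAttr(
            regularizer=fluid.regularizer.L1Decay(1e-4)))
    loss = fluid.layers.reduce_mean(y)

    optimizer = fluid.optimizer.AdadeltaOptimizer(
        learning_rate=0.01, rho=0.95,
        regularization=fluid.regularizer.L2Decay(1e-4))
    optimizer.minimize(loss)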
diff --git a/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst
index 53b5b9774cc7d8177493b4f1af31c4950e5b37a0..5ab3a1426428ecbc4ced4e0a3f00c6e760e5241f 100644
--- a/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst
@@ -25,7 +25,9 @@ The Adaptive Gradient optimizer (Adagrad for short) can
     - **learning_rate** (float|Variable) - The learning rate used to update the parameters; a float or a Variable holding a float value
     - **epsilon** (float, optional) - A float value to maintain numerical stability. Default is 1e-06
     - **parameter_list** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode it defaults to None, in which case all parameters are optimized.
-    - **regularization** (WeightDecayRegularizer, optional) - The regularization function, used to reduce generalization error; for example :ref:`cn_api_fluid_regularizer_L2DecayRegularizer`. Default is None
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization strategy. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay`. If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr`, the regularization set here is ignored for that parameter;
+      the setting here only takes effect for parameters with no regularizer in :ref:`cn_api_fluid_ParamAttr`. Default is None, meaning no regularization.
     - **name** (str, optional) - Used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default is None
     - **initial_accumulator_value** (float, optional) - The initial value of the moment accumulator. Default is 0.0
@@ -59,9 +61,10 @@ The Adaptive Gradient optimizer (Adagrad for short) can
     - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameter_list. Default is None, in which case :ref:`cn_api_fluid_default_startup_program` is used
     - **parameter_list** (list, optional) - A list of Parameters or Parameter.name to be updated. Default is None, in which case all Parameters are updated
     - **no_grad_set** (set, optional) - A set of Parameters or Parameter.name that should not be updated. Default is None
-    - **grad_clip** (GradClipBase, optional) - The gradient clipping strategy. Not needed in static graph mode; currently this argument only supports gradient clipping in dygraph mode and may be adjusted in the future. Default is None
-
-Returns: (optimize_ops, params_grads), of type (list, list); optimize_ops is the list of OPs appended to the network by minimize, and params_grads is a list of (param, grad) variable pairs, where param is a Parameter and grad is its gradient
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
+      Default is None, in which case no gradient clipping is performed.
+
+Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient. The return value can be added to the ``fetch_list`` of ``Executor.run()``; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``; see the ``Executor`` documentation for details.

 Return type: tuple
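To make the new ``grad_clip`` wording concrete, here is a minimal sketch of passing a clipping strategy to ``minimize``; it assumes the ``minimize`` signature described above, and the network and names are illustrative:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 10], dtype='float32')
    loss = fluid.layers.reduce_mean(fluid.layers.fc(input=x, size=1))

    # Clip the global L2 norm of all gradients to at most 1.0
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
    optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.1)
    optimize_ops, params_grads = optimizer.minimize(loss, grad_clip=clip)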
diff --git a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
index 4b7683ffc2bdb8d50750c6aafc5d6012b2a2efb9..6861c11f39b330b134a9c0746bbcdf1db4ce7ff8 100644
--- a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
@@ -28,7 +28,9 @@ The Adam optimizer comes from Section 2 of the `Adam paper `_
     - **beta1** (float|Variable, optional) - The exponential decay rate for the first-moment estimate; a float, or a float32 Variable of shape [1]. Default is 0.9
     - **beta2** (float|Variable, optional) - The exponential decay rate for the second-moment estimate; a float, or a float32 Variable of shape [1]. Default is 0.999
     - **epsilon** (float, optional) - A small float value to maintain numerical stability. Default is 1e-08
-    - **regularization** (WeightDecayRegularizer, optional) - The regularization function, used to reduce generalization error; for example :ref:`cn_api_fluid_regularizer_L2DecayRegularizer`. Default is None
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization strategy. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay`. If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr`, the regularization set here is ignored for that parameter;
+      the setting here only takes effect for parameters with no regularizer in :ref:`cn_api_fluid_ParamAttr`. Default is None, meaning no regularization.
     - **name** (str, optional) - Used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default is None
     - **lazy_mode** (bool, optional) - When set to True, only the elements that currently have a gradient are updated. The official Adam algorithm keeps two moving-average accumulators, which are updated at every step; in both dense and sparse mode, every element of the two moving averages is updated, which can be slow when the parameter is very large. Lazy mode updates only the elements that currently have a gradient, so it is faster; however, it deviates from the original description of the algorithm and may produce different results. Default is False
@@ -129,9 +131,10 @@ The Adam optimizer comes from Section 2 of the `Adam paper `_
     - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameter_list. Default is None, in which case :ref:`cn_api_fluid_default_startup_program` is used
     - **parameter_list** (list, optional) - A list of Parameters or Parameter.name to be updated. Default is None, in which case all Parameters are updated
     - **no_grad_set** (set, optional) - A set of Parameters or Parameter.name that should not be updated. Default is None
-    - **grad_clip** (GradClipBase, optional) - The gradient clipping strategy. Not needed in static graph mode; currently this argument only supports gradient clipping in dygraph mode and may be adjusted in the future. Default is None
-
-Returns: (optimize_ops, params_grads), of type (list, list); optimize_ops is the list of OPs appended to the network by minimize, and params_grads is a list of (param, grad) variable pairs, where param is a Parameter and grad is its gradient
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
+      Default is None, in which case no gradient clipping is performed.
+
+Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient. The return value can be added to the ``fetch_list`` of ``Executor.run()``; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``; see the ``Executor`` documentation for details.

 Return type: tuple
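A minimal sketch of the ``lazy_mode`` switch described above, using a sparse embedding so that only some rows receive gradients; the model, names, and sizes here are illustrative:

.. code-block:: python

    import paddle.fluid as fluid

    word = fluid.data(name='word', shape=[None, 1], dtype='int64')
    emb = fluid.embedding(input=word, size=[10000, 64], is_sparse=True)
    loss = fluid.layers.reduce_mean(emb)

    # With a sparse gradient, lazy_mode skips embedding rows whose
    # gradient is absent in the current step.
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, lazy_mode=True)
    adam.minimize(loss)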
diff --git a/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst
index 4dcfd2c789150771d427f73fded8a5706f07bbfb..dc97ae71845f356b04dd934f22a2d06043e387b2 100644
--- a/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst
@@ -30,7 +30,9 @@ The Adamax optimizer is based on Section 7 of the `Adam paper `_
     - **beta2** (float, optional) - The exponential decay rate for the second-moment estimate. Default is 0.999
     - **epsilon** (float, optional) - A small float value to maintain numerical stability. Default is 1e-08
     - **parameter_list** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode it defaults to None, in which case all parameters are optimized.
-    - **regularization** (WeightDecayRegularizer, optional) - The regularization function, used to reduce generalization error; for example :ref:`cn_api_fluid_regularizer_L2DecayRegularizer`. Default is None
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization strategy. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay`. If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr`, the regularization set here is ignored for that parameter;
+      the setting here only takes effect for parameters with no regularizer in :ref:`cn_api_fluid_ParamAttr`. Default is None, meaning no regularization.
     - **name** (str, optional) - Used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default is None

 .. note::
@@ -73,9 +75,10 @@ The Adamax optimizer is based on Section 7 of the `Adam paper `_
     - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameter_list. Default is None, in which case :ref:`cn_api_fluid_default_startup_program` is used
     - **parameter_list** (list, optional) - A list of Parameters or Parameter.name to be updated. Default is None, in which case all Parameters are updated
     - **no_grad_set** (set, optional) - A set of Parameters or Parameter.name that should not be updated. Default is None
-    - **grad_clip** (GradClipBase, optional) - The gradient clipping strategy. Not needed in static graph mode; currently this argument only supports gradient clipping in dygraph mode and may be adjusted in the future. Default is None
-
-Returns: (optimize_ops, params_grads), of type (list, list); optimize_ops is the list of OPs appended to the network by minimize, and params_grads is a list of (param, grad) variable pairs, where param is a Parameter and grad is its gradient
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
+      Default is None, in which case no gradient clipping is performed.
+
+Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient. The return value can be added to the ``fetch_list`` of ``Executor.run()``; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``; see the ``Executor`` documentation for details.

 **Code example**:
diff --git a/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst
index fc09d52ef887f3b056814e15b4fe78cca53d8176..683719275b71c511bc214a05064a70cf6e33a8ea 100644
--- a/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst
@@ -33,7 +33,9 @@ DGC also uses momentum factor masking and warm-up
     - **use_nesterov** (bool) - Enables Nesterov momentum; True means Nesterov is used. Default is False.
     - **local_grad_clip_norm** (float, optional) - The norm used for local gradient clipping. Optional; default is None, meaning no clipping.
     - **num_trainers** (int, optional) - The number of training nodes. Optional; default is None.
-    - **regularization** (WeightDecayRegularizer, optional) - The regularizer, e.g. :ref:`cn_api_fluid_regularizer_L2DecayRegularizer`. Optional; default is None.
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization strategy. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay`. If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr`, the regularization set here is ignored for that parameter;
+      the setting here only takes effect for parameters with no regularizer in :ref:`cn_api_fluid_ParamAttr`. Default is None, meaning no regularization.
     - **name** (str, optional) - Used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default is None.

 **Code example**
@@ -119,9 +121,10 @@ DGC also uses momentum factor masking and warm-up
     - **startup_program** (Program) - The startup_program used to initialize the parameters in parameter_list
     - **parameter_list** (list) - A list of Variables to be updated
     - **no_grad_set** (set|None) - A set of Variables to be ignored
-    - **grad_clip** (GradClipBase|None) - The gradient clipping strategy
-
-Returns: (optimize_ops, params_grads), respectively the list of appended operators and a list of (param, grad) variable pairs used for optimization
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
+      Default is None, in which case no gradient clipping is performed.
+
+Returns: tuple(optimize_ops, params_grads); optimize_ops is the list of optimization OPs, and params_grads is a list of (param, param_grad) pairs, where param is a parameter and param_grad is its gradient. The return value can be added to the ``fetch_list`` of ``Executor.run()``; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``; see the ``Executor`` documentation for details.

 Return type: tuple
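For orientation, the gradient-compression idea behind DGC, communicating only the largest gradient entries and accumulating the rest locally, can be sketched in NumPy. This is a conceptual illustration only; Paddle's DGC kernel additionally applies momentum correction and factor masking, and the function name ``dgc_step`` is ours:

.. code-block:: python

    import numpy as np

    def dgc_step(grad, residual, sparsity=0.999):
        """Top-k gradient sparsification with local accumulation."""
        acc = grad + residual                     # add leftover gradient
        k = max(1, int(acc.size * (1.0 - sparsity)))
        threshold = np.sort(np.abs(acc).ravel())[-k]
        mask = np.abs(acc) >= threshold           # keep only the largest values
        sparse_grad = np.where(mask, acc, 0.0)    # what would be communicated
        new_residual = np.where(mask, 0.0, acc)   # accumulated locally
        return sparse_grad, new_residual

    grad = np.random.randn(10000).astype('float32')
    sparse_grad, residual = dgc_step(grad, np.zeros_like(grad))
    print((sparse_grad != 0).mean())  # roughly 0.001 of entries survive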
diff --git a/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst
index 6341be07028835b0d1f941cc55f37d9fcd349384..daa2e716880add889f2e99c644e278e16331d9fe 100644
--- a/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst
@@ -23,7 +23,9 @@ The Decayed Adagrad optimizer can be seen as an `Adagrad