Commit 0fab5dcc authored by: D dengkaipeng

Merge branch 'develop' of https://github.com/PaddlePaddle/FluidDoc into refine_dataloader_doc

......@@ -181,7 +181,7 @@ REGISTER_OPERATOR(
- The `DefaultGradOpMaker` provided by Fluid will, by default, take all inputs (`Input`) and outputs (`Output`) of the forward op, together with the gradients of its outputs (`Output@Grad`), as inputs of the backward op, and take the gradients of the forward op's inputs (`Input@Grad`) as outputs of the backward op. So when using `DefaultGradOpMaker`, consider whether some of these variables are actually unused in the backward computation.
- If `DefaultGradOpMaker` does not meet your needs, you need to build a `GradOpMaker` by hand; for the concrete implementation, please refer to the [related documentation](new_op.html#gradopmaker);
- If a backward op depends on the Shape or LoD of an input or output variable of the forward op, but not on the buffer of that variable's Tensor, and the Shape and LoD cannot be inferred from other variables, you can register that variable (called `X` below) as `NoNeedBufferVars` in the backward op through the `DECLARE_NO_NEED_BUFFER_VARS_INFERER` interface. **Once `NoNeedBufferVars` is registered, the backward op must not read or write the buffer of that variable's Tensor; only the Tensor's dims() and lod() methods may be called. In addition, the backward op's `GetExpectedKernelType()` must be overridden, and it must not access the type() method of `X`'s Tensor.** For example, `SliceOpGrad` only uses the Shape information of the `Input` variable, so `Input` needs to be registered on `SliceOpGrad`:
```
namespace paddle {
namespace operators {
......@@ -230,8 +230,8 @@ class SliceOpGradMaker : public framework::SingleGradOpMaker<T> {
}
};
DECLARE_NO_NEED_BUFFER_VARS_INFERER(SliceOpGradNoNeedBufferVarsInference,
                                    "Input");
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
......
......@@ -14,7 +14,7 @@
*
   [x] Paddle Fluid installed successfully. If it is not installed yet, please refer to `Quick Start <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/beginners_guide/quick_start_cn.html>`_
*
   [x] Learned the most basic single-node training method. Please refer to the single-card training described in `Single-node training <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/user_guides/howto/training/single_node.html>`_
......@@ -113,7 +113,7 @@
main_function(args.is_local)
* Note: the IO method used in the example is dataset; for the documentation and usage, please refer to the `Dataset API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/api_cn/dataset_cn.html>`_ . For the ``train_from_dataset`` interface used in the example, please refer to the `Executor API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/api_cn/executor_cn.html>`_ . The ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet`` in the example means adopting the parameter server architecture for distributed training. For more options and examples of the Fleet API, please refer to `Fleet API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.6/user_guides/howto/training/fleet_api_howto_cn.html>`_
Start command for single-node training
......
.. _cluster_quick_start_en:

Quick Start with Distributed Training
=====================================

Distributed training with Fleet API
-----------------------------------

Since Paddle Fluid `Release 1.5.1 <https://github.com/PaddlePaddle/Paddle/releases/tag/v1.5.1>`__, it is officially recommended to use the Fleet API for distributed training. For an introduction to the Fleet API, please refer to the `Fleet Design Doc <https://github.com/PaddlePaddle/Fleet>`__.

Preparation
~~~~~~~~~~~
- [x] Install Paddle Fluid. If not already installed, please refer to
`Beginner’s
Guide <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/beginners_guide/index_en.html>`__.
- [x] Master the most basic single node training method. Please refer
to the single card training described in `Single-node
training <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/user_guides/howto/training/single_node_en.html>`__.
Click-through rate prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here we use a simple example, a click-through rate (CTR) prediction task,
to illustrate how to configure the Fleet API for distributed training, and
we use a single-node environment to simulate the distributed environment.
The source code of the example comes from `CTR with
Fleet <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/ctr>`__.
To make learning easier, the example given here mixes single-node and
multi-node code; you can start single-node or multi-node tasks with
different start commands. For obtaining the data and the data
preprocessing logic, please refer to the source code and description of
`CTR with
Fleet <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/ctr>`__.
.. code:: python

    from __future__ import print_function
    from args import parse_args
    import os
    import sys
    import paddle
    import paddle.fluid as fluid
    from network_conf import ctr_dnn_model_dataset
    import paddle.fluid.incubate.fleet.base.role_maker as role_maker
    from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
    from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig

    dense_feature_dim = 13
    sparse_feature_dim = 10000001
    batch_size = 100
    thread_num = 10
    embedding_size = 10
    args = parse_args()

    def main_function(is_local):
        # common code for local training and distributed training
        dense_input = fluid.layers.data(
            name="dense_input", shape=[dense_feature_dim], dtype='float32')
        sparse_input_ids = [
            fluid.layers.data(name="C" + str(i), shape=[1], lod_level=1,
                              dtype="int64") for i in range(1, 27)]
        label = fluid.layers.data(name="label", shape=[1], dtype="int64")

        # use dataset as the IO method
        dataset = fluid.DatasetFactory().create_dataset()
        dataset.set_use_var([dense_input] + sparse_input_ids + [label])
        pipe_command = "python criteo_reader.py %d" % sparse_feature_dim
        dataset.set_pipe_command(pipe_command)
        dataset.set_batch_size(batch_size)
        dataset.set_thread(thread_num)
        whole_filelist = ["raw_data/part-%d" % x
                          for x in range(len(os.listdir("raw_data")))]
        dataset.set_filelist(whole_filelist)

        loss, auc_var, batch_auc_var = ctr_dnn_model_dataset(
            dense_input, sparse_input_ids, label, embedding_size,
            sparse_feature_dim)

        exe = fluid.Executor(fluid.CPUPlace())

        def train_loop(epoch=20):
            for i in range(epoch):
                exe.train_from_dataset(program=fluid.default_main_program(),
                                       dataset=dataset,
                                       fetch_list=[auc_var],
                                       fetch_info=["auc"],
                                       debug=False)

        # local training
        def local_train():
            optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
            optimizer.minimize(loss)
            exe.run(fluid.default_startup_program())
            train_loop()

        # distributed training
        def dist_train():
            role = role_maker.PaddleCloudRoleMaker()
            fleet.init(role)
            strategy = DistributeTranspilerConfig()
            strategy.sync_mode = False
            optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
            optimizer = fleet.distributed_optimizer(optimizer, strategy)
            optimizer.minimize(loss)

            if fleet.is_server():
                fleet.init_server()
                fleet.run_server()
            elif fleet.is_worker():
                fleet.init_worker()
                exe.run(fluid.default_startup_program())
                train_loop()

        if is_local:
            local_train()
        else:
            dist_train()

    if __name__ == '__main__':
        main_function(args.is_local)
- Note: The IO method used in this example is dataset; please refer to
  the `Dataset
  API <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/api/dataset.html>`__
  for the documentation and usage. For the ``train_from_dataset``
  interface, please refer to the `Executor
  API <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/api/executor.html>`__.
  ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet``
  in this example means adopting the parameter server architecture for
  distributed training; refer to the `Fleet
  API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.6/user_guides/howto/training/fleet_api_howto_cn.html>`__
  for more options and examples of the Fleet API.
elif PADDLE_TRAINER_ROLE == "PSERVER":
# fetch the pserver program and execute it
pserver_prog = t.get_pserver_program(current_endpoint)
...
Start command of single node training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    python train.py --is_local 1
Start command of single machine simulation distributed training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we use launch_ps, a built-in launcher of paddle, with which users can
specify the number of workers and servers to start the parameter server
tasks.

.. code:: bash

    python -m paddle.distributed.launch_ps --worker_num 2 --server_num 2 train.py
The task running log can be viewed in the logs directory under the working
directory. Once you can use a single machine to simulate distributed
training, you can perform true multi-node distributed training. We
recommend that users refer directly to
`the example of running a distributed task on Baidu Cloud <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.html>`__.
......@@ -13,9 +13,17 @@ fluid.dygraph
dygraph/Conv3D.rst
dygraph/Conv3DTranspose.rst
dygraph/CosineDecay.rst
dygraph/DataParallel.rst
dygraph/disable_dygraph.rst
dygraph/dygraph_to_static_code.rst
dygraph/dygraph_to_static_func.rst
dygraph/dygraph_to_static_output.rst
dygraph/dygraph_to_static_program.rst
dygraph/Embedding.rst
dygraph/enable_dygraph.rst
dygraph/enabled.rst
dygraph/ExponentialDecay.rst
dygraph/grad.rst
dygraph/GroupNorm.rst
dygraph/GRUUnit.rst
dygraph/guard.rst
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_DataParallel:
DataParallel
------------
.. autoclass:: paddle.fluid.dygraph.DataParallel
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_disable_dygraph:
disable_dygraph
---------------
.. autofunction:: paddle.fluid.dygraph.disable_dygraph
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_code:
dygraph_to_static_code
----------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_code
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_func:
dygraph_to_static_func
----------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_func
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_program:
dygraph_to_static_program
-------------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_program
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_enable_dygraph:
enable_dygraph
--------------
.. autofunction:: paddle.fluid.dygraph.enable_dygraph
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_enabled:
enabled
-------
.. autofunction:: paddle.fluid.dygraph.enabled
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_grad:
grad
----
.. autofunction:: paddle.fluid.dygraph.grad
:noindex:
......@@ -20,14 +20,15 @@ fluid
fluid/DataFeeder.rst
fluid/default_main_program.rst
fluid/default_startup_program.rst
fluid/disable_dygraph.rst
fluid/device_guard.rst
fluid/disable_dygraph.rst
fluid/DistributeTranspiler.rst
fluid/DistributeTranspilerConfig.rst
fluid/embedding.rst
fluid/enable_dygraph.rst
fluid/ExecutionStrategy.rst
fluid/Executor.rst
fluid/get_flags.rst
fluid/global_scope.rst
fluid/gradients.rst
fluid/in_dygraph_mode.rst
......@@ -47,6 +48,7 @@ fluid
fluid/require_version.rst
fluid/save.rst
fluid/scope_guard.rst
fluid/set_flags.rst
fluid/Tensor.rst
fluid/Variable.rst
fluid/WeightNormParamAttr.rst
......@@ -4,7 +4,7 @@
.. _api_fluid_device_guard:
device_guard
------------
.. autofunction:: paddle.fluid.device_guard
:noindex:
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_get_flags:
get_flags
---------
.. autofunction:: paddle.fluid.get_flags
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_set_flags:
set_flags
---------
.. autofunction:: paddle.fluid.set_flags
:noindex:
......@@ -19,6 +19,9 @@ import types
import os
import contextlib
import paddle.fluid as fluid
import paddle.tensor as tensor
import paddle.nn as nn
#import paddle.framework as framework
def parse_arg():
parser = argparse.ArgumentParser()
......@@ -29,8 +32,13 @@ def parse_arg():
'--module_prefix', type=str, help='Generate the prefix of module')
parser.add_argument(
'--output', type=str, help='Output file or output directory for output rst')
    parser.add_argument(
        '--output_name', type=str, help='Name of the root module to generate rst for (e.g. fluid, tensor, nn)')
    parser.add_argument(
        '--output_dir', type=str, help='Output directory for the generated rst files')
parser.add_argument(
'--to_multiple_files', type=bool, default=False, help='Whether to separate to multiple files')
return parser.parse_args()
def print_item(self, name):
......@@ -140,7 +148,7 @@ class DocGenerator(object):
self.stream.write(".. _api_{0}_{1}:\n\n".format("_".join(
self.module_prefix.split(".")), name))
def generate_doc(module_name, module_prefix, output, output_name, to_multiple_files, output_dir):
if module_name == "":
module_name = None
......@@ -150,24 +158,29 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files):
    gen = DocGenerator()
    if module_name is None:
        gen.module = eval(output_name)
        gen.module_name = str(output_name)
    else:
        gen.module = eval(output_name)
        for each_module_name in module_name.split('.'):
            if not hasattr(gen.module, each_module_name):
                raise ValueError("Cannot find fluid.{0}".format(module_name))
            else:
                gen.module = getattr(gen.module, each_module_name)
        gen.module_name = output_name + "." + module_name
    if module_prefix is None:
        gen.module_prefix = gen.module_name
    else:
        gen.module_prefix = output_name + "." + module_prefix
    dirname = output if to_multiple_files else os.path.dirname(output)
    if output_dir is not None:
        dirname = output_dir + "/" + dirname
        output = output_dir + "/" + output
    if len(dirname) > 0 and (not os.path.exists(dirname) or not os.path.isdir(dirname)):
        os.makedirs(dirname)
......@@ -199,7 +212,7 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files):
def main():
args = parse_arg()
    generate_doc(args.module_name, args.module_prefix, args.output, args.output_name, args.to_multiple_files, args.output_dir)
if __name__ == '__main__':
......
#!/bin/bash
#for module in nn
#do
# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module} --to_multiple_files True
#done
#for module in control_flow nn io ops tensor learning_rate_scheduler detection metric_op
#do
# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module}.rst
#done
for module in layers dataset clip metrics executor initializer io nets optimizer profiler regularizer transpiler backward profiler unique_name dygraph
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name fluid --to_multiple_files True
python gen_module_index.py ${module} fluid.${module}
done
python gen_doc.py --module_name "" --module_prefix "" --output fluid --to_multiple_files True
python gen_doc.py --module_name "" --module_prefix "" --output fluid --output_name fluid --to_multiple_files True
python gen_module_index.py fluid fluid
for module in math random stat
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name tensor --to_multiple_files True --output_dir tensor
python gen_module_index.py tensor.${module} ${module}
done
python gen_module_index.py tensor paddle.tensor
for module in loss
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn
python gen_module_index.py nn.${module} ${module}
done
python gen_module_index.py nn paddle.nn
python gen_index.py
......@@ -19,8 +19,10 @@ API Reference
layers.rst
metrics.rst
nets.rst
nn.rst
optimizer.rst
profiler.rst
regularizer.rst
tensor.rst
transpiler.rst
unique_name.rst
......@@ -11,4 +11,3 @@ ComposeNotAligned
:inherited-members:
:noindex:
This indicates an error state of the compose API; it is raised when the outputs of the readers are not aligned.
......@@ -25,6 +25,7 @@ fluid.layers
layers/atan.rst
layers/auc.rst
layers/autoincreased_step_counter.rst
layers/BasicDecoder.rst
layers/batch_norm.rst
layers/beam_search.rst
layers/beam_search_decode.rst
......@@ -68,6 +69,7 @@ fluid.layers
layers/cumsum.rst
layers/data.rst
layers/data_norm.rst
layers/DecodeHelper.rst
layers/Decoder.rst
layers/deformable_conv.rst
layers/deformable_roi_pooling.rst
......@@ -104,6 +106,7 @@ fluid.layers
layers/eye.rst
layers/fc.rst
layers/fill_constant.rst
layers/fill_constant_batch_size_like.rst
layers/filter_by_instag.rst
layers/flatten.rst
layers/floor.rst
......@@ -112,6 +115,7 @@ fluid.layers
layers/gather_nd.rst
layers/gather_tree.rst
layers/gaussian_random.rst
layers/gaussian_random_batch_size_like.rst
layers/gelu.rst
layers/generate_mask_labels.rst
layers/generate_proposal_labels.rst
......@@ -119,6 +123,7 @@ fluid.layers
layers/get_tensor_from_selected_rows.rst
layers/greater_equal.rst
layers/greater_than.rst
layers/GreedyEmbeddingHelper.rst
layers/grid_sampler.rst
layers/group_norm.rst
layers/gru_unit.rst
......@@ -136,6 +141,7 @@ fluid.layers
layers/image_resize.rst
layers/image_resize_short.rst
layers/increment.rst
layers/inplace_abn.rst
layers/instance_norm.rst
layers/inverse_time_decay.rst
layers/iou_similarity.rst
......@@ -237,6 +243,7 @@ fluid.layers
layers/rpn_target_assign.rst
layers/rsqrt.rst
layers/sampled_softmax_with_cross_entropy.rst
layers/SampleEmbeddingHelper.rst
layers/sampling_id.rst
layers/scale.rst
layers/scatter.rst
......@@ -302,10 +309,12 @@ fluid.layers
layers/tensor_array_to_tensor.rst
layers/thresholded_relu.rst
layers/topk.rst
layers/TrainingHelper.rst
layers/transpose.rst
layers/unfold.rst
layers/Uniform.rst
layers/uniform_random.rst
layers/uniform_random_batch_size_like.rst
layers/unique.rst
layers/unique_with_counts.rst
layers/unsqueeze.rst
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_BasicDecoder:
BasicDecoder
------------
.. autoclass:: paddle.fluid.layers.BasicDecoder
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_DecodeHelper:
DecodeHelper
------------
.. autoclass:: paddle.fluid.layers.DecodeHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_GreedyEmbeddingHelper:
GreedyEmbeddingHelper
---------------------
.. autoclass:: paddle.fluid.layers.GreedyEmbeddingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_SampleEmbeddingHelper:
SampleEmbeddingHelper
---------------------
.. autoclass:: paddle.fluid.layers.SampleEmbeddingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_TrainingHelper:
TrainingHelper
--------------
.. autoclass:: paddle.fluid.layers.TrainingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_fill_constant_batch_size_like:
fill_constant_batch_size_like
-----------------------------
.. autofunction:: paddle.fluid.layers.fill_constant_batch_size_like
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_gaussian_random_batch_size_like:
gaussian_random_batch_size_like
-------------------------------
.. autofunction:: paddle.fluid.layers.gaussian_random_batch_size_like
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_inplace_abn:
inplace_abn
-----------
.. autofunction:: paddle.fluid.layers.inplace_abn
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_uniform_random_batch_size_like:
uniform_random_batch_size_like
------------------------------
.. autofunction:: paddle.fluid.layers.uniform_random_batch_size_like
:noindex:
=========
paddle.nn
=========
.. toctree::
:maxdepth: 1
nn/loss.rst
====
loss
====
.. toctree::
:maxdepth: 1
loss/L1Loss.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_nn_loss_L1Loss:
L1Loss
------
.. autoclass:: paddle.nn.loss.L1Loss
:members:
:inherited-members:
:noindex:
=============
paddle.tensor
=============
.. toctree::
:maxdepth: 1
tensor/math.rst
tensor/random.rst
====
math
====
.. toctree::
:maxdepth: 1
math/add.rst
math/atan.rst
math/div.rst
math/elementwise_sum.rst
math/mm.rst
math/mul.rst
math/pow.rst
math/sin.rst
math/sqrt.rst
math/sum.rst
math/tanh.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_add:
add
---
.. autofunction:: paddle.tensor.math.add
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_atan:
atan
----
.. autofunction:: paddle.tensor.math.atan
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_div:
div
---
.. autofunction:: paddle.tensor.math.div
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_elementwise_sum:
elementwise_sum
---------------
.. autofunction:: paddle.tensor.math.elementwise_sum
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_mm:
mm
--
.. autofunction:: paddle.tensor.math.mm
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_mul:
mul
---
.. autofunction:: paddle.tensor.math.mul
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_pow:
pow
---
.. autofunction:: paddle.tensor.math.pow
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sin:
sin
---
.. autofunction:: paddle.tensor.math.sin
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sqrt:
sqrt
----
.. autofunction:: paddle.tensor.math.sqrt
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sum:
sum
---
.. autofunction:: paddle.tensor.math.sum
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_tanh:
tanh
----
.. autofunction:: paddle.tensor.math.tanh
:noindex:
======
random
======
.. toctree::
:maxdepth: 1
random/randint.rst
random/randperm.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_random_randint:
randint
-------
.. autofunction:: paddle.tensor.random.randint
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_random_randperm:
randperm
--------
.. autofunction:: paddle.tensor.random.randperm
:noindex:
......@@ -3,13 +3,19 @@
GradientClipByGlobalNorm
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group', need_clip=None)

Limits the sum of the L2 norms of all Tensors in a Tensor list :math:`t\_list` to within ``clip_norm``.

- If the sum of norms is greater than ``clip_norm``, every Tensor is multiplied by a scaling factor to compress it.

- If the sum of norms is less than or equal to ``clip_norm``, nothing is done.

The input Tensor list is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).

The clipping formula is:
.. math::
\\t\_list[i]=t\_list[i]∗\frac{clip\_norm}{max(global\_norm,clip\_norm)}\\
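
The formula can be checked numerically; below is a quick sketch with plain NumPy (illustrative only, independent of the Paddle API):

.. code-block:: python

    import numpy as np

    t_list = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # L2 norms: 5 and 12
    clip_norm = 1.0
    # global_norm = sqrt(5^2 + 12^2) = 13
    global_norm = np.sqrt(sum((t ** 2).sum() for t in t_list))
    clipped = [t * clip_norm / max(global_norm, clip_norm) for t in t_list]
    # after clipping, the global norm equals clip_norm (= 1.0) exactly
    assert np.isclose(np.sqrt(sum((t ** 2).sum() for t in clipped)), clip_norm)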
......@@ -21,67 +27,73 @@ GradientClipByGlobalNorm
Parameters:
    - **clip_norm** (float) - the maximum allowed sum of norms
    - **group_name** (str, optional) - the group name for clipping
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -3,11 +3,19 @@
GradientClipByNorm
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByNorm(clip_norm, need_clip=None)

Limits the L2 norm of the input multidimensional Tensor :math:`X` to within ``clip_norm``.

- If the L2 norm is greater than ``clip_norm``, the Tensor is multiplied by a scaling factor to compress it.

- If the L2 norm is less than or equal to ``clip_norm``, nothing is done.

The input Tensor is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
The clipping formula is:

.. math::

    \\Out = \begin{cases} X, & if (norm(X) \leq clip\_norm) \\ \frac{clip\_norm * X}{norm(X)}, & if (norm(X) > clip\_norm) \end{cases}\\

......@@ -26,54 +34,72 @@ GradientClipByNorm

where :math:`norm(X)` is the L2 norm of :math:`X`:

.. math::

    \\norm(X) = (\sum_{i=1}^{n}|x_i|^2)^{\frac{1}{2}}\\
Parameters:
    - **clip_norm** (float) - the maximum allowed L2 norm.
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -3,10 +3,14 @@
GradientClipByValue
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByValue(max, min=None, need_clip=None)

Limits the values of the input multidimensional Tensor :math:`X` to the range [min, max].

The input Tensor is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
......@@ -16,25 +20,75 @@ GradientClipByValue
Parameters:
    - **max** (float) - the maximum value for clipping.
    - **min** (float, optional) - the minimum value for clipping. If not set by the user, it is automatically set to ``-max`` (in this case ``max`` must be greater than 0).
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByValue(min=-1, max=1)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByValue(min=-1, max=1)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -7,12 +7,17 @@ set_gradient_clip
.. py:function:: paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None)

.. warning::
    This API is sensitive to where it is called: it must be placed after the network is built and before ``minimize``, so it may be removed in a future release and is not recommended. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead.

    There are three clipping strategies: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm`, and :ref:`cn_api_fluid_clip_GradientClipByValue`.
    If ``set_gradient_clip(clip)`` and ``minimize(loss, grad_clip=clip)`` are used at the same time, ``set_gradient_clip`` will not take effect.

Performs gradient clipping on the specified parameters.

Parameters:
    - **clip** (GradientClipBase) - the gradient clipping strategy, such as :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, describing the concrete clipping method and attributes.
    - **param_list** (list(Variable), optional) - the list of parameters to clip; it can be a list of parameters or of parameter names. Defaults to None, which means all parameters in ``program`` are clipped.
    - **program** (Program, optional) - the Program the parameters belong to. Defaults to None, which means :ref:`cn_api_fluid_default_main_program` is used.

Returns: None.
......@@ -59,3 +64,17 @@ set_gradient_clip
            param_list=[param_var1, param_var2])
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(loss)

    # network 4: use set_gradient_clip and minimize(grad_clip=clip) together
    with fluid.program_guard(fluid.Program(), fluid.Program()):
        loss = network()
        param_var1 = fluid.default_main_program().global_block().var("fc1_param")
        param_var2 = fluid.default_main_program().global_block().var("fc2_param")
        clip1 = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)
        clip2 = fluid.clip.GradientClipByNorm(clip_norm=1.0)
        # set the gradient clipping strategy: clip1
        fluid.clip.set_gradient_clip(clip1)
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        # set the gradient clipping strategy: clip2
        sgd.minimize(loss, grad_clip=clip2)
        # when the settings conflict, set_gradient_clip does not take effect;
        # gradients are clipped with the clip2 strategy
......@@ -19,6 +19,7 @@ fluid.dygraph
dygraph_cn/Embedding_cn.rst
dygraph_cn/ExponentialDecay_cn.rst
dygraph_cn/FC_cn.rst
dygraph_cn/grad_cn.rst
dygraph_cn/GroupNorm_cn.rst
dygraph_cn/GRUUnit_cn.rst
dygraph_cn/guard_cn.rst
......
......@@ -47,7 +47,7 @@ LayerNorm
x = numpy.random.random((3, 32, 32)).astype('float32')
with fluid.dygraph.guard():
x = to_variable(x)
        layerNorm = fluid.LayerNorm([32, 32])
        ret = layerNorm(x)
......@@ -21,6 +21,100 @@ The full name of the Layer, composed as: ``name_scope`` + "/" + MyLayer.__class__
Return type: str
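
A minimal sketch of reading ``full_name`` (the printed value is illustrative; the exact suffix depends on how many layers of the same class have been created):

.. code-block:: python

    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(4, 2)
        # composed from the name scope and the lower-cased class name,
        # e.g. "linear_0"
        print(linear.full_name())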
.. py:method:: register_forward_pre_hook(hook)

Registers a ``forward pre-hook`` function for the Layer; the ``hook`` function will be called before the ``forward`` function is invoked.

The ``hook`` function takes the ``Layer`` and the ``input`` of its ``forward`` call, and may return a tuple or a single modified value; a single modified value is wrapped into a tuple. It can be used to inspect or modify the input of the ``Layer``'s ``forward`` function:

hook(Layer, input) -> None or modified input

Parameters:
    - **hook** (function) - the function to register as a ``forward pre-hook``

Returns: a ``HookRemoveHelper`` object; the registered hook function can be removed by calling ``hook_remove_helper.remove()``.

Return type: ``HookRemoveHelper`` object
**Code example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    # the forward_pre_hook function modifies the layer's input: input = input * 2
    def forward_pre_hook(layer, input):
        # change the input value
        input_return = (input[0] * 2)
        return input_return

    with fluid.dygraph.guard():
        linear = fluid.Linear(13, 5, dtype="float32")

        # register the hook
        forward_pre_hook_handle = linear.register_forward_pre_hook(forward_pre_hook)

        value0 = np.arange(26).reshape(2, 13).astype("float32")
        in0 = fluid.dygraph.to_variable(value0)
        out0 = linear(in0)

        # remove the hook
        forward_pre_hook_handle.remove()

        value1 = value0 * 2
        in1 = fluid.dygraph.to_variable(value1)
        out1 = linear(in1)

        # the hook changed the layer's input (input = input * 2), so out0 equals out1
        assert (out0.numpy() == out1.numpy()).all()
.. py:method:: register_forward_post_hook(hook)

Registers a ``forward post-hook`` function for the Layer; the ``hook`` function will be called after the ``forward`` function is invoked.

The ``hook`` function takes the ``Layer`` together with the ``input`` and ``output`` of its ``forward`` call. It can be used to inspect or modify the output of the ``Layer``'s ``forward`` function:

hook(Layer, input, output) -> None or modified output

Parameters:
    - **hook** (function) - the function to register as a ``forward post-hook``

Returns: a ``HookRemoveHelper`` object; the registered hook function can be removed by calling ``hook_remove_helper.remove()``.

Return type: ``HookRemoveHelper`` object
**Code example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    # the forward_post_hook function changes the layer's output: output = output * 2
    def forward_post_hook(layer, input, output):
        # change the output value
        return output * 2

    with fluid.dygraph.guard():
        linear = fluid.Linear(13, 5, dtype="float32")

        # register the hook
        forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook)

        value1 = np.arange(26).reshape(2, 13).astype("float32")
        in1 = fluid.dygraph.to_variable(value1)
        out0 = linear(in1)

        # remove the hook
        forward_post_hook_handle.remove()

        out1 = linear(in1)

        # the hook changed the layer's output (output = output * 2), so out0 equals out1 * 2
        assert (out0.numpy() == (out1.numpy()) * 2).all()
.. py:method:: create_parameter(shape, attr=None, dtype="float32", is_bias=False, default_initializer=None)
Layer创建参数。
......
......@@ -5,7 +5,7 @@ NoamDecay
**Note: this API only supports dynamic-graph mode**

.. py:class:: paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32', learning_rate=1.0)

This interface provides Noam learning rate decay.
......@@ -13,7 +13,7 @@ Noam decay is computed as follows.
.. math::

    decayed\_learning\_rate = learning\_rate * d_{model}^{-0.5} * min(global\_steps^{-0.5}, global\_steps * warmup\_steps^{-1.5})

For more details about Noam decay, please refer to `attention is all you need <https://arxiv.org/pdf/1706.03762.pdf>`_
......@@ -28,6 +28,7 @@ Noam decay is computed as follows.
    - **begin** (int, optional) - the starting step, i.e. the initial value of global_steps in the formula above. Defaults to 0.
    - **step** (int, optional) - the step size, i.e. the increment of global_steps in the formula above. Defaults to 1.
    - **dtype** (str, optional) - the data type of the learning rate value, which can be "float32" or "float64". Defaults to "float32".
    - **learning_rate** (Variable|float|int, optional) - the initial learning rate. If the type is Variable, it is a Tensor of shape [1] with data type float32 or float64; it can also be a Python int. Defaults to 1.0.
Returns: None
......@@ -39,7 +40,9 @@ Noam decay is computed as follows.
    warmup_steps = 100
    learning_rate = 0.01
    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        optimizer = fluid.optimizer.SGD(
            learning_rate = fluid.dygraph.NoamDecay(
                1/(warmup_steps *(learning_rate ** 2)),
                warmup_steps),
            parameter_list = emb.parameters())
......@@ -35,8 +35,10 @@ PiecewiseDecay
    boundaries = [10000, 20000]
    values = [1.0, 0.5, 0.1]
    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        optimizer = fluid.optimizer.SGD(
            learning_rate=fluid.dygraph.PiecewiseDecay(boundaries, values, 0),
            parameter_list=emb.parameters())
......
.. _cn_api_fluid_dygraph_grad:

grad
-------------------------------

**Note: this API only supports dynamic-graph mode**

.. py:method:: paddle.fluid.dygraph.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None, backward_strategy=None)

For every variable in `inputs`, computes the sum of gradients of all `outputs` with respect to it.

Parameters:
    - **outputs** (Variable|list(Variable)|tuple(Variable)) - the output variables of the graph used to compute gradients, or a list/tuple of several output variables.
    - **inputs** (Variable|list(Variable)|tuple(Variable)) - the input variables of the graph used to compute gradients, or a list/tuple of several input variables. Each return value of this API corresponds to the gradient of one of the `inputs`.
    - **grad_outputs** (Variable|list(Variable|None)|tuple(Variable|None), optional) - the initial gradient values of the `outputs`. If `grad_outputs` is None, the initial gradient of every output is a Tensor of all ones. If `grad_outputs` is not None, it must have the same length as `outputs`; in that case, if the i-th element of `grad_outputs` is None, the initial gradient of the i-th output is a Tensor of all ones, and if the i-th element is a Variable, the initial gradient of the i-th output is that Variable. Defaults to None.
    - **retain_graph** (bool, optional) - whether to retain the forward graph used to compute gradients. If True, the forward graph is kept and the user can run backward on the same graph twice. If False, the forward graph is released. Defaults to None, which means the value equals `create_graph`.
    - **create_graph** (bool, optional) - whether to build the backward graph of the computation. If True, higher-order derivatives can be computed. If False, the backward graph is released after the computation. Defaults to False.
    - **only_inputs** (bool, optional) - whether to compute gradients only for `inputs`. If False, the gradients of all leaf variables in the graph are computed and accumulated. If True, only the gradients of `inputs` are computed. Defaults to True. only_inputs=False is under development and not yet supported.
    - **allow_unused** (bool, optional) - decides whether to raise an error or return None when some `inputs` variables are not in the computation graph. If some `inputs` are not in the graph (i.e. their gradients are None), an error is raised when allow_unused=False, and None is returned as their gradients when allow_unused=True. Defaults to False.
    - **no_grad_vars** (Variable|list(Variable)|tuple(Variable)|set(Variable), optional) - the variables that do not need gradients. Defaults to None.
    - **backward_strategy** (BackwardStrategy, optional) - the strategy for computing gradients; see :ref:`cn_api_fluid_dygraph_BackwardStrategy` for details. Defaults to None.

Returns: a tuple whose length equals the number of variables in `inputs`; its i-th entry is the sum of gradients of all `outputs` with respect to the i-th input.

Return type: tuple
**Example code 1**

.. code-block:: python

    import paddle.fluid as fluid

    def test_dygraph_grad(create_graph):
        with fluid.dygraph.guard():
            x = fluid.layers.ones(shape=[1], dtype='float32')
            x.stop_gradient = False
            y = x * x

            # Since y = x * x, dx = 2 * x
            dx = fluid.dygraph.grad(
                    outputs=[y],
                    inputs=[x],
                    create_graph=create_graph,
                    retain_graph=True)[0]

            z = y + dx

            # If create_graph = False, the gradient of dx
            # would not be backpropagated. Therefore,
            # z = x * x + dx, and x.gradient() = 2 * x = 2.0

            # If create_graph = True, the gradient of dx
            # would be backpropagated. Therefore,
            # z = x * x + dx = x * x + 2 * x, and
            # x.gradient() = 2 * x + 2 = 4.0

            z.backward()
            return x.gradient()

    print(test_dygraph_grad(create_graph=False)) # [2.]
    print(test_dygraph_grad(create_graph=True)) # [4.]
**Example code 2**

.. code-block:: python

    import paddle.fluid as fluid

    fluid.enable_dygraph()

    def test_dygraph_grad(grad_outputs=None):
        x = fluid.layers.fill_constant(shape=[1], value=2.0, dtype='float32')
        x.stop_gradient = False

        y1 = x * x
        y2 = x * 3

        # If grad_outputs=None, dy1 = [1], dy2 = [1].
        # If grad_outputs=[g1, g2], then:
        #   - dy1 = [1] if g1 is None else g1
        #   - dy2 = [1] if g2 is None else g2

        # Since y1 = x * x, dx = 2 * x * dy1.
        # Since y2 = x * 3, dx = 3 * dy2.
        # Therefore, the final result would be:
        # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
        dx = fluid.dygraph.grad(
                outputs=[y1, y2],
                inputs=[x],
                grad_outputs=grad_outputs)[0]

        return dx.numpy()

    THREE = fluid.layers.fill_constant(shape=[1], value=3.0, dtype='float32')
    FOUR = fluid.layers.fill_constant(shape=[1], value=4.0, dtype='float32')

    # dy1 = [1], dy2 = [1]
    print(test_dygraph_grad(None)) # [7.]

    # dy1 = [1], dy2 = [4]
    print(test_dygraph_grad([None, FOUR])) # [16.]

    # dy1 = [4], dy2 = [1]
    print(test_dygraph_grad([FOUR, None])) # [19.]

    # dy1 = [3], dy2 = [4]
    print(test_dygraph_grad([THREE, FOUR])) # [24.]
......@@ -96,6 +96,7 @@ Executor supports single-GPU, multi-GPU, and CPU execution. When constructing an Executor, you need to pass
    - **scope** (Scope) - the scope in which the current program runs; users can specify different scopes for different programs. Default: fluid.global_scope().
    - **return_numpy** (bool) - whether to convert the returned computation results (the variables specified in the fetch list) to numpy. If False, each variable is returned as a LoDTensor; otherwise it is returned as numpy.ndarray. Default: True.
    - **use_program_cache** (bool) - whether to cache the input Program. If True, the model may run faster when the input program is a ``fluid.Program`` and the arguments of this interface (the program, the feed variable names, and the fetch_list variables) stay unchanged across calls. Default: False.
    - **use_prune** (bool) - whether to prune the input Program. If True, the program is pruned before running according to ``feed`` and ``fetch_list``: the ``Variable`` s and ``Operator`` s that produce the ``feed`` variables, as well as those that do not contribute to producing the ``fetch_list``, are cut away. Default: False, i.e. no pruning. Note that if the ``tuple`` returned by ``Optimizer.minimize()`` is passed into ``fetch_list``, ``use_prune`` is overridden to True and pruning is enabled. See the sketch below.
Returns: the values of the variables specified in fetch_list
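
A minimal sketch of calling ``run`` with pruning enabled (the network and shapes are illustrative):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 2], dtype='float32')
    hidden = fluid.layers.fc(input=x, size=3)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.1).minimize(loss)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # with use_prune=True, operators and variables that do not lead from
    # the feeds to the fetch targets are cut from the executed program
    out, = exe.run(fluid.default_main_program(),
                   feed={'x': np.random.random((4, 2)).astype('float32')},
                   fetch_list=[loss],
                   use_prune=True)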
......
......@@ -7,7 +7,7 @@ BuildStrategy
.. py:class:: paddle.fluid.BuildStrategy
``BuildStrategy`` allows users to more conveniently control how the computation graph is built in :ref:`cn_api_fluid_ParallelExecutor`, by setting the ``BuildStrategy`` member of ``ParallelExecutor``.
**Code example**
......@@ -68,6 +68,7 @@ bool type. Indicates whether to fuse broadcast ops; in Reduce mode this option
**Code example**
.. code-block:: python
import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_broadcast_ops = True
......@@ -108,6 +109,7 @@ bool type. Indicates whether to fuse relu and depthwise_conv2d to save GPU memory
import os
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
use_cuda = True
......
......@@ -22,34 +22,29 @@ CompiledProgram transforms the input Program or Graph according to the `build_strategy` configuration
.. code-block:: python
    import paddle.fluid as fluid
    import numpy
    import os

    place = fluid.CUDAPlace(0) # fluid.CPUPlace()
    exe = fluid.Executor(place)

    data = fluid.data(name='X', shape=[None, 1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe.run(fluid.default_startup_program())
    compiled_prog = fluid.CompiledProgram(
        fluid.default_main_program())

    x = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_prog,
                         feed={"X": x},
                         fetch_list=[loss.name])
.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None)
This method converts the input Program or Graph so that the model can be run in data-parallel mode. Users can configure optimizations for graph construction and execution through `build_strategy` and `exec_strategy`, e.g. fusing the AllReduce operations that aggregate gradients, or setting the size of the thread pool used while running the graph. **Note: if build_strategy is specified both when constructing the CompiledProgram and when calling with_data_parallel, the build_strategy in the CompiledProgram is overwritten; therefore, for data-parallel training, it is recommended to set build_strategy in the with_data_parallel call.**
Parameters:
    - **loss_name** (str) - the name of the loss variable produced by the model. **Note: for model training, loss_name must be set, otherwise the results may be wrong.** Default: None.
......@@ -70,45 +65,47 @@ CompiledProgram根据 `build_strategy` 的配置将输入的Program或Graph进
**Code Example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy
    import os

    use_cuda = True
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    parallel_places = [fluid.CUDAPlace(0), fluid.CUDAPlace(1)] if use_cuda else [fluid.CPUPlace()] * 2

    # NOTE: if you run the program on CPU, you need to set CPU_NUM explicitly,
    # otherwise fluid sets it to the number of all logical cores.
    # In that case, the input batch size should be greater than CPU_NUM,
    # otherwise the program will abort with an exception.
    if not use_cuda:
        os.environ['CPU_NUM'] = str(2)

    exe = fluid.Executor(place)

    data = fluid.data(name='X', shape=[None, 1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)

    test_program = fluid.default_main_program().clone(for_test=True)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe.run(fluid.default_startup_program())
    compiled_train_prog = fluid.CompiledProgram(
        fluid.default_main_program()).with_data_parallel(
            loss_name=loss.name, places=parallel_places)
    # NOTE: if share_vars_from=compiled_train_prog is not set here,
    # the parameters used for testing will differ from those used for training
    compiled_test_prog = fluid.CompiledProgram(
        test_program).with_data_parallel(
            share_vars_from=compiled_train_prog,
            places=parallel_places)

    train_data = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_train_prog,
                         feed={"X": train_data},
                         fetch_list=[loss.name])
    test_data = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_test_prog,
                         feed={"X": test_data},
                         fetch_list=[loss.name])
......@@ -33,7 +33,7 @@ ExecutionStrategy
train_exe = fluid.ParallelExecutor(use_cuda=False,
loss_name=avg_loss.name,
exec_strategy=exec_strategy)
.. py:attribute:: num_iteration_per_drop_scope
......
......@@ -5,7 +5,11 @@ ParamAttr
-------------------------------
.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
.. note::
    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead. Three clipping strategies are available: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`,
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
Creates a parameter attribute object. The user can set the parameter's name, initialization method, learning rate, regularization rule, trainability, gradient clipping method, model averaging and other properties; a short example follows the parameter list below.
......@@ -13,9 +17,10 @@ ParamAttr
    - **name** (str, optional) - the name of the parameter. Default: None, meaning the framework creates the parameter name automatically.
    - **initializer** (Initializer, optional) - the initialization method of the parameter. Default: None, meaning weight parameters use Xavier initialization and bias parameters are initialized to all zeros.
    - **learning_rate** (float) - the learning rate of the parameter. The effective learning rate equals the global learning rate multiplied by the parameter's learning rate, multiplied by the learning rate schedule factor.
    - **regularizer** (WeightDecayRegularizer, optional) - the regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
      :ref:`cn_api_fluid_regularizer_L2Decay`. If regularization is also set in the ``optimizer`` (for example :ref:`cn_api_fluid_optimizer_SGDOptimizer`), the regularization in the ``optimizer`` is ignored. Default: None, meaning no regularization.
    - **trainable** (bool) - whether the parameter is trainable. Default: True.
    - **do_model_average** (bool) - whether to apply model averaging. Default: False.

Returns: an object representing the parameter attributes.
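A minimal sketch of the recommended usage (all names below are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    w_param = fluid.ParamAttr(name='fc_weight',
                              learning_rate=0.5,
                              regularizer=fluid.regularizer.L2Decay(1.0),
                              trainable=True)
    x = fluid.data(name='X', shape=[None, 1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param)
    loss = fluid.layers.mean(y_predict)

    # gradient clipping is passed to minimize() rather than set on ParamAttr
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    sgd.minimize(loss, grad_clip=clip)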
......
......@@ -57,13 +57,12 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
import paddle.fluid as fluid

prog = fluid.default_main_program()
x = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False)
pred = fluid.layers.fc(x, size=3)
prog_string = prog.to_string(throw_on_error=True, with_details=False)
prog_string_with_details = prog.to_string(throw_on_error=False, with_details=True)
print("program string without detail: {}".format(prog_string))
print("program string with detail: {}".format(prog_string_with_details))
.. py:method:: clone(for_test=False)
......@@ -82,16 +81,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
**Code Example**

.. code-block:: python

    import paddle.fluid as fluid

    img = fluid.layers.data(name='image', shape=[784])
    pred = fluid.layers.fc(input=img, size=10, act='relu')
    loss = fluid.layers.mean(pred)
    ## We recommend calling clone() before using the Optimizer
    test_program = fluid.default_main_program().clone(for_test=True)
    optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
    optimizer.minimize(loss)
Parameters:
    - **for_test** (bool) – if True, clone() sets the ``is_test`` attribute of the operators to True and prunes the backward OPs and parameter-optimization OPs. Default: False.

Returns: when ``for_test=True``, a new Program containing only the forward part of the current Program; otherwise, a new Program identical to the current one.
......@@ -150,7 +152,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
input=fluid.layers.fc(hidden, size=10, act='softmax'),
label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
avg_loss = fluid.layers.mean(loss)
test_program = train_program.clone(for_test=True)
print_prog(test_program)
# Since the training and test parameters need to be shared, we use the training ``startup_program``
......@@ -182,7 +184,8 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
for key, value in sorted(six.iteritems(op.all_attrs())):
if key not in ['op_callstack', 'op_role_var']:
print(" [ attrs: {}: {} ]".format(key, value))
def network():
img = fluid.layers.data(name='image', shape=[784])
hidden = fluid.layers.fc(input=img, size=200, act='relu')
hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
......@@ -192,19 +195,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
avg_loss = fluid.layers.mean(loss)
return avg_loss
train_program_2 = fluid.Program()
startup_program_2 = fluid.Program()
test_program_2 = fluid.Program()
with fluid.program_guard(train_program_2, startup_program_2):
    with fluid.unique_name.guard():
        avg_loss = network()
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(avg_loss)

# do not use a separate startup program for the test phase
with fluid.program_guard(test_program_2, startup_program_2):
    with fluid.unique_name.guard():
        avg_loss = network()
print_prog(test_program_2)
The two code snippets above construct and print identical Programs.
......@@ -268,24 +271,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
.. py:attribute:: random_seed
**Note: it must be set before the related OPs are added.**
The default random seed of the random operators in the Program. 0 means the random seed is generated randomly.
......@@ -301,12 +287,16 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
prog = fluid.default_main_program()
random_seed = prog.random_seed
x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False)
print(random_seed)
## 0
## the default random seed is 0

# Here we must set random_seed before fluid.layers.dropout
prog.random_seed = 1
z_var = fluid.layers.dropout(x_var, 0.7)

print(prog.random_seed)
## 1
## the random seed has been changed to 1
......
......@@ -5,8 +5,11 @@ WeightNormParamAttr
**Note: this API only supports the static graph mode.**
.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
.. note::
    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead. Three clipping strategies are available: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`,
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
This class defines the parameters of Weight Normalization. Weight normalization decouples the length of a weight vector in a neural network from its direction; see the paper `Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks <https://arxiv.org/pdf/1602.07868.pdf>`_ for the detailed definition and implementation.
......@@ -15,9 +18,10 @@ WeightNormParamAttr
    - **name** (None|str) - used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default: None.
    - **initializer** (Initializer) - the parameter initialization method, e.g. ``initializer = fluid.initializer.ConstantInitializer(1.0)``. Default: None, meaning the default `Xavier()` initializer is used.
    - **learning_rate** (float32) - the learning rate; the effective rate during optimization is :math:`global\_lr∗parameter\_lr∗scheduler\_factor`. Default: 1.0.
    - **regularizer** (WeightDecayRegularizer, optional) - the regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
      :ref:`cn_api_fluid_regularizer_L2Decay`. If regularization is also set in the ``optimizer`` (for example :ref:`cn_api_fluid_optimizer_SGDOptimizer`), the regularization in the ``optimizer`` is ignored. Default: None, meaning no regularization.
    - **trainable** (bool, optional) - whether the parameter is trainable. Default: True.
    - **do_model_average** (bool, optional) - whether the parameter needs model averaging. Default: False.
......@@ -36,7 +40,6 @@ WeightNormParamAttr
learning_rate=1.0,
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
trainable=True,
do_model_average=False))
......
......@@ -26,7 +26,7 @@ gradients
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[None,2,8,8], dtype='float32')
x.stop_gradient=False
y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)
......
......@@ -23,7 +23,7 @@ program_guard
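A complete sketch of the typical usage (the shapes are illustrative; ``fluid.gradients`` returns the gradients of the targets with respect to the inputs):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 2, 8, 8], dtype='float32')
    x.stop_gradient = False
    y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
    y = fluid.layers.relu(y)
    # z is a list holding the gradient of y with respect to x (x@GRAD)
    z = fluid.gradients([y], x)
    print(z)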
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10, act='relu')
For example, when the network being built does not need the startup_program to initialize its variables, a temporary Program can be passed in.
......@@ -36,5 +36,5 @@ program_guard
main_program = fluid.Program()
# if you don't need to care about the startup program, just pass in a temporary one
with fluid.program_guard(main_program, fluid.Program()):
data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32')
=======================
paddle.framework
=======================
.. toctree::
:maxdepth: 1
framework_cn/get_default_dtype.rst
framework_cn/manual_seed.rst
framework_cn/set_default_dtype.rst
......@@ -7,6 +7,7 @@ API Reference
../api_guides/index_cn.rst
fluid_cn.rst
api_tree_cn.rst
backward_cn.rst
clip_cn.rst
dataset_cn.rst
......
......@@ -138,7 +138,10 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
# -------------------------------------------------------
.. py:method:: from_generator(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False, use_multiprocess=False, drop_last=True)
.. note::
    The framework guarantees that the data loading order of the DataLoader is consistent with the order in which the user-provided data source is read.
Creates a DataLoader object for loading data produced by a Python generator. The data is read ahead by a Python thread and pushed into a queue asynchronously.
......@@ -158,12 +161,13 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
    - **iterable** (bool) - whether the created DataLoader object is iterable.
    - **return_list** (bool) - whether the data on each device is returned as a list. Valid only when iterable = True. If return_list = False, the data returned on each device is a str -> LoDTensor mapping, where the keys are the names of the input variables. If return_list = True, the data returned on each device is a list(LoDTensor). return_list = False is recommended in static graph mode, and return_list = True in dynamic graph mode.
    - **use_multiprocess** (bool) - whether to use multiprocessing to speed up data loading in dynamic graph mode. Note: this option only takes effect in dynamic graph mode; in static graph mode it has no effect either way. Default: False.
    - **drop_last** (bool) - whether to drop the last batch that cannot fill all CPU/GPU devices. Default: True. For training, users must not set drop_last=False, because every CPU/GPU device should read data from the DataLoader. For inference, users may set drop_last=False so that the last incomplete batch can still be predicted.

Returns: the created DataLoader object

Return type: loader (DataLoader)
**Code Example 1**
.. code-block:: python
......@@ -297,6 +301,50 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
assert relu.shape == [BATCH_SIZE, 784]
**Code Example 2**
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
import os
# We use 2 CPU cores to run inference network
os.environ['CPU_NUM'] = '2'
# The data source has only 3 batches, which can not be
# divided evenly to each CPU core
def batch_generator():
for i in range(3):
yield np.array([i+1]).astype('float32'),
x = fluid.data(name='x', shape=[None], dtype='float32')
y = x * x
def run_inference(drop_last):
loader = fluid.io.DataLoader.from_generator(feed_list=[x],
capacity=8, drop_last=drop_last)
loader.set_batch_generator(batch_generator, fluid.cpu_places())
exe = fluid.Executor(fluid.CPUPlace())
prog = fluid.CompiledProgram(fluid.default_main_program())
prog = prog.with_data_parallel()
result = []
for data in loader():
each_ret, = exe.run(prog, feed=data, fetch_list=[y])
result.extend(each_ret)
return result
# Set drop_last to True, so that the last batch whose
# number is less than CPU core number would be discarded.
print(run_inference(drop_last=True)) # [1.0, 4.0]
# Set drop_last to False, so that the last batch whose
# number is less than CPU core number can be tested.
print(run_inference(drop_last=False)) # [1.0, 4.0, 9.0]
.. py:method:: from_dataset(dataset, places, drop_last=True)
Creates a DataLoader object for loading data produced by a Dataset. Currently, Dataset is only supported on Linux.
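A minimal sketch (assuming a Linux environment; the data file and slot setup below are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    image = fluid.data(name='image', shape=[None, 784], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')

    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
    dataset.set_batch_size(32)
    dataset.set_filelist(['train_data.txt'])  # hypothetical data file
    dataset.set_use_var([image, label])
    dataset.set_pipe_command('cat')

    loader = fluid.io.DataLoader.from_dataset(dataset, fluid.cpu_places())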
......
......@@ -38,7 +38,7 @@ LSTMCell
.. code-block:: python
import paddle.fluid.layers as layers
cell = layers.LSTMCell(hidden_size=256)
.. py:method:: call(inputs, states)
......@@ -61,4 +61,4 @@ LSTMCell的 :code:`state_shape` 是一个具有两个形状的列表::math:`[[
Returns: the :code:`state_shape` of LSTMCell

Return type: list
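For instance (a small sketch; the hidden size is illustrative):

.. code-block:: python

    import paddle.fluid.layers as layers

    cell = layers.LSTMCell(hidden_size=256)
    print(cell.state_shape)  # [[256], [256]]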
......@@ -34,14 +34,13 @@ add_position_encoding
.. code-block:: python

    import paddle.fluid as fluid

    tensor = fluid.data(
        name='tensor',
        shape=[None, 64, 512],
        dtype='float32')
    position_tensor = fluid.layers.add_position_encoding(
        input=tensor, alpha=1.0, beta=1.0)
......@@ -53,4 +52,3 @@ add_position_encoding
......@@ -9,7 +9,7 @@ argsort
Parameters:
    - **input** (Variable) - the input multi-dimensional ``Tensor``; supported data types: float32, float64, int16, int32, int64, uint8.
    - **axis** (int, optional) - the axis along which the input Tensor is sorted. The valid range of ``axis`` is [-R, R), where R is the rank of the input ``x``; a negative ``axis`` is equivalent to ``axis`` + R. Default: 0.
    - **descending** (bool, optional) - the direction of the sort. If True, sort in descending order; if False or unset, sort in ascending order. Default: False.
    - **name** (str, optional) – see :ref:`api_guide_Name`; usually there is no need to set it. Default: None.
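A small usage sketch in dygraph mode (the input values are illustrative):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    fluid.enable_dygraph()
    x = fluid.dygraph.to_variable(
        np.array([[5, 8, 5], [6, 1, 2]]).astype('float32'))
    out, indices = fluid.layers.argsort(input=x, axis=-1)
    # out.numpy():     [[5., 5., 8.], [1., 2., 6.]]
    # indices.numpy(): [[0, 2, 1], [1, 2, 0]]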
......
......@@ -26,7 +26,7 @@ create_global_var
import paddle.fluid as fluid
import paddle.fluid.layers as layers
var = layers.create_global_var(shape=[2,3], value=1.0, dtype='float32',
persistable=True, force_cpu=True, name='new_var')
......
......@@ -5,17 +5,18 @@ ctc_greedy_decoder
.. py:function:: paddle.fluid.layers.ctc_greedy_decoder(input, blank, input_length=None, name=None)
This OP decodes sequences with a greedy strategy, as follows:

1. Take the index of the maximum value in each row of the input, i.e. numpy.argmax(input, axis=0).
2. For each sequence in the result of step 1, merge repeated tokens between two blanks and remove all blanks.

This API supports two kinds of input, LoDTensor and Tensor; code examples for both are given below:
**Example**:
::
# for lod tensor input
Given:
input.data = [[0.6, 0.1, 0.3, 0.1],
......@@ -45,13 +46,38 @@ ctc_greedy_decoder
output.lod = [[2, 1]]
# for tensor input
input.data = [[[0.6, 0.1, 0.3, 0.1],
[0.3, 0.2, 0.4, 0.1],
[0.1, 0.5, 0.1, 0.3],
[0.5, 0.1, 0.3, 0.1]],
[[0.5, 0.1, 0.3, 0.1],
[0.2, 0.2, 0.2, 0.4],
[0.2, 0.2, 0.1, 0.5],
[0.5, 0.1, 0.3, 0.1]]]
input_length.data = [[4], [4]]
input.shape = [2, 4, 4]
step1: Apply argmax to first input sequence which is input.data[0:4]. Then we get:
[[0], [2], [1], [0]], for input.data[4:8] is [[0], [3], [3], [0]], shape is [2,4,1]
step2: Change the argmax result to use padding mode, then argmax result is
[[0, 2, 1, 0], [0, 3, 3, 0]], shape is [2, 4], lod is [], input_length is [[4], [4]]
step3: Apply ctc_align to padding argmax result, padding_value is 0
Finally:
output.data = [[2, 1, 0, 0],
[3, 0, 0, 0]]
output_length.data = [[2], [1]]
Parameters:
    - **input** (Variable) — the probabilities of variable-length sequences. For LoDTensor input, it is a 2-D LoDTensor with LoD information, of shape [Lp, num_classes + 1], where Lp is the sum of the lengths of all input sequences and num_classes is the number of true classes (excluding the blank label). For Tensor input, it is a padded 3-D Tensor of shape [batch_size, N, num_classes + 1]. The data type can be float32 or float64.
    - **blank** (int) — the blank label index for the Connectionist Temporal Classification (CTC) loss, a value in the half-open interval [0, num_classes + 1).
    - **input_length** (Variable, optional) — required for padded Tensor input: a 2-D Tensor of shape [batch_size, 1], data type int64, giving the valid length of each input sequence. Default: None.
    - **name** (str, optional) — used by developers when printing debugging information; see :ref:`api_guide_Name`. Default: None.
Returns: for LoDTensor input, the result of the CTC greedy decoder, a 2-D LoDTensor of shape [Lp, 1] with data type int64, where "Lp" is the sum of the lengths of all output sequences; if all decoded sequences are empty, the result LoDTensor is [-1] with LoD [[]]. For Tensor input, a tuple (output, output_length), where output is a Tensor of shape [batch_size, N] and type int64, and output_length is a Tensor of shape [batch_size, 1] and type int64 holding the length of each output sequence.

Return type: Variable
......@@ -60,9 +86,15 @@ ctc_greedy_decoder
.. code-block:: python

    # for lod mode
    import paddle.fluid as fluid
    x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1)
    cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0)

    # for padding mode
    x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32')
    x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64')
    out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0,
                                                   input_length=x_pad_len)
......
.. _cn_api_fluid_layers_inplace_abn:
inplace_abn
-------------------------------
**Note: this API only supports the static graph mode.**
.. py:function:: paddle.fluid.layers.inplace_abn(input, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, data_layout='NCHW', name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False, use_global_stats=False, act_alpha=1.0)
In-place Activated Batch Normalization Layer.

This layer saves memory by computing batch normalization and the activation in place. For the batch normalization computation, see ``fluid.layers.batch_norm``; for the in-place activated batch normalization computation, see `In-Place Activated BatchNorm for Memory-Optimized Training of DNNs <https://arxiv.org/abs/1712.02616>`_.
Parameters:
    - **input** (Variable) - the input feature of the inplace_abn operator, a Variable with 2, 3, 4 or 5 dimensions. Data types: float16, float32, float64.
    - **act** (string) - the activation type, e.g. leaky_relu, relu, prelu. Default: None.
    - **is_test** (bool) - whether it is in the test phase; outside training, the global mean and global variance accumulated during training are used. Default: False.
    - **momentum** (float|Variable) - the value used to compute moving_mean and moving_var; a float or a Variable of shape [1] with data type float32. The update rules are :math:`moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)` and :math:`moving\_var = moving\_var * momentum + new\_var * (1. - momentum)`. Default: 0.9.
    - **epsilon** (float) - the value added to the denominator for numerical stability. Default: 1e-5.
    - **param_attr** (ParamAttr|None) - the attribute object of the weight parameter. Default: None, meaning the default weight attribute is used; see :ref:`cn_api_fluid_ParamAttr`. The default weight initialization of the inplace_abn operator is 1.0.
    - **bias_attr** (ParamAttr|None) - the attribute object of the bias parameter. Default: None, meaning the default bias attribute is used; see :ref:`cn_api_fluid_ParamAttr`. The default bias initialization of the inplace_abn operator is 0.0.
    - **data_layout** (string) - the data format of the input; the output format matches the input. Either "NCHW" or "NHWC", where N is the batch size, C the number of channels, H the feature height and W the feature width. Default: "NCHW".
    - **name** (str|None) – see :ref:`cn_api_guide_Name`; usually there is no need to set it. Default: None.
    - **moving_mean_name** (string) - the name of moving_mean, which stores the global mean. If None, ``inplace_abn`` names the global mean randomly; otherwise it names it ``moving_mean_name``. Default: None.
    - **moving_variance_name** (string) - the name of moving_variance, which stores the global variance. If None, ``inplace_abn`` names the global variance randomly; otherwise it names it ``moving_variance_name``. Default: None.
    - **do_model_average_for_mean_and_var** (bool) - whether to apply model averaging to the mean and variance. Default: False.
    - **use_global_stats** (bool) – whether to use the global mean and variance. In prediction or test mode, setting use_global_stats to True or is_test to True is equivalent. In training mode, setting use_global_stats to True also uses the global mean and variance during training. Default: False.
    - **act_alpha** (float) – when ``act`` is None, leaky-relu or elu, the in-place activated batch normalization algorithm is used, and this parameter supplies the ``alpha`` value of leaky-relu or elu. Default: 1.0.

Returns: a Tensor with the same shape as the input, holding the result of batch normalization applied to the input.

Return type: Variable

**Code Example**:
.. code-block:: python
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[3, 7, 3, 7], dtype='float32')
hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
hidden2 = fluid.layers.inplace_abn(input=hidden1)
hidden3 = fluid.layers.inplace_abn(input=hidden2, act='leaky_relu', act_alpha=0.2)
......@@ -57,7 +57,7 @@ lstm
Returns: a tuple of the three Tensors output by the lstm computation:

    - rnn_out: the Tensor holding the LSTM hidden output, with the same data type as input and shape :math:`[batch\_size, seq\_len, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[batch\_size, seq\_len, hidden\_size*2]`
    - last_h: the Tensor holding the hidden state of the last LSTM step, with the same data type as input and shape :math:`[num\_layers, batch\_size, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
    - last_c: the Tensor holding the cell state of the last LSTM step, with the same data type as input and shape :math:`[num\_layers, batch\_size, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
......@@ -73,12 +73,11 @@ lstm
emb_dim = 256
vocab_size = 10000
data = fluid.layers.data(name='x', shape=[-1, 100, 1],
dtype='int64')
emb = fluid.layers.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True)
batch_size = 20
max_len = 100
dropout_prob = 0.2
seq_len = 100
hidden_size = 150
num_layers = 1
init_h = layers.fill_constant( [num_layers, batch_size, hidden_size], 'float32', 0.0 )
......@@ -87,7 +86,7 @@ lstm
rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)
rnn_out.shape # (-1, 100, 150)
last_h.shape # (1, 20, 150)
last_c.shape  # (1, 20, 150)
......
......@@ -3,7 +3,7 @@
noam_decay
-------------------------------
.. py:function:: paddle.fluid.layers.noam_decay(d_model, warmup_steps, learning_rate=1.0)
The Noam learning rate decay method.
......@@ -14,11 +14,12 @@ noam衰减的numpy实现如下:
import paddle.fluid as fluid
import numpy as np

# set hyper parameters
base_lr = 0.01
d_model = 2
current_steps = 20
warmup_steps = 200

# compute
lr_value = base_lr * np.power(d_model, -0.5) * np.min([
        np.power(current_steps, -0.5),
        np.power(warmup_steps, -1.5) * current_steps])
......@@ -27,6 +28,7 @@ noam衰减的numpy实现如下:
Parameters:
    - **d_model** (Variable|int) - the feature dimension of the model's input and output vectors; can be a scalar Tensor or an int value.
    - **warmup_steps** (Variable|int) - the number of warmup steps; can be a scalar Tensor or an int value.
    - **learning_rate** (Variable|float|int, optional) - the initial learning rate. If it is a Variable, it is a Tensor of shape [1] with data type float32 or float64; it can also be a Python int. Default: 1.0.

Returns: the decayed learning rate
......@@ -41,7 +43,8 @@ noam衰减的numpy实现如下:
learning_rate = 0.01
lr = fluid.layers.learning_rate_scheduler.noam_decay(
1/(warmup_steps *(learning_rate ** 2)),
        warmup_steps,
        learning_rate)
......
......@@ -19,36 +19,34 @@ pad2d
Return type: Variable
**Example**
.. code-block:: text
    Suppose the input image is:

    Input = [[[[1., 2., 3.],
               [4., 5., 6.]]]]

    Case 0:
        paddings = [0, 1, 2, 3],
        mode = 'constant'
        pad_value = 0
        Out = [[[[0., 0., 1., 2., 3., 0., 0., 0.],
                 [0., 0., 4., 5., 6., 0., 0., 0.],
                 [0., 0., 0., 0., 0., 0., 0., 0.]]]]

    Case 1:
        paddings = [0, 1, 2, 1],
        mode = 'reflect'
        Out = [[[[3., 2., 1., 2., 3., 2.],
                 [6., 5., 4., 5., 6., 5.],
                 [3., 2., 1., 2., 3., 2.]]]]

    Case 2:
        paddings = [0, 1, 2, 1],
        mode = 'edge'
        Out = [[[[1., 1., 1., 2., 3., 3.],
                 [4., 4., 4., 5., 6., 6.],
                 [4., 4., 4., 5., 6., 6.]]]]
......@@ -56,8 +54,6 @@ pad2d
.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.data(name='data', shape=[None, 3, 32, 32], dtype='float32')
    result = fluid.layers.pad2d(input=data, paddings=[0, 1, 2, 3], mode='reflect')
......@@ -8,23 +8,21 @@ pad
This OP pads the Tensor with a constant value given by ``pad_value``; the padding widths are specified by ``paddings``. The number of values padded before the contents of ``x`` in dimension ``i`` is given by ``paddings[2*i]``, and the number padded after by ``paddings[2*i+1]``.
**Example**:

.. code-block:: text

    Given:
        x = [[1, 2], [3, 4]]

        paddings = [0, 1, 1, 2]

        pad_value = 0

    Return:
        out = [[0, 1, 2, 0, 0]
               [0, 3, 4, 0, 0]
               [0, 0, 0, 0, 0]]
Parameters:
......@@ -44,15 +42,7 @@ pad
# x is a rank-2 Tensor
import paddle.fluid as fluid
x = fluid.data(name='data', shape=[300, 300], dtype='float32')
out = fluid.layers.pad(x=x, paddings=[0, 1, 1, 2], pad_value=0.)
......@@ -7,9 +7,9 @@ pad_constant_like
This OP pads ``y`` with ``pad_value``; the amount of padding in each dimension is determined by the shapes of x and y: ((0, x.shape[0] - y.shape[0]), ..., (0, x.shape[i] - y.shape[i]), ..., (0, x.shape[n] - y.shape[n])) gives the padding width per dimension. For dimension i, the padding width ``(0, x.shape[i] - y.shape[i])`` means nothing is padded at the beginning of y's i-th dimension and ``x.shape[i] - y.shape[i]`` positions are padded at the end. This OP requires that y has the same rank as x and that ``y.shape[i] <= x.shape[i]`` for every dimension i.
**Example**:

.. code-block:: text
Given:
X = [[[[ 0, 1, 2],
......@@ -24,30 +24,34 @@ pad_constant_like
[27, 28, 29]],
[[30, 31, 32],
[33, 34, 35]]]]
X.shape = (2, 3, 2, 3)
Y = [[[[35, 36, 37]],
[[38, 39, 40]],
[[41, 42, 43]]]]
Y.shape = (1, 3, 1, 3)
    And
        pad_value = 0.

    Return:
        Out = [[[[35, 36, 37],
                 [ 0,  0,  0]],
                [[38, 39, 40],
                 [ 0,  0,  0]],
                [[41, 42, 43],
                 [ 0,  0,  0]]],
               [[[ 0,  0,  0],
                 [ 0,  0,  0]],
                [[ 0,  0,  0],
                 [ 0,  0,  0]],
                [[ 0,  0,  0],
                 [ 0,  0,  0]]]]

        Out.shape = [2, 3, 2, 3]
Parameters:
    - **x** (Variable) - a multi-dimensional Tensor
......@@ -66,8 +70,8 @@ pad_constant_like
# x is a rank-4 tensor, x.shape = (2, 3, 2, 3)
# y is a rank-4 tensor, y.shape = (1, 3, 1, 3)
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[2,3,2,3], dtype='float32')
y = fluid.data(name='y', shape=[1,3,1,3], dtype='float32')
out = fluid.layers.pad_constant_like(x=x, y=y, pad_value=0.)
# out is a rank-4 tensor, out.shape = [2, 3, 2, 3]
......
......@@ -54,10 +54,10 @@ reshape
# example 1:
# attr shape is a list which doesn't contain tensor Variable.
data_1 = fluid.data(
    name='data_1', shape=[2, 4, 6], dtype='float32')
reshaped_1 = fluid.layers.reshape(
    x=data_1, shape=[-1, 0, 3, 2], inplace=True)
# the shape of reshaped_1 is [2,4,3,2].
# example 2:
......@@ -69,7 +69,7 @@ reshape
# example 3:
data_3 = fluid.data(
name="data_3", shape=[2,4,6], dtype='float32')
name="data_3", shape=[2,4,6], dtype='float32')
reshaped_3 = fluid.layers.reshape(x=data_3, shape=[6,8])
# the shape of reshaped_3 is [6,8].
......
......@@ -34,27 +34,27 @@ retinanet_detection_output
import paddle.fluid as fluid
bboxes_low = fluid.data(
    name='bboxes_low', shape=[1, 44, 4], dtype='float32')
bboxes_high = fluid.data(
    name='bboxes_high', shape=[1, 11, 4], dtype='float32')
scores_low = fluid.data(
    name='scores_low', shape=[1, 44, 10], dtype='float32')
scores_high = fluid.data(
    name='scores_high', shape=[1, 11, 10], dtype='float32')
anchors_low = fluid.data(
    name='anchors_low', shape=[44, 4], dtype='float32')
anchors_high = fluid.data(
    name='anchors_high', shape=[11, 4], dtype='float32')
im_info = fluid.data(
    name="im_info", shape=[1, 3], dtype='float32')
nmsed_outs = fluid.layers.retinanet_detection_output(
    bboxes=[bboxes_low, bboxes_high],
    scores=[scores_low, scores_high],
    anchors=[anchors_low, anchors_high],
    im_info=im_info,
    score_threshold=0.05,
    nms_top_k=1000,
    keep_top_k=100,
    nms_threshold=0.45,
    nms_eta=1.0)
......@@ -50,7 +50,6 @@ retinanet_target_assign
.. code-block:: python
import paddle.fluid as fluid
bbox_pred = fluid.data(name='bbox_pred', shape=[1, 100, 4],
dtype='float32')
......
......@@ -3,7 +3,7 @@
sigmoid_focal_loss
-------------------------------
.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25)
.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2.0, alpha=0.25)
`Focal Loss <https://arxiv.org/abs/1708.02002>`_ 被提出用于解决计算机视觉任务中前景-背景不平衡的问题。该OP先计算输入x中每个元素的sigmoid值,然后计算sigmoid值与类别目标值label之间的Focal Loss。
`Focal Loss <https://arxiv.org/abs/1708.02002>`_ was proposed to address the foreground-background class imbalance in computer vision tasks. This OP first computes the sigmoid of every element of the input x, and then computes the Focal Loss between the sigmoid values and the class target label.
loss = fluid.layers.sigmoid_focal_loss(x=input,
label=label,
fg_num=fg_num,
gamma=2.0,
alpha=0.25)
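A self-contained variant of the example (shapes and names are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    num_classes = 10  # number of foreground classes, excluding background
    input = fluid.data(name='data', shape=[None, num_classes], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int32')
    fg_num = fluid.data(name='fg_num', shape=[1], dtype='int32')
    loss = fluid.layers.sigmoid_focal_loss(x=input,
                                           label=label,
                                           fg_num=fg_num,
                                           gamma=2.0,
                                           alpha=0.25)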
......@@ -43,31 +43,12 @@ scaled_dot_product_attention
.. code-block:: python

    import paddle.fluid as fluid

    queries = fluid.data(name="queries", shape=[3, 5, 9], dtype="float32")
    keys = fluid.data(name="keys", shape=[3, 6, 9], dtype="float32")
    values = fluid.data(name="values", shape=[3, 6, 10], dtype="float32")
    contexts = fluid.nets.scaled_dot_product_attention(queries, keys, values)
    contexts.shape  # [3, 5, 10]
=======================
paddle.nn
=======================
.. toctree::
:maxdepth: 1
nn_cn/Conv1D.rst
nn_cn/Conv2D.rst
nn_cn/diag_embed.rst
nn_cn/interpolate.rst
nn_cn/Linear.rst
nn_cn/log_softmax.rst
nn_cn/ReLU.rst
nn_cn/Upsample.rst
nn_cn/activation_cn.rst
nn_cn/loss_cn.rst
=======================
activation
=======================
.. toctree::
:maxdepth: 1
activation_cn/Sigmoid.rst