Commit 39e1c742 authored by Mr.Lee, committed by Hao Wang

upload memory_optimize_en.rst and nets_en.rst (#628)

* upload memory_optimize_en.rst

* Review

* fix jargon

* Fix memory ==> video memory
Parent 02321682
.. _api_guide_memory_optimize_en:
#########################
Video Memory Optimization
#########################
**This guide is for GPU training.**
Video memory optimization reduces the video memory consumed by a :code:`Program` during execution by analyzing the video memory occupied by each :code:`Variable` in the :code:`Program` and reusing it. Users can perform video memory optimization from a Python script through the :code:`memory_optimize` interface. The execution strategy of video memory optimization is as follows:

- Firstly, analyze the dependencies between the :code:`Operator` s in the :code:`Program` to determine the remaining lifetime of each :code:`Variable`;
- Secondly, based on these lifetimes, let a :code:`Variable` created later reuse the video memory of a :code:`Variable` that is approaching the end of its lifetime or has ceased to exist.

.. code-block:: python

    z = fluid.layers.sum([x, y])
    m = fluid.layers.matmul(y, z)

In this example, the lifetime of :code:`x` ends after :code:`fluid.layers.sum`, so its video memory can be reused by :code:`m`.
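The reuse strategy above can be sketched in a few lines of plain Python. This is a hypothetical, simplified model of the lifetime analysis, not Paddle's actual implementation; :code:`plan_reuse` and the op-list format are illustrative names only:

```python
# Simplified model of the lifetime analysis described above -- NOT
# Paddle's real implementation.  Each op lists the variables it reads
# and the variable it writes; a variable's lifetime ends at the last
# op that reads it, after which its buffer becomes reusable.

def last_use(ops):
    """Map each input variable to the index of the last op that reads it."""
    last = {}
    for i, (inputs, _output) in enumerate(ops):
        for name in inputs:
            last[name] = i
    return last

def plan_reuse(ops):
    """Return {new_var: reused_var} pairs where memory can be shared."""
    last = last_use(ops)
    free, reuse = [], {}
    for i, (inputs, output) in enumerate(ops):
        if free:                      # a dead variable's buffer is available
            reuse[output] = free.pop()
        for name in inputs:
            if last[name] == i:       # lifetime of `name` ends at this op
                free.append(name)
    return reuse

# The example from the text: z = sum([x, y]); m = matmul(y, z)
ops = [(["x", "y"], "z"), (["y", "z"], "m")]
print(plan_reuse(ops))  # x dies after the first op, so m can reuse x
```

Here :code:`x` is last read by the first op, so its buffer is free by the time :code:`m` is allocated, matching the reuse described in the text.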
Disable video memory optimization for specific parts
=======================================================
:code:`memory_optimize` supports disabling video memory optimization for specific parts of the network. You can specify the :code:`Variable` s whose video memory is not to be reused by passing a collection of their names through :code:`skip_opt_set`;
In addition, :code:`memory_optimize` can skip video memory optimization for the backward part of the network. The user enables this behavior by passing in the :code:`skip_grads` parameter.

.. code-block:: python

    fluid.memory_optimize(fluid.default_main_program(),
                          skip_opt_set=("fc",), skip_grads=True)

In this example, the :code:`fluid.memory_optimize` interface analyzes the remaining lifetime of each :code:`Variable` in the default :code:`Program`, skipping the :code:`Variable` named :code:`fc` and all :code:`Variable` s in the backward part of the network.
The video memory of these skipped :code:`Variable` s will not be reused by any other :code:`Variable`.
Specify the video memory optimization level
==============================================
:code:`memory_optimize` supports printing video memory reuse information to facilitate debugging. Users can enable this by specifying :code:`print_log=True`;

:code:`memory_optimize` supports two levels of video memory optimization, namely :code:`0` and :code:`1` :

- When the optimization level is :code:`0`: after analyzing the remaining lifetime of each :code:`Variable`, :code:`memory_optimize` also checks its :code:`shape`. Video memory reuse only happens between :code:`Variable` s with the same :code:`shape`;
- When the optimization level is :code:`1`: :code:`memory_optimize` reuses video memory as aggressively as possible. After analyzing the remaining lifetime of each :code:`Variable`, even :code:`Variable` s with different :code:`shape` s will reuse video memory to the maximum extent.

.. code-block:: python

    fluid.memory_optimize(fluid.default_main_program(),
                          level=0, print_log=True)

In this example, the :code:`fluid.memory_optimize` interface analyzes the remaining lifetime of each :code:`Variable` in the default :code:`Program`. Only :code:`Variable` s with exactly the same :code:`shape` will reuse video memory. After the analysis is finished, all debugging information related to video memory reuse is printed.
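The difference between the two levels boils down to one extra check at reuse time. The following is a simplified pure-Python sketch of that decision; the :code:`can_reuse` helper is hypothetical and not part of the Paddle API:

```python
# Hypothetical sketch of the two optimization levels -- a simplified
# model of the decision rule, not Paddle's real allocator.  A dead
# variable's buffer is a reuse candidate; level 0 additionally
# requires an identical shape.

def can_reuse(dead_shape, new_shape, level):
    if level == 0:
        return dead_shape == new_shape   # level 0: exact shape match only
    return True                          # level 1: reuse regardless of shape

dead = (32, 128)  # shape of a variable whose lifetime has ended
print(can_reuse(dead, (32, 128), level=0))  # True: shapes match
print(can_reuse(dead, (64, 128), level=0))  # False: shapes differ
print(can_reuse(dead, (64, 128), level=1))  # True: level 1 reuses anyway
```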
.. _api_guide_nets_en:
################
Complex Networks
################
When dealing with complex tasks, we usually need to write a great deal of code to build a complex `Neural Network <https://en.wikipedia.org/wiki/Artificial_neural_network>`_ .
Therefore, to make it easier for users to build complex network models, we provide some common basic modules to simplify user code and reduce development cost.
These modules are usually composed of fine-grained functions combined according to certain logic. For implementation, please refer to `nets <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/nets.py>`_ .
1.simple_img_conv_pool
----------------------
:code:`simple_img_conv_pool` is obtained by concatenating :ref:`api_fluid_layers_conv2d` with :ref:`api_fluid_layers_pool2d` .
This module is widely used in image classification models, such as `MNIST <https://en.wikipedia.org/wiki/MNIST_database>`_ digit classification.
For API Reference, please refer to :ref:`api_fluid_nets_simple_img_conv_pool`
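As a rough illustration of what the two concatenated layers do to the spatial size, here is a back-of-envelope sketch using the standard conv/pool output-size formulas (plain Python, not Paddle code; the concrete numbers assume a 28x28 MNIST input, a 5x5 filter, and a 2x2 pool with stride 2):

```python
# Output-size arithmetic for a conv2d followed by a pool2d -- an
# illustrative sketch only, not the Paddle implementation.

def conv_out(size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - filter_size) // stride + 1

def pool_out(size, pool_size, pool_stride):
    """Spatial output size of a pooling layer."""
    return (size - pool_size) // pool_stride + 1

h = conv_out(28, filter_size=5)              # 28x28 image, 5x5 filter -> 24
h = pool_out(h, pool_size=2, pool_stride=2)  # 2x2 pool, stride 2 -> 12
print(h)  # -> 12
```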
2.img_conv_group
----------------
:code:`img_conv_group` is composed of :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm`, :ref:`api_fluid_layers_dropout` and :ref:`api_fluid_layers_pool2d`.
This module can implement the combination of multiple :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm` , :ref:`api_fluid_layers_dropout` and a single :ref:`api_fluid_layers_pool2d`.
Among them, the number of :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm` and :ref:`api_fluid_layers_dropout` can be controlled separately, resulting in various combinations.
This module is widely used in more complex image classification tasks, such as `VGG <https://arxiv.org/pdf/1409.1556.pdf>`_.
For API Reference, please refer to :ref:`api_fluid_nets_img_conv_group`
3.sequence_conv_pool
--------------------
:code:`sequence_conv_pool` is obtained by concatenating :ref:`api_fluid_layers_sequence_conv` with :ref:`api_fluid_layers_sequence_pool` .
The module is widely used in the fields of `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ , for example in the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ ,
`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_ models.
For API Reference, please refer to :ref:`api_fluid_nets_sequence_conv_pool`
4.glu
-----
The full name of :code:`glu` is Gated Linear Units, which originates from the paper `Language Modeling with Gated Convolutional Networks <https://arxiv.org/pdf/1612.08083.pdf>`_ . It consists of :ref:`api_fluid_layers_split` , :ref:`api_fluid_layers_sigmoid` and :ref:`api_fluid_layers_elementwise_mul`.
It splits the input data into two equal parts, computes the `Sigmoid <https://en.wikipedia.org/wiki/Sigmoid_function>`_ of the second part, and then takes the element-wise product of that sigmoid value with the first part to get the output.
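The computation can be sketched in plain Python on a flat list; the real module operates on tensors and splits along a configurable axis, so the :code:`glu` function below is an illustrative stand-in, not the Paddle op:

```python
import math

# Minimal sketch of the glu computation described above, on plain
# Python lists -- illustrative only, not the Paddle implementation.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(x):
    half = len(x) // 2
    a, b = x[:half], x[half:]        # split input into two equal parts
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]  # a * sigmoid(b)

print(glu([1.0, 2.0, 0.0, 0.0]))  # sigmoid(0) = 0.5, so -> [0.5, 1.0]
```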
For API Reference, please refer to :ref:`api_fluid_nets_glu`
5.scaled_dot_product_attention
------------------------------
:code:`scaled_dot_product_attention` originates from the paper `Attention Is All You Need <https://arxiv.org/pdf/1706.03762.pdf>`_ , mainly composed of :ref:`api_fluid_layers_fc` and :ref:`api_fluid_layers_softmax` .
For the input data :code:`Queries` , :code:`Keys` and :code:`Values` , the :code:`Attention` is calculated according to the following formula:

.. math::

    Attention(Q, K, V) = softmax(QK^\mathrm{T})V

This module is widely used in `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_ models, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/Fluid/PaddleNLP/neural_machine_translation/transformer>`_ .
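The formula above can be sketched in pure Python for small matrices given as nested lists. This is an illustrative stand-in, not the Paddle op; note that the actual scaled dot-product attention, per the paper, also scales :math:`QK^\mathrm{T}` by :math:`1/\sqrt{d_k}` and supports multiple heads, both omitted here for brevity:

```python
import math

# Pure-Python sketch of Attention(Q, K, V) = softmax(Q K^T) V for small
# matrices as nested lists -- illustrative only, not the Paddle op
# (the 1/sqrt(d_k) scaling and multi-head splitting are omitted).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)                         # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q K^T
    weights = [softmax(row) for row in scores]          # row-wise softmax
    return matmul(weights, V)                           # weighted sum of V

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # one output row, a convex mix of V's rows
```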
For API Reference, please refer to :ref:`api_fluid_nets_scaled_dot_product_attention`