Commit 39e1c742 authored by Mr.Lee, committed by Hao Wang

upload memory_optimize_en.rst and nets_en.rst (#628)

* upload memory_optimize_en.rst

* Review

* fix jargon

* Fix memory ==> video memory
Parent 02321682
.. _api_guide_memory_optimize_en:
#########################
Video Memory Optimization
#########################
**This guide is for GPU training.**
Video memory optimization reduces the video memory consumed by a :code:`Program` during execution by analyzing the video memory occupied by each :code:`Variable` in the :code:`Program` and reusing it. Users can perform video memory optimization from a Python script through the :code:`memory_optimize` interface. The execution strategy of video memory optimization is as follows:

- Firstly, analyze the dependencies between the :code:`Operator` s in the :code:`Program` to determine the remaining lifetime of each :code:`Variable`;
- Secondly, based on these lifetimes, let a :code:`Variable` created later reuse the video memory of a :code:`Variable` that is approaching the end of its lifetime or has ceased to exist.

.. code-block:: python

    z = fluid.layers.sum([x, y])
    m = fluid.layers.matmul(y, z)

In this example, the lifetime of :code:`x` ends after :code:`fluid.layers.sum`, so its video memory can be reused by :code:`m`.
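The reuse strategy above can be sketched in a few lines of plain Python. This is a hypothetical, simplified model of the lifetime analysis, not Paddle's actual implementation; :code:`plan_reuse` and the op-list format are illustrative names only:

```python
# Simplified model of the lifetime analysis described above -- NOT
# Paddle's real implementation.  Each op lists the variables it reads
# and the variable it writes; a variable's lifetime ends at the last
# op that reads it, after which its buffer becomes reusable.

def last_use(ops):
    """Map each input variable to the index of the last op that reads it."""
    last = {}
    for i, (inputs, _output) in enumerate(ops):
        for name in inputs:
            last[name] = i
    return last

def plan_reuse(ops):
    """Return {new_var: reused_var} pairs where memory can be shared."""
    last = last_use(ops)
    free, reuse = [], {}
    for i, (inputs, output) in enumerate(ops):
        if free:                      # a dead variable's buffer is available
            reuse[output] = free.pop()
        for name in inputs:
            if last[name] == i:       # lifetime of `name` ends at this op
                free.append(name)
    return reuse

# The example from the text: z = sum([x, y]); m = matmul(y, z)
ops = [(["x", "y"], "z"), (["y", "z"], "m")]
print(plan_reuse(ops))  # x dies after the first op, so m can reuse x
```

Here :code:`x` is last read by the first op, so its buffer is free by the time :code:`m` is allocated, matching the reuse described in the text.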
Disable video memory optimization for specific parts
=======================================================
:code:`memory_optimize` supports disabling video memory optimization for specific parts of the network. You can specify the :code:`Variable` s whose video memory is not to be reused by passing a collection of their names through :code:`skip_opt_set`;
In addition, :code:`memory_optimize` can skip video memory optimization for the backward part of the network. The user enables this behavior by passing in the :code:`skip_grads` parameter.

.. code-block:: python

    fluid.memory_optimize(fluid.default_main_program(),
                          skip_opt_set=("fc",), skip_grads=True)

In this example, the :code:`fluid.memory_optimize` interface analyzes the remaining lifetime of each :code:`Variable` in the default :code:`Program`, skipping the :code:`Variable` named :code:`fc` and all :code:`Variable` s in the backward part of the network.
The video memory of these skipped :code:`Variable` s will not be reused by any other :code:`Variable`.
Specify the video memory optimization level
==============================================
:code:`memory_optimize` supports printing video memory reuse information to facilitate debugging. Users can enable this by specifying :code:`print_log=True`;

:code:`memory_optimize` supports two levels of video memory optimization, namely :code:`0` and :code:`1` :

- When the optimization level is :code:`0`: after analyzing the remaining lifetime of each :code:`Variable`, :code:`memory_optimize` also checks its :code:`shape`. Video memory reuse only happens between :code:`Variable` s with the same :code:`shape`;
- When the optimization level is :code:`1`: :code:`memory_optimize` reuses video memory as aggressively as possible. After analyzing the remaining lifetime of each :code:`Variable`, even :code:`Variable` s with different :code:`shape` s will reuse video memory to the maximum extent.

.. code-block:: python

    fluid.memory_optimize(fluid.default_main_program(),
                          level=0, print_log=True)

In this example, the :code:`fluid.memory_optimize` interface analyzes the remaining lifetime of each :code:`Variable` in the default :code:`Program`. Only :code:`Variable` s with exactly the same :code:`shape` will reuse video memory. After the analysis is finished, all debugging information related to video memory reuse is printed.
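The difference between the two levels boils down to one extra check at reuse time. The following is a simplified pure-Python sketch of that decision; the :code:`can_reuse` helper is hypothetical and not part of the Paddle API:

```python
# Hypothetical sketch of the two optimization levels -- a simplified
# model of the decision rule, not Paddle's real allocator.  A dead
# variable's buffer is a reuse candidate; level 0 additionally
# requires an identical shape.

def can_reuse(dead_shape, new_shape, level):
    if level == 0:
        return dead_shape == new_shape   # level 0: exact shape match only
    return True                          # level 1: reuse regardless of shape

dead = (32, 128)  # shape of a variable whose lifetime has ended
print(can_reuse(dead, (32, 128), level=0))  # True: shapes match
print(can_reuse(dead, (64, 128), level=0))  # False: shapes differ
print(can_reuse(dead, (64, 128), level=1))  # True: level 1 reuses anyway
```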
.. _api_guide_nets_en:
################
Complex Networks
################
When dealing with complex tasks, we usually need to write a great deal of code to build a complex `Neural Network <https://en.wikipedia.org/wiki/Artificial_neural_network>`_ .
Therefore, to make it easier for users to build complex network models, we provide some common basic modules to simplify user code and reduce development cost.
These modules are usually composed of fine-grained functions combined according to certain logic. For implementation, please refer to `nets <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/nets.py>`_ .
1.simple_img_conv_pool
----------------------
:code:`simple_img_conv_pool` is obtained by concatenating :ref:`api_fluid_layers_conv2d` with :ref:`api_fluid_layers_pool2d` .
This module is widely used in image classification models, such as `MNIST <https://en.wikipedia.org/wiki/MNIST_database>`_ digit classification.
For API Reference, please refer to :ref:`api_fluid_nets_simple_img_conv_pool`
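As a rough illustration of what the two concatenated layers do to the spatial size, here is a back-of-envelope sketch using the standard conv/pool output-size formulas (plain Python, not Paddle code; the concrete numbers assume a 28x28 MNIST input, a 5x5 filter, and a 2x2 pool with stride 2):

```python
# Output-size arithmetic for a conv2d followed by a pool2d -- an
# illustrative sketch only, not the Paddle implementation.

def conv_out(size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - filter_size) // stride + 1

def pool_out(size, pool_size, pool_stride):
    """Spatial output size of a pooling layer."""
    return (size - pool_size) // pool_stride + 1

h = conv_out(28, filter_size=5)              # 28x28 image, 5x5 filter -> 24
h = pool_out(h, pool_size=2, pool_stride=2)  # 2x2 pool, stride 2 -> 12
print(h)  # -> 12
```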
2.img_conv_group
----------------
:code:`img_conv_group` is composed of :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm`, :ref:`api_fluid_layers_dropout` and :ref:`api_fluid_layers_pool2d`.
This module can implement the combination of multiple :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm` , :ref:`api_fluid_layers_dropout` and a single :ref:`api_fluid_layers_pool2d`.
Among them, the number of :ref:`api_fluid_layers_conv2d` , :ref:`api_fluid_layers_batch_norm` and :ref:`api_fluid_layers_dropout` can be controlled separately, resulting in various combinations.
This module is widely used in more complex image classification tasks, such as `VGG <https://arxiv.org/pdf/1409.1556.pdf>`_.
For API Reference, please refer to :ref:`api_fluid_nets_img_conv_group`
3.sequence_conv_pool
--------------------
:code:`sequence_conv_pool` is obtained by concatenating :ref:`api_fluid_layers_sequence_conv` with :ref:`api_fluid_layers_sequence_pool` .
The module is widely used in the fields of `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ , for example in the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ ,
`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_ models.
For API Reference, please refer to :ref:`api_fluid_nets_sequence_conv_pool`
4.glu
-----
The full name of :code:`glu` is Gated Linear Units, which originates from the paper `Language Modeling with Gated Convolutional Networks <https://arxiv.org/pdf/1612.08083.pdf>`_ . It consists of :ref:`api_fluid_layers_split` , :ref:`api_fluid_layers_sigmoid` and :ref:`api_fluid_layers_elementwise_mul`.
It splits the input data into two equal parts, computes the `Sigmoid <https://en.wikipedia.org/wiki/Sigmoid_function>`_ of the second part, and then takes the element-wise product of that sigmoid value with the first part to get the output.
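The computation can be sketched in plain Python on a flat list; the real module operates on tensors and splits along a configurable axis, so the :code:`glu` function below is an illustrative stand-in, not the Paddle op:

```python
import math

# Minimal sketch of the glu computation described above, on plain
# Python lists -- illustrative only, not the Paddle implementation.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(x):
    half = len(x) // 2
    a, b = x[:half], x[half:]        # split input into two equal parts
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]  # a * sigmoid(b)

print(glu([1.0, 2.0, 0.0, 0.0]))  # sigmoid(0) = 0.5, so -> [0.5, 1.0]
```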
For API Reference, please refer to :ref:`api_fluid_nets_glu`
5.scaled_dot_product_attention
------------------------------
:code:`scaled_dot_product_attention` originates from the paper `Attention Is All You Need <https://arxiv.org/pdf/1706.03762.pdf>`_ , mainly composed of :ref:`api_fluid_layers_fc` and :ref:`api_fluid_layers_softmax` .
For the input data :code:`Queries` , :code:`Keys` and :code:`Values` , the :code:`Attention` is calculated according to the following formula:

.. math::

    Attention(Q, K, V) = softmax(QK^\mathrm{T})V

This module is widely used in `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_ models, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/Fluid/PaddleNLP/neural_machine_translation/transformer>`_ .
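The formula above can be sketched in pure Python for small matrices given as nested lists. This is an illustrative stand-in, not the Paddle op; note that the actual scaled dot-product attention, per the paper, also scales :math:`QK^\mathrm{T}` by :math:`1/\sqrt{d_k}` and supports multiple heads, both omitted here for brevity:

```python
import math

# Pure-Python sketch of Attention(Q, K, V) = softmax(Q K^T) V for small
# matrices as nested lists -- illustrative only, not the Paddle op
# (the 1/sqrt(d_k) scaling and multi-head splitting are omitted).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)                         # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q K^T
    weights = [softmax(row) for row in scores]          # row-wise softmax
    return matmul(weights, V)                           # weighted sum of V

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # one output row, a convex mix of V's rows
```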
For API Reference, please refer to :ref:`api_fluid_nets_scaled_dot_product_attention`