[Dy2stat] Add Basic English Usage Guide (#2569)

We added a basic English usage guide. In addition, some Chinese sentences read poorly, so I modified them. TODO: Yamei suggested splitting this file into basic usage and architecture; I will do that in follow-up PRs.

ed7c4a38 · Huihuang Zheng · GitHub · 5858c87f · ed7c4a38 · ed7c4a38
4 changed files
--- a/doc/fluid/advanced_guide/dygraph_to_static/index_en.rst
+++ b/doc/fluid/advanced_guide/dygraph_to_static/index_en.rst
+#######################
+Dygraph to Static Graph
+#######################
+
+- `Dygraph to Static Graph <program_translator_en.html>`_ : Introduces the basic usage for transforming dygraph code into static graph code and the architecture of ProgramTranslator.
+
+..  toctree::
+    :hidden:
+
+    program_translator_en.rst
+
--- a/doc/fluid/advanced_guide/dygraph_to_static/program_translator_cn.rst
+++ b/doc/fluid/advanced_guide/dygraph_to_static/program_translator_cn.rst
 动态图转静态图
 ================

-PaddlePadde的动态图具有接口易用、Python风格的编程体验、友好的debug交互等优点。在动态图模式下，代码是按照我们编写的顺序依次执行。这种机制更符合python程序员的习惯，可以很方便地将大脑中的想法快速地转化为实际代码，也更容易调试。但在部分性能方面，python速度负担太大，无法与C++相提并论。因此在工业界部署很多地方（如大型推荐系统、移动端）都倾向于直接使用C++来提速。
+动态图有诸多优点，包括易用的接口，python风格的编程体验，友好的debug交互机制等。在动态图模式下，代码是按照我们编写的顺序依次执行。这种机制更符合Python程序员的习惯，可以很方便地将大脑中的想法快速地转化为实际代码，也更容易调试。但在性能方面，Python执行开销较大，与C++有一定差距。因此在工业界的许多部署场景中（如大型推荐系统、移动端）都倾向于直接使用C++来提速。

-此时，静态图在部署方面更具有性能的优势。静态图程序在编译执行时，先搭建模型的神经网络结构，然后再对神经网络执行计算操作。预先搭建好的神经网络可以脱离python依赖，在C++端被重新解析执行。
+相比动态图，静态图在部署方面更具有性能的优势。静态图程序在编译执行时，先搭建模型的神经网络结构，然后再对神经网络执行计算操作。预先搭建好的神经网络可以脱离Python依赖，在C++端被重新解析执行，而且拥有整体网络结构也能进行一些网络结构的优化。

 动态图代码更易编写和debug，但在部署性能上，静态图更具优势。因此我们新增了动态图转静态图的功能，支持用户依然使用动态图编写组网代码。PaddlePaddle会对用户代码进行分析，自动转换为静态图网络结构，兼顾了动态图易用性和静态图部署性能两方面优势。

 基本使用方法
 --------------

-PaddlePaddle提供了两种动态图转静态图的方式，基于动态图trace的转换与基于源代码级别的转换的ProgramTranslator。
+PaddlePaddle提供了两种动态图转静态图的方式，基于动态图trace的TracedLayer与基于源代码级别转换的ProgramTranslator。

 1. 基于trace的TracedLayer：

@@ -88,7 +88,7 @@ trace是指在模型运行时记录下其运行过哪些算子。TracedLayer就

 2. 基于源代码转写的ProgramTranslator

-对于依赖数据的控制流，我们使用基于源代码转写的ProgramTranslator来进行动态图转静态图。其基本原理是通过分析python代码来将动态图代码转写为静态图代码，并在底层自动帮用户使用执行器运行。其基本使用方法十分简便，只需要在要转化的函数（该函数也可以是用户自定义动态图Layer的forward函数）前添加一个装饰器@paddle.jit.to_static，上面的例子转化如下，并且可以依旧使用该函数运行得到结果：
+对于依赖数据的控制流，我们使用基于源代码转写的ProgramTranslator来进行动态图转静态图。其基本原理是通过分析Python代码来将动态图代码转写为静态图代码，并在底层自动帮用户使用执行器运行。其基本使用方法十分简便，只需要在要转化的函数（该函数也可以是用户自定义动态图Layer的forward函数）前添加一个装饰器 ``@paddle.jit.to_static`` ，上面的例子转化如下，并且可以依旧使用该函数运行得到结果：

 .. code-block:: python

@@ -108,7 +108,7 @@ trace是指在模型运行时记录下其运行过哪些算子。TracedLayer就
    func(input_var)


-若要存储转化后的静态图模型，可以调用paddle.jit.save，我们再以SimpleFcLayer为例，需要在SimpleFcLayer的forward函数添加装饰器：
+若要存储转化后的静态图模型，可以调用 ``paddle.jit.save`` ，我们再以SimpleFcLayer为例，需要在SimpleFcLayer的forward函数添加装饰器：

 .. code-block:: python

@@ -141,7 +141,7 @@ trace是指在模型运行时记录下其运行过哪些算子。TracedLayer就
    input_var = paddle.to_tensor(in_np)
    out = fc_layer(input_var)

-    paddle.jit.save(mnist, "./mnist_dy2stat", input_spec=[input_var])
+    paddle.jit.save(fc_layer, "./fc_layer_dy2stat", input_spec=[input_var])

 内部架构原理
 --------------
@@ -157,21 +157,21 @@ TracedLayer的原理就是trace，相对简单，因此我们在这里不展开

 2. 动态图源码转AST（抽象语法树）

-动态图转静态图的最核心部分类似一个编译器，解析动态图代码语句为AST，再对应AST进行改写，最后反转回成静态图代码。从函数转化为代码字符串可以使用Python的inspect.getsource。从字符串Python提供了自带的ast库来解析字符串为 `AST <https://docs.python.org/3/library/ast.html>`_ ，但是由于python2，python3的语法略有不同，为了避免我们需要额外处理这些python2，python3的不同情况，我们使用了统一python2，python3的开源AST处理 `gast库 <https://github.com/serge-sans-paille/gast>`_ 。这些接口使得函数转化为AST没有本质上的困难。
+动态图转静态图的最核心部分类似一个编译器，解析动态图代码语句为AST，再对应AST进行改写，最后反转回成静态图代码。从函数转化为代码字符串可以使用Python的inspect.getsource。从字符串Python提供了自带的 `ast <https://docs.python.org/3/library/ast.html>`_ 库来解析字符串为AST，但是由于Python2，Python3的语法略有不同，为了避免我们需要额外处理这些Python2，Python3的不同情况，我们使用了统一Python2，Python3的开源AST处理 `gast库 <https://github.com/serge-sans-paille/gast>`_ 。这些接口使得函数转化为AST没有本质上的困难。

 3. AST改写和静态图源码转换

-这部分为动转静最核心的部分，我们对支持的各种语法进行ast转写。其中最重要的python控制流，if-else，while，for循环被分别分析转化为PaddlePaddle静态图接口cond，while_loop等接口实现。我们对想转化的每一种主要语法创建一个Transformer（这里的Transformer是python ast转写的概念，而不是自然语言处理NLP领域的Transformer），每个Transformer扫一遍AST并进行对应的改写。最后被转化完成的AST我们使用gast提供的接口转回成源码。
+这部分为动转静最核心的部分，我们对支持的各种语法进行ast转写。其中最重要的Python控制流，if-else，while，for循环被分别分析转化为PaddlePaddle静态图接口cond，while_loop等接口实现。我们对想转化的每一种主要语法创建一个Transformer（这里的Transformer是Python ast转写的概念，而不是自然语言处理NLP领域的Transformer），每个Transformer扫一遍AST并进行对应的改写。最后被转化完成的AST我们使用gast提供的接口转回成源码。

 4. 静态图源码作为动态图一部分运行的技术

-为了动静转化更加易用和被转化的代码能在动态图中复用，我们在拥有源码后运行生成Program，并将这个Program作为一个大op，包装成动态图的一个op，这样既能把用户的代码转为静态图提速或者保存部署，另一方面如果用户想在python层使用生成的静态图代码作为动态图的一部分继续训练或者别的动态图运算也是可以直接使用。
+为了动静转化更加易用和被转化的代码能在动态图中复用，我们在拥有源码后运行生成Program，并将这个Program作为一个大op，包装成动态图的一个op，这样既能把用户的代码转为静态图提速或者保存部署，另一方面如果用户想在Python层使用生成的静态图代码作为动态图的一部分继续训练或者别的动态图运算也是可以直接使用。

 5. 易用性与Debug功能在动转静过程的实现

 正如AST转写类似编译器，而一般编译器都会提供debug断点，报错，输出一些中间代码等功能。我们在进行动转静时，万一用户的动态图代码出错，或者用户想断点调试，或者用户想看看被转化后的静态图代码是否符合其预期，我们也希望能够像编译器一样提供这些易用性功能，使得动转静兼顾性能和部署同时还具有易用性。我们这里将列出这些功能的实现方式

-A. 报错对应到动态图代码行。由于被转化后的静态图代码和原动态图代码不同，python运行出错时会报静态图的错误，因此我们在每一次AST转写时添加AST节点对应的原动态图代码行等信息，在python报错栈中将静态图的报错转化成对应的动态图源码报错
+A. 报错对应到动态图代码行。由于被转化后的静态图代码和原动态图代码不同，Python运行出错时会报静态图的错误，因此我们在每一次AST转写时添加AST节点对应的原动态图代码行等信息，在Python报错栈中将静态图的报错转化成对应的动态图源码报错

 B. 设置断点功能。我们保留了被转化后代码的中的pdb.set_trace(), 用户可以使用这种方式进行断点调试


--- a/doc/fluid/advanced_guide/dygraph_to_static/program_translator_en.rst
+++ b/doc/fluid/advanced_guide/dygraph_to_static/program_translator_en.rst
+Dygraph to Static Graph
+=======================
+
+PaddlePaddle's imperative (dygraph) mode offers flexibility, Pythonic coding, and an easy-to-debug interface. In dygraph mode, code executes kernels immediately and returns numerical results, so statements run in the order they are written, as Python programmers expect. This makes it efficient to turn an idea into working code and simple to debug. However, Python execution carries more overhead than C++, so many industrial systems (such as large recommender systems and mobile devices) prefer to deploy with a C++ implementation.
+
+Static graph mode performs better in speed and portability. It first builds the network structure at compile time and then runs the computation. The resulting intermediate representation of the network can be executed in C++ without any Python dependency.
+
+Because dygraph is easier to write and debug while static graph offers performance and deployment advantages, we added functionality to convert dygraph to static graph. Users write dygraph code in imperative mode, and PaddlePaddle analyzes the Python code and turns it into the network structure of static graph mode. This approach retains both the usability of dygraph and the portability of static graph.
+
+Basic Usage
+--------------
+
+PaddlePaddle provides two ways to transform dygraph to static graph: TracedLayer extracts the computation graph through tracing, and ProgramTranslator obtains it through source code transformation.
+
+
+1. TracedLayer:
+
+Tracing means recording the operators that are executed when a model runs. TracedLayer is based on this technique: it runs the dygraph program once, records all operators, then constructs a static graph model and saves it. Here is a usage example:
+
+Define a simple fully connected network:
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+
+    class SimpleFcLayer(paddle.nn.Layer):
+        def __init__(self, feature_size, batch_size, fc_size):
+            super(SimpleFcLayer, self).__init__()
+            self._linear = paddle.nn.Linear(feature_size, fc_size)
+            self._offset = paddle.to_tensor(
+                np.random.random((batch_size, fc_size)).astype('float32'))
+
+        def forward(self, x):
+            fc = self._linear(x)
+            return fc + self._offset
+
+Save model by TracedLayer:
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    from paddle.jit import TracedLayer
+
+    paddle.disable_static()
+
+    fc_layer = SimpleFcLayer(3, 4, 2)
+    # Input shape is [batch_size, feature_size] to match Linear(feature_size, fc_size)
+    in_np = np.random.random([4, 3]).astype('float32')
+    # Turn numpy ndarray into Tensor
+    input_var = paddle.to_tensor(in_np)
+    # Transform imperative mode into declarative mode with TracedLayer.trace
+    out_dygraph, static_layer = TracedLayer.trace(fc_layer, inputs=[input_var])
+    save_dirname = './saved_infer_model'
+    # Save the transformed model
+    static_layer.save_inference_model(save_dirname, feed=[0], fetch=[0])
+
+Load model and run it in static graph mode:
+
+.. code-block:: python
+
+    place = paddle.CPUPlace()
+    exe = paddle.Executor(place)
+    program, feed_vars, fetch_vars = paddle.io.load_inference_model(save_dirname, exe)
+    fetch, = exe.run(program, feed={feed_vars[0]: in_np}, fetch_list=fetch_vars)
+
+However, tracing records the operators of a single run only. If the user's code contains Tensor-dependent control flow (depending on Tensor values or Tensor shapes), so that different Tensors can cause different operators to be executed, TracedLayer cannot handle it. For instance:
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+
+    def func(input_var):
+        # if condition depends on the shape of input_var
+        if input_var.shape[0] > 1:
+            return paddle.cast(input_var, "float64")
+        else:
+            return paddle.cast(input_var, "int64")
+
+    paddle.disable_static()
+    in_np = np.array([-2]).astype('int')
+    input_var = paddle.to_tensor(in_np)
+    out = func(input_var)
+
+If we apply ``TracedLayer.trace(func, inputs=[input_var])`` to the example above, tracing records the operators in only one branch of the if-else, so the saved model does not behave as the user originally intended. The same applies to while/for loops.
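
The limitation can be illustrated without PaddlePaddle. The following is a minimal sketch in plain Python, not the TracedLayer implementation: ``ToyTracer``, ``func`` and ``replay`` are hypothetical names, and list length stands in for Tensor shape. Tracing with a one-element input records only the ``int64`` branch, so replaying the recorded operators on a longer input gives a different result from the original code.

```python
# Toy illustration only: ToyTracer, func and replay are hypothetical names,
# not part of PaddlePaddle.
class ToyTracer:
    """Records every operator that is executed during one run."""

    def __init__(self):
        self.ops = []

    def cast(self, values, dtype):
        self.ops.append(("cast", dtype))
        convert = float if dtype == "float64" else int
        return [convert(v) for v in values]


def func(tracer, values):
    # The branch taken depends on the "shape" (length) of the input.
    if len(values) > 1:
        return tracer.cast(values, "float64")
    else:
        return tracer.cast(values, "int64")


def replay(ops, values):
    """Re-runs the recorded operators; the if-else itself was never recorded."""
    tracer = ToyTracer()
    for name, dtype in ops:
        values = getattr(tracer, name)(values, dtype)
    return values


tracer = ToyTracer()
func(tracer, [-2])                # trace once with a single-element input
recorded_ops = tracer.ops         # only the "int64" branch was seen

# The original code takes the float64 branch for a two-element input,
# but the replayed trace still casts to int64.
expected = func(ToyTracer(), [1.5, 2.5])
replayed = replay(recorded_ops, [1.5, 2.5])
```

Here ``expected`` and ``replayed`` differ because the branch decision was made at trace time and is not part of the recorded operator list.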
+
+2. ProgramTranslator
+
+For Tensor-dependent control flow, we use the source-code-translation-based ProgramTranslator to convert dygraph into static graph. The basic idea is to analyze the Python source code, turn it into static graph code, and then run the static graph code with the Executor. Basic usage is simple: put the decorator ``@paddle.jit.to_static`` before the definition of the function to transform (the function can also be a method of a class, e.g., the ``forward`` function of a user-defined imperative Layer). The Tensor-dependent example above can be transformed correctly by ProgramTranslator as follows:
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+
+    @paddle.jit.to_static
+    def func(input_var):
+        # if condition depends on the shape of input_var
+        if input_var.shape[0] > 1:
+            out = paddle.cast(input_var, "float64")
+        else:
+            out = paddle.cast(input_var, "int64")
+        return out
+
+    paddle.disable_static()
+    in_np = np.array([-2]).astype('int')
+    input_var = paddle.to_tensor(in_np)
+    func(input_var)
+
+To save the transformed model, call ``paddle.jit.save``. Taking ``SimpleFcLayer`` as an example again, we put the decorator on the ``forward`` method of ``SimpleFcLayer``:
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+
+    class SimpleFcLayer(paddle.nn.Layer):
+        def __init__(self, feature_size, batch_size, fc_size):
+            super(SimpleFcLayer, self).__init__()
+            self._linear = paddle.nn.Linear(feature_size, fc_size)
+            self._offset = paddle.to_tensor(
+                np.random.random((batch_size, fc_size)).astype('float32'))
+
+        @paddle.jit.to_static
+        def forward(self, x):
+            fc = self._linear(x)
+            return fc + self._offset
+
+
+Call ``paddle.jit.save`` to save the model above:
+
+.. code-block:: python
+
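+    import numpy as np
+    import paddle
+
+    paddle.disable_static()
+
+    fc_layer = SimpleFcLayer(3, 4, 2)
+    # Input shape is [batch_size, feature_size] to match Linear(feature_size, fc_size)
+    in_np = np.random.random([4, 3]).astype('float32')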
+    input_var = paddle.to_tensor(in_np)
+    out = fc_layer(input_var)
+
+    paddle.jit.save(fc_layer, "./fc_layer_dy2stat")
+
+
+Architecture
+--------------
+
+TracedLayer is based on tracing, which is relatively simple, so we won't expand on it here. This section describes the source code transformation of ProgramTranslator.
+
+The transformation is implemented in the decorator, so it happens when the user calls the decorated function. The procedure includes the following steps:
+
+1. Function and cache.
+
+The entity that is transformed from dygraph to static graph is the decorated function. The PaddlePaddle APIs called in the function are the same code in dygraph mode and static graph mode, so we don't have to transform them. However, those APIs perform computation in dygraph mode but build the network in static graph mode. If the transformed function is called multiple times, those APIs would build the network multiple times in static graph mode, which can cause problems. To solve this and to speed up the transformation, we maintain a cache that maps the function, input shapes, and input data types to the Program built by the transformed function. If a call hits the cache, we run the stored Program in static graph mode to get the result; otherwise we transform the function's code and store the resulting Program in the cache.
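
The caching idea can be sketched in plain Python. This is a hedged illustration, not ProgramTranslator's actual cache: ``FakeTensor``, ``ProgramCache`` and ``build_program`` are hypothetical names, and the "Program" is just a string. It only shows the key being built from the function, the input shapes and the input data types.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: only shape and dtype matter for the key.
FakeTensor = namedtuple("FakeTensor", ["shape", "dtype", "data"])


class ProgramCache:
    """Maps (function, input shapes, input dtypes) to a built program."""

    def __init__(self, build_program):
        self._build_program = build_program
        self._programs = {}
        self.build_count = 0

    def get_program(self, func, inputs):
        key = (func, tuple((tuple(t.shape), t.dtype) for t in inputs))
        if key not in self._programs:
            # Cache miss: transform the code and build the program once.
            self._programs[key] = self._build_program(func, inputs)
            self.build_count += 1
        return self._programs[key]


def build_program(func, inputs):
    # Placeholder for "transform the code and build the static graph Program".
    return "program<{}:{}>".format(func.__name__, [t.shape for t in inputs])


def double(x):
    return [v * 2 for v in x.data]


cache = ProgramCache(build_program)
a = FakeTensor(shape=(3,), dtype="float32", data=[1.0, 2.0, 3.0])
b = FakeTensor(shape=(3,), dtype="float32", data=[4.0, 5.0, 6.0])
c = FakeTensor(shape=(2,), dtype="float32", data=[7.0, 8.0])

p1 = cache.get_program(double, [a])  # miss: the program is built
p2 = cache.get_program(double, [b])  # hit: same function, shape and dtype
p3 = cache.get_program(double, [c])  # miss: the input shape differs
```

Keying on shapes and data types rather than on values means that repeated calls with new data of the same shape reuse the stored program instead of building the network again.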
+
+2. From dygraph source code to AST (Abstract Syntax Tree)
+
+The core of transforming dygraph to static graph is similar to a compiler: we parse the dygraph code into an AST, modify the AST, then turn it back into static graph code. We use Python's ``inspect.getsource`` to get the source code string of the function. Python provides the ``ast`` library to parse a code string into an AST, but Python 2 and Python 3 have slight grammar differences. To avoid handling the different grammars ourselves, we use the open source AST library `gast <https://github.com/serge-sans-paille/gast>`_, which provides a compatible AST across Python versions. With these libraries, turning a function into an AST poses no essential difficulty.
+
+3. Transform AST and turn it to static graph code
+
+This is the key part of ProgramTranslator: we rewrite the AST for each supported grammar. The important Python control flows, such as ``if-elif-else``, ``while`` and ``for`` loops, are converted to PaddlePaddle static graph APIs such as ``cond`` and ``while_loop``. We create one Transformer (an AST-to-AST transformer in Python, not the Transformer in Natural Language Processing) for each grammar we want to convert. Each Transformer scans the AST and modifies it. Finally, we turn the AST back into a source code string with the ``gast`` library.
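
The parse, rewrite and unparse steps can be sketched with the standard ``ast`` module (used here instead of ``gast`` so the snippet has no dependencies; ``ast.unparse`` needs Python 3.9+). This is an assumption-laden toy, not one of ProgramTranslator's Transformers: ``IfToCond`` and the ``cond`` helper are hypothetical, the source string is written inline rather than obtained with ``inspect.getsource``, and only the simplest pattern (one assignment to the same name in each branch) is handled.

```python
import ast

# Inline source string; a real converter would obtain it with inspect.getsource.
SOURCE = """
def func(x):
    if x > 1:
        out = x * 2.0
    else:
        out = x - 1
    return out
"""


def cond(pred, true_fn, false_fn):
    # Hypothetical stand-in for a static graph conditional API.
    return true_fn() if pred else false_fn()


def _no_args():
    return ast.arguments(posonlyargs=[], args=[], vararg=None,
                         kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])


class IfToCond(ast.NodeTransformer):
    """Rewrites `if c: v = a / else: v = b` into `v = cond(c, lambda: a, lambda: b)`."""

    def visit_If(self, node):
        self.generic_visit(node)
        if len(node.body) != 1 or len(node.orelse) != 1:
            return node  # pattern not supported by this sketch
        t, f = node.body[0], node.orelse[0]
        simple = all(
            isinstance(s, ast.Assign) and len(s.targets) == 1
            and isinstance(s.targets[0], ast.Name)
            for s in (t, f))
        if not simple or t.targets[0].id != f.targets[0].id:
            return node
        call = ast.Call(
            func=ast.Name(id="cond", ctx=ast.Load()),
            args=[node.test,
                  ast.Lambda(args=_no_args(), body=t.value),
                  ast.Lambda(args=_no_args(), body=f.value)],
            keywords=[])
        new_node = ast.Assign(
            targets=[ast.Name(id=t.targets[0].id, ctx=ast.Store())], value=call)
        return ast.copy_location(new_node, node)


tree = ast.fix_missing_locations(IfToCond().visit(ast.parse(SOURCE)))
new_source = ast.unparse(tree)  # transformed code as a string (Python 3.9+)

# Compile and run the transformed AST; `cond` must be available to it.
namespace = {"cond": cond}
exec(compile(tree, "<transformed>", "exec"), namespace)
converted_func = namespace["func"]
```

After the rewrite the function body contains no ``if`` statement; the branch is selected inside ``cond``, which is the point where a static graph API could build both branches into the network.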
+
+4. Running static graph code as part of dygraph
+
+To improve usability and to reuse the transformed static graph code in dygraph, we wrap the generated Program as a dygraph op that runs the forward and backward computation of the transformed Program. This not only speeds up dygraph code or saves it for deployment, but also lets users run part of their dygraph code in static graph mode and then continue training or perform other dygraph computation.
+
+5. Error handling and Debug
+
+A compiler usually supports debugging functionality such as breakpoints, error reporting, and printing intermediate code. ProgramTranslator is similar to a compiler: users may want to set breakpoints for debugging or check whether the transformed static graph code is what they expect. We therefore implemented error handling and debugging functionality as well. The functions and their implementation are listed below.
+
+A. Report errors/exceptions on the dygraph code line. Because the transformed static graph code differs from the original dygraph code, exceptions raised while Python executes the static graph code are reported against the static graph code. To locate the corresponding dygraph code, we attach information such as line numbers to AST nodes when we transform the AST, and then rewrite the static graph exception into the corresponding dygraph code exception.
+
+B. Set breakpoints. ProgramTranslator supports ``pdb.set_trace()``; users can add this line to their code to set breakpoints.
+
+C. Check the transformed static graph code. The output of the transformation is a callable Python class named ``StaticLayer``, which also stores the transformed code string. Users can use ``StaticLayer.code`` to get the converted code.
+
+D. Print intermediate transformed code, for example the code after the ``for`` loop has been transformed. We provide APIs to set the log level so that users can check the intermediate code.
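
Item A can be illustrated with the standard ``ast`` module. This is only a sketch of the general idea of attaching original locations to generated nodes, not ProgramTranslator's error-handling code: ``AddPrologue`` is a hypothetical transformer that inserts one generated statement, which shifts every following line in the generated source. Because the original nodes keep their line numbers and the generated node copies a location from an original one, the traceback still points at the line in the original source.

```python
import ast
import traceback

SOURCE = """def func(x):
    y = x + 1
    z = y / 0
    return z
"""


class AddPrologue(ast.NodeTransformer):
    """Inserts a generated statement at the top of every function body."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        generated = ast.Assign(
            targets=[ast.Name(id="_converted", ctx=ast.Store())],
            value=ast.Constant(value=True))
        # Attach the location of the original statement it precedes.
        ast.copy_location(generated, node.body[0])
        node.body.insert(0, generated)
        return node


tree = ast.fix_missing_locations(AddPrologue().visit(ast.parse(SOURCE)))

# Line of the failing statement in the generated source text (Python 3.9+).
generated_lines = ast.unparse(tree).splitlines()
line_in_generated = 1 + [l.strip() for l in generated_lines].index("z = y / 0")

namespace = {}
exec(compile(tree, "<dygraph source>", "exec"), namespace)
try:
    namespace["func"](1)
except ZeroDivisionError as exc:
    # Line reported by Python for the innermost frame.
    reported_line = traceback.extract_tb(exc.__traceback__)[-1].lineno
```

The failing statement sits on a later line in the generated text than in ``SOURCE``, yet ``reported_line`` matches its position in ``SOURCE``, because the location stored on the AST node is what ends up in the traceback.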
+
+
--- a/doc/fluid/advanced_guide/index_en.rst
+++ b/doc/fluid/advanced_guide/index_en.rst
@@ -15,6 +15,7 @@ So far you have already been familiar with PaddlePaddle. And the next expectatio
 ..  toctree::
    :hidden:

+    dygraph_to_static/index_en.rst
    inference_deployment/index_en.rst
    flags/flags_en.rst