WARNING: <function inner_func at 0x7fa9bcaacf50> doesn't have to be transformed to static function because it has been transformed before, it will be run as-is.
InvalidArgumentError: The 'shape'in ReshapeOp is invalid. The input tensor X'size must be equal to the capacity of 'shape'. But received X's shape =[3], X's size = 3, 'shape' is [1, 2], the capacity of 'shape' is 2.
[Hint: Expected capacity == in_size, but received capacity:2 != in_size:3.] (at /paddle/paddle/fluid/operators/reshape_op.cc:206)
The key part of ProgramTranslator is transforming Python grammar into PaddlePaddle static graph code, but there exists difference between Python and PaddlePaddle static graph which causes some limitation of the code transformation.
In this section we will talk about the supported grammars and unsupported grammars, also give some suggestions when the grammar is unsupported.
There are several kinds of supported grammars:
Control flow keywords
---------------------
Control flow means those keywords that controls the execution order of program statements, for example ``if-elif-else, while`` . Conditional operation and loop were implemented as ``cond, while_loop`` APIs in PaddlePaddle static graph. If the condition of a Python dygraph control flow depends on PaddlePaddle Tensor, the ProgramTranslator will convert the control flow into equivalent PaddlePaddle control flow APIs, else it will still be executed as Python control flow. The transformations of those control flow keywords are listed below:
1. ``if-elif-else`` statements
If the condition of ``if <condition>`` is Tensor, ProgramTranslator will turn this ``if-elif-else`` statement to equivalent PaddlePaddle static graph ``cond`` statements, otherwise the ``if-elif-else`` statement is executed as normal Python conditional statement. Note that ``cond`` API only accepts input conditional Tensor with numel equals to 1, so please use this kind of Tensor to write dygraph conditional statement, other Tensors will cause error.
2. ``while`` loop
If the condition of ``while`` is Tensor, ProgramTranslator will turn this ``while`` statement to equivalent PaddlePaddle static graph ``while_loop`` statements, otherwise the ``while`` statement is executed as normal Python ``while`` loop statement. Note that ``while_loop`` API only accepts input conditional Tensor with numel equals to 1, so please use this kind of Tensor to write dygraph loop condition statement, other Tensors will cause error.
3. ``for`` loop
3.1 ``for _ in range(__)`` loop
Firstly, ProgramTranslator will transform it into equivalent Python while loop, then convert dygraph to static graph by same logic of ``while`` loop.
3.2 ``for _ in x`` loop
If ``x`` is a Python container, iterator, or generator, it will be executed as original Python statement. Otherwise ``x`` is a Tensor, ProgramTranslator will transform the loop into PaddlePaddle static graph loop and fetches ``x[0], x[1], ...`` as loop iteration variable in each loop iteration.
3.3 ``for idx, val in enumerate(x)`` loop
If ``x`` is a Python container, iterator, or generator, it will be executed as original Python statement. Otherwise ``x`` is a Tensor, Program
Translator will transform the loop into PaddlePaddle static graph loop. The ``idx`` will be transformed to 1-D tensor with value ``0, 1, ...`` and the ``val`` will be transformed to ``x[0], x[1], ...`` in each loop iteration.
4. ``break, continue``
ProgramTranslator supports ``break, continue`` statements in loop. ProgramTranslator will add some PaddlePaddle static graph ``cond`` statements to skip execution of corresponding part when ``break, continue`` condition is meet.
5. ``return``
ProgramTranslator supports ``return`` in a conditonal block or loop body, not necessary to be at the end of a function. It also supports returning tuple with various length of Tensors with different dtype. The implementation is adding some PaddlePaddle static graph ``cond`` statement to skipparts of code when ``return`` is triggered.
Some Python basic operators
---------------------------
1. ``+, -, *, /, **, >, <, >= , <=, ==`` etc.
Because PaddlePaddle static graph overrides those Python basic arithmetic operators and comparison operators, ProgramTranslator can support those operators.
2. ``and, or, not`` logical operators
Python has ``and, or, not`` keywards as basic logical operators, ProgramTranslator will check whether the variables of the logical operators are Tensors, if they are Tensors, ProgramTranslator replaces the ``and, or, not`` statements into corresponding PaddlePaddle static graph logical operator and run it.
3. Type casting
In dygraph mode, users can use Python type casting grammar. For instance, if ``x`` is a Tensor, ``float(x)`` casts the data type of ``x`` to float. ProgramTranslator will check whether ``x`` is a Tensor during run time, if it is, the casting sentence will be modified to PaddlePaddle static graph ``cast`` API so that its dtype can be changed in the dygraph to static transformation.
Python functions
------------------------------
1. ``print``
In dygraph mode, ``print(x)`` will print Tensor value if ``x`` is a Tensor. ProgramTranslator converts the built-in ``print`` to PaddlePaddle static graph ``Print`` API during dygraph to static graph transformation if the arguments are Tensors, otherwise ProgramTranslator won't convert the ``print``.
2. ``len``
If ``x`` is a Tensor, ``len(x)`` can get the length at 0-dimension of ``x`` in dygraph mode. ProgramTranslator turns it to PaddlePaddle static graph ``shape`` API and returns the 0-dimension of the ``shape``, else if ``x`` is a TensorArray, then ``len(x)`` will be transformed to static graph API ``control_flow.array_length`` to return the length of TensorArray. In other cases, the ``len`` function will be executed as Python built-in ``len``
3. lambda expression
ProgramTranslator supports Python lambda expression and it modifies code to return the expected result.
4. Calling function
If the transformed function calls another function, ProgramTranslator also transform the called function. The benefit is that users can add one decorator at the outside function to do transformation, no need to add the decorator for each function. Note that ProgramTranslator doesn't support
that a function calls itself recursively, the details is in the unsupported grammars section below.
Errors and Exceptions
---------------------
1. ``assert``
If ``x`` is a Tensor, ``assert x`` statement can assert ``x`` to be ``True`` or non-zero value in dygraph mode. ProgramTranslator converts the statement into PaddlePaddle static graph ``Assert`` API to support this grammar.
Python containers
-----------------
1. ``list``: if all elements in a list are Tensors, then ProgramTranslator converts it to TensorArray. PaddlePaddle static graph TensorArray supports append, pop, and modify, other list operations such as sort cannot be supported. When not all elements in a list are Tensors, ProgramTranslator will treat it as normal Python list.
2. ``dict``: ProgramTranslator will add the Tensors in a dict into PaddlePaddle static graph ``Program``, so ``dict`` is supported by ProgramTranslator.
Unsupported grammars
--------------------
1. Use the shape of output tensor of ``reshape``
For example, ``x = reshape(x, shape=shape_tensor)`` , then use ``x.shape[0]`` to do other operation. Due to the difference between dygraph and static graph, it is okay in dygraph but it will fail in static graph. The reason is that APIs return computation result in dygraph mode, so ``x.shape`` has deterministic value after calling ``reshape`` . However, static graph doesn't have the value ``shape_tensor`` during building network, so PaddlePaddle doesn't know the value of ``x.shape`` after calling ``reshape``. PaddlePaddle static graph will set -1 to represent unknown shape value for each dimension of ``x.shape`` in this case, not the expected value.
We suggest to set fixed shape value as much as possible, reduce the reshape operation.
2. List of list of Tensor
For example: ``l = [[tensor1, tensor2], [tensor3, tensor4]]``, because ProgramTranslator transformed a list whose elements are all Tensors into PaddlePaddle static graph TensorArray, but TensorArray doesn't support multi-dimensions, ProgramTranslator cannot run this case.
We suggest to use 1-D list at most time, or use PaddlePaddle API ``create_array, array_read, array_write`` to control TensorArray.
3. Convert Tensor to numpy array and do operation
For example, user doesn't return Tensor in the decorated function but call ``numpy.array(tensor)`` to convert Tensor to numpy array and then use numpy API to compute on it. In dygraph mode, it is okey because Tensor has value, but Tensor is variable for building network in static graph mode, it doesn't contain value if not in static graph running time, so we cannot do numpy calculation on it.
We suggest to use PaddlePaddle APIs to replace numpy API in this case.
4. A function calls itself recursively
ProgramTranslator doesn't support a function calls itself recursively, the reason is that recursive function usually uses ``if-else`` for a condition to stop the recursion, the stop condition will be transformed to a ``cond`` in static graph mode. Since ``cond`` just builds network, it cannot determine how many times it recursively builds network during network built stage, so the function will recursively call itself and build network until stack overflow. Due to above reason, ProgramTranslator cannot support a function calls itself recursively now.
We suggest to write non-recursive function in this case.
- `Dygraph to Static Graph <program_translator_cn.html>`_ :Introduce the basic usage for transforming dygraph code into static code and the architecture of ProgramTranslator.
- `Supported Grammars <grammar_list_en.html>`_ :Introduce the grammars supported by ProgramTranslator and list unsupport grammars.
The imperative-style coding of PaddlePaddle takes advantage of flexibility, Pythonic coding, and easy-to-debug interface. In dygraph mode, code immediately executes kernels and gets numerical results, which allows users to enjoy traditional Pythonic code order. Therefore it is efficient to transform idea into real code and simple to debug. However, Python code is usually slower than C++ thus lots of industrial systems (such as large recommend system, mobile devices) prefer to deploy with C++ implementation.
Static graph is better at speed and portability. Static graph builds the network structure during compiling time and then does computation. The built network intermediate representation can be executed in C++ and gets rids of Python dependency.
While dygraph has usability and debug benefits and static graph yields performance and deployment advantage, we adds functionality to convert dygraph to static graph. Users use imperative mode to write dygraph code and PaddlePaddle will analyze the Python syntax and turn it into network structure of static graph mode. Our approach retains both the usability of dygraph and portability of static graph.
Basic Usage
--------------
PaddlePaddle has two ways to transform dygraph to static graph. TracedLayer extracts computation graph through tracing and ProgramTranslator gets computation graph through source code transformation.
1. TracedLayer:
Tracing means recording the operators when running a model. TracedLayer is based on this technique. It runs dygraph program once and records all operators, then constructs static graph model and saves it. Now take a glance at an usage example:
However, as tracing only records operators once, if user's code contains Tensor-dependent (including Tensor value or Tensor shape) control flow, that is the Tensor can cause different operators being executed, then TracedLayer cannot handle this case. For instance:
.. code-block:: python
import paddle
def func(input_var)
# if condition depends on the shape of input_var
if input_var.shape[0] > 1:
return paddle.cast(input_var, "float64")
else:
return paddle.cast(input_var, "int64")
paddle.disable_static()
in_np = np.array([-2]).astype('int')
input_var = paddle.to_tensor(in_np)
out = func(input_var)
If we apply TracedLayer.trace(func, inputs=[input_var]) on above example, tracing can take record of operators in only one branch of if-else, then the model can not be saved as what user orignally means. The similar situations applies to while/for loop.
2. ProgramTranslator
For the Tensor-dependent control flow, we use source-code-translate based ProgramTranslator to convert dygraph into static graph. The basic idea is analyzing Python source code and turning into static graph code, then run the static graph code using Executor. The basic usage of ProgramTranslator is simple, put a decorator ``@paddle.jit.to_static`` before the definition of the function to transform (the function can also be a method of a class, e.g., the ``forward`` function of user-defined imperative Layer). Above Tensor-dependent example can be transformed correctly by ProgramTranslator as below:
.. code-block:: python
import paddle
@paddle.jit.to_static
def func(input_var)
# if condition depends on the shape of input_var
if input_var.shape[0] > 1:
out = paddle.cast(input_var, "float64")
else:
out = paddle.cast(input_var, "int64")
paddle.disable_static()
in_np = np.array([-2]).astype('int')
input_var = paddle.to_tensor(in_np)
func(input_var)
To save the transformed model, we can call ``paddle.jit.save`` . Let's take ``SimpleFcLayer`` as an example again, we put decorator at the ``forward`` method of ``SimpleFcLayer`` :
The basic idea of TracedLayer is tracing, it is relatively simple so we won't expend here. This section will talk about the source code transformation of ProgramTranslator.
The transformation is implemented in the decorator so transformation happens when user calls the decorated function, the procedure includes these steps:
1. Function and cache.
The entity for transforming dygraph to static graph is the decorated function. For the PaddlePaddle APIs in the function, since they are same code under dygraph mode and static mode, we don't have to transform those code. However, those APIs are computation in dygraph model while they are building network in static graph mode, if the transformed functions are called multiple times, those APIs will build network multiple times in static graph, which can cause problem. To solve it as well as speed up the transformation, we maintain a cache that maps from function, input shapes, input data types to the Program built by the transformed function. If the function hits cache, we run the stored Program in static graph mode to get result, else we do the code transformation on the function and store the transformed Program into the cache.
2. From dygraph source code to AST (Abstract Syntax Tree)
The core of transforming dygraph to static graph is similar to a compiler, we parse the dygraph code into AST, change AST, then turn it back into static graph code. We use Python ``inspect.getsource`` to get the source code string of the function. Python provides ``ast`` library to parse string code into AST, but Python2, Python3 have slight grammar difference. To avoid the work to handle different grammars, we used an open source AST library `gast <https://github.com/serge-sans-paille/gast>`_ that provides compatibility AST among various Python versions. There is no essential difficulty to turn function into AST with these library.
3. Transform AST and turn it to static graph code
This part is the key part in ProgramTranslator, we modify AST for supported grammars. Those important Python control flows, such as ``if-elif-else, while, for`` loop are converted to PaddlePaddle static graph API ``cond, while_loop`` and so on. We created a Transformer (AST-to-AST Transformer in Python, not the Transformer in Natural Language Process) to transform each grammar. Every Transformer scans AST and modify it. Lastly, we turn AST back to source code string by ``gast`` library.
4. Running static graph code as part of dygraph
In order to increase usability and re-use the transformed static graph code in dygraph, we wrap the generated Program as an dygraph op, the op can run the forward and backward computation of transformed Program. Then we can not only speed up dygraph code or save it for deployment, but also enable user to run part of their dygraph code in static graph mode so that they can continue training or other dygraph computation in their dygraph code.
5. Error handling and Debug
Compiler usually supports debug functionality like breakpoint, throwing exception, print some mid-level codes. ProgramTranslator is similar to a compiler, users may would like to set breakpoints for debugging, or see whether the transformed static graph code is expected. So we also implemented those error handling and debug functionality. Here we list those functions and their implementation.
A. Report errors/exceptions on dygraph code line. Because the transformed static graph code is different to original dygraph code, when Python executes the static graph code, the exceptions will be reported at static graph code. To locate the corresponding dygraph code, we attach some informations such as line number on AST nodes when we transform AST, then we can re-write the static graph exception to the corresponding dygraph code exception.
B. We support ``pdb.set_trace()`` when running ProgramTranslator, user can add this line to set breakpoints.
C. Check the transformed static graph code. Our transformed output is a Python class named ``StaticLayer``, this class can be called, but it also stores the transformed code string. Users could call ``StaticLayer.code`` to get the converted code.
D. Print mid-level transformed code, such as what's the code after transforming ``for`` loop. We provide APIs to set log level to let user check the mid-level code.