ngraph cannot be used with garbage collection strategy together (#19655) · Issue · PaddlePaddle / Paddle

ngraph cannot be used with garbage collection strategy together

Created by: sneaxiy

We found that when FLAGS_use_ngraph is true and FLAGS_eager_delete_tensor_gb is 0, the results differ in almost all ngraph unit tests (test_xxx_ngraph_op.py).

FLAGS_eager_delete_tensor_gb is an environment variable that controls whether the garbage collection strategy is enabled. If FLAGS_eager_delete_tensor_gb is greater than or equal to 0, the garbage collection strategy is enabled.

The garbage collection strategy in PaddlePaddle is designed to reduce both GPU and CPU memory usage. It releases the memory of Tensors that will not be used in the rest of the network calculation.

For example, there is a network with only 3 operators:

x2 = op1(x1)
x3 = op2(x2)
x4 = op3(x3)

Before op2 runs, we can release the memory of x1 because x1 will not be used any more after op1 runs. For the same reason, we can release the memory of x2 after op2 runs, and the memory of x3 after op3 runs. With these releases inserted, the network becomes something like:

x2 = op1(x1)
release_memory(x1)
x3 = op2(x2)
release_memory(x2)
x4 = op3(x3)
release_memory(x3)
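
The rewrite above can be sketched as a small last-use analysis. This is only an illustration of the idea, not PaddlePaddle's actual implementation: the `Op` tuple layout, the helper names `last_use_index` and `insert_releases`, and the `keep` set (variables such as the final output that must survive) are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical op representation: (op_name, input_names, output_names).
Op = Tuple[str, List[str], List[str]]


def last_use_index(ops: List[Op]) -> Dict[str, int]:
    """Map each variable to the index of the last op that reads or writes it."""
    last: Dict[str, int] = {}
    for i, (_, inputs, outputs) in enumerate(ops):
        for name in inputs + outputs:
            last[name] = i
    return last


def insert_releases(ops: List[Op], keep: Set[str]) -> List[str]:
    """Return a textual schedule with release_memory(...) after each last use.

    Variables in `keep` (for example, the network output) are never released.
    """
    last = last_use_index(ops)
    schedule: List[str] = []
    for i, (name, inputs, outputs) in enumerate(ops):
        schedule.append(f"{', '.join(outputs)} = {name}({', '.join(inputs)})")
        # Release every variable whose last use is this op.
        for var in sorted(v for v, idx in last.items() if idx == i and v not in keep):
            schedule.append(f"release_memory({var})")
    return schedule


# The 3-operator network from the example above; x4 is kept as the result.
ops = [("op1", ["x1"], ["x2"]), ("op2", ["x2"], ["x3"]), ("op3", ["x3"], ["x4"])]
schedule = insert_releases(ops, keep={"x4"})
print("\n".join(schedule))
```

The point of the sketch is that a release is only safe if the last-use analysis sees every reader of a variable. If some component reads a Tensor outside the ops known to the analysis, the Tensor could be freed too early.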

Does the ngraph implementation in PaddlePaddle conflict with the garbage collection strategy described above?

How to reproduce

To reproduce the problem, comment out the line linked below and run any test_xxx_ngraph_op.py.

https://github.com/PaddlePaddle/Paddle/blob/e9233d1c1ee1a0e3cc97bcb9cd2c4a71342c06fb/paddle/fluid/framework/executor.cc#L70
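
For convenience, a test can be launched with both flags set from a small script. This is a minimal sketch that assumes FLAGS_use_ngraph is read from the process environment in the same way as FLAGS_eager_delete_tensor_gb. The helper names `ngraph_gc_env` and `run_test` are made up for this example, and the script only sets the flags: the source line linked above still has to be commented out and Paddle rebuilt.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def ngraph_gc_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and enable ngraph plus garbage collection in it."""
    env = dict(os.environ if base is None else base)
    env["FLAGS_use_ngraph"] = "true"            # assumption: honoured as an env var
    env["FLAGS_eager_delete_tensor_gb"] = "0"   # >= 0 enables garbage collection
    return env


def run_test(test_file: str) -> int:
    """Run one unit test file in a child process and return its exit code."""
    completed = subprocess.run([sys.executable, test_file], env=ngraph_gc_env())
    return completed.returncode
```

Calling `run_test("test_xxx_ngraph_op.py")`, with the placeholder replaced by a real ngraph test file, returns a non-zero exit code if the test fails.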
