Created by: qingqing01
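The system refactoring should take this problem into consideration. Unlike training, inference allows some memory to be reused, for example the memory that holds the outputs of intermediate layers of the network. Training must keep those outputs for the backward pass, but in inference an intermediate output can be overwritten once every layer that reads it has run.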
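To make the potential saving concrete, here is a minimal sketch in Python. It assumes a purely sequential network (each layer reads only the previous layer's output, with no skip connections), and the function names and example sizes are hypothetical, not part of any existing API. Under that assumption, two alternating buffers are enough for inference, whereas training keeps every intermediate output.

```python
def training_activation_memory(output_sizes):
    """Training keeps every intermediate output for the backward pass."""
    return sum(output_sizes)


def inference_activation_memory(output_sizes):
    """Inference on a purely sequential network (assumption: no skip
    connections). Layer i reads output i-1 and writes output i, so
    outputs can alternate between two buffers. Each buffer must be as
    large as the largest output assigned to it."""
    even_buffer = max(output_sizes[0::2], default=0)
    odd_buffer = max(output_sizes[1::2], default=0)
    return even_buffer + odd_buffer


# Hypothetical per-layer output sizes (in elements), for illustration only.
sizes = [1024, 4096, 4096, 2048, 10]

train_mem = training_activation_memory(sizes)
infer_mem = inference_activation_memory(sizes)
print(f"training keeps {train_mem} elements, inference needs {infer_mem}")
```

A network with branches or skip connections would need a more general scheme, such as tracking the last consumer of each output and releasing its buffer after that consumer runs.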