Created by: cryoco
On some devices, memory or graphic memory is limited. Instead of running Paddle inference for several models in parallel, we have to load, analyze, and run them one after another on these devices (releasing each model's resources after it runs) to avoid OOM. In this situation, the loading and analysis phases become the latency bottleneck. So we add an API to clear intermediate tensors, which lets AnalysisPredictor load and analyze all models first and then run inference one by one (releasing each model's intermediate tensors back to the graphic memory pool by calling ClearIntermediateTensor()), cutting the overhead mentioned above.
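A minimal sketch of the intended usage, assuming the existing C++ `AnalysisConfig` / `CreatePaddlePredictor` interface; the model directories and input feeding below are hypothetical placeholders, not part of this PR:

```cpp
#include <memory>
#include <string>
#include <vector>
#include "paddle_inference_api.h"

int main() {
  // Hypothetical model paths, for illustration only.
  std::vector<std::string> model_dirs = {"model_a", "model_b", "model_c"};

  // Phase 1: load and analyze every model up front, so the expensive
  // loading/analysis cost is paid only once instead of before every run.
  std::vector<std::unique_ptr<paddle::PaddlePredictor>> predictors;
  for (const auto& dir : model_dirs) {
    paddle::AnalysisConfig config;
    config.SetModel(dir);
    config.EnableUseGpu(100 /* initial GPU memory in MB */, 0 /* device id */);
    predictors.emplace_back(
        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(config));
  }

  // Phase 2: run the models one by one; after each run, release that
  // predictor's intermediate tensors back to the graphic memory pool
  // so the next model can reuse the memory.
  for (auto& predictor : predictors) {
    std::vector<paddle::PaddleTensor> inputs;   // fill with real input data
    std::vector<paddle::PaddleTensor> outputs;
    predictor->Run(inputs, &outputs);
    predictor->ClearIntermediateTensor();
  }
  return 0;
}
```

With this flow, only the intermediate tensors are released between runs; the analyzed program and weights stay resident, so subsequent runs skip the load/analyze overhead described above.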