Created by: cryoco
On some devices, memory or graphic memory is limited. Instead of running Paddle inference for several models in parallel, we have to load, analyze, and run them one after another on these devices (releasing each model's resources after it runs) to avoid OOM. In this situation, the loading and analysis phases become the latency bottleneck. So we add an API to clear intermediate tensors, which lets AnalysisPredictor load and analyze all models first and then run inference one by one (releasing each model's intermediate tensors back to the graphic memory pool by calling ClearIntermediateTensor()), cutting the overhead mentioned above.
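A minimal sketch of the intended usage, assuming the existing C++ `AnalysisConfig` / `CreatePaddlePredictor` interface; the model directories and input feeding below are hypothetical placeholders, not part of this PR:

```cpp
#include <memory>
#include <string>
#include <vector>
#include "paddle_inference_api.h"

int main() {
  // Hypothetical model paths, for illustration only.
  std::vector<std::string> model_dirs = {"model_a", "model_b", "model_c"};

  // Phase 1: load and analyze every model up front, so the expensive
  // loading/analysis cost is paid only once instead of before every run.
  std::vector<std::unique_ptr<paddle::PaddlePredictor>> predictors;
  for (const auto& dir : model_dirs) {
    paddle::AnalysisConfig config;
    config.SetModel(dir);
    config.EnableUseGpu(100 /* initial GPU memory in MB */, 0 /* device id */);
    predictors.emplace_back(
        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(config));
  }

  // Phase 2: run the models one by one; after each run, release that
  // predictor's intermediate tensors back to the graphic memory pool
  // so the next model can reuse the memory.
  for (auto& predictor : predictors) {
    std::vector<paddle::PaddleTensor> inputs;   // fill with real input data
    std::vector<paddle::PaddleTensor> outputs;
    predictor->Run(inputs, &outputs);
    predictor->ClearIntermediateTensor();
  }
  return 0;
}
```

With this flow, only the intermediate tensors are released between runs; the analyzed program and weights stay resident, so subsequent runs skip the load/analyze overhead described above.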