Add some debug flags to auto growth allocator (!21766) · Merge request · PaddlePaddle / Paddle

Add some debug flags to auto growth allocator !21766

Created by: sneaxiy

Add some debug flags for AutoGrowthBestFitAllocator:

FLAGS_free_idle_chunk: whether to call cudaFree as soon as each allocation is destroyed. This is used to measure the actual GPU memory consumption of models (without any cached memory).
FLAGS_free_when_no_cache_hit: whether to call cudaFree when no cached memory block is hit. If true, cudaFree is called before cudaMalloc; otherwise, cudaFree is called only when an out-of-memory error occurs, which is the same as the allocator strategy of PyTorch. This flag is used to compare the advantages of the two freeing strategies.

All of these flags are intended only for debugging by developers and are not exposed to users.
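
To make the effect of the two flags easier to see, here is a toy Python model of a caching allocator. It is only a sketch under assumptions, not the AutoGrowthBestFitAllocator implementation: the class `ToyCachingAllocator`, its `capacity` parameter, and the `device_frees` counter are hypothetical names introduced for illustration, sizes are abstract units, and the simulated "device" never calls cudaMalloc or cudaFree.

```python
class ToyCachingAllocator:
    """Toy model of a caching allocator (hypothetical, not Paddle code)."""

    def __init__(self, capacity, free_idle_chunk=False,
                 free_when_no_cache_hit=False):
        self.capacity = capacity                # simulated device memory size
        self.free_idle_chunk = free_idle_chunk  # mirrors FLAGS_free_idle_chunk
        self.free_when_no_cache_hit = free_when_no_cache_hit
        self.in_use = 0        # units held by live allocations
        self.cached = []       # sizes of idle blocks kept for reuse
        self.device_frees = 0  # number of simulated cudaFree calls

    def reserved(self):
        """Units currently taken from the simulated device."""
        return self.in_use + sum(self.cached)

    def _release_cache(self):
        # Simulated cudaFree of every idle block.
        self.device_frees += len(self.cached)
        self.cached.clear()

    def _device_malloc(self, size):
        # Simulated cudaMalloc: fails when the device would overflow.
        if self.reserved() + size > self.capacity:
            raise MemoryError("simulated out of memory")
        self.in_use += size

    def alloc(self, size):
        """Return the size of the block handed out (may exceed `size`)."""
        fits = [s for s in self.cached if s >= size]
        if fits:
            # Cache hit: reuse the smallest idle block that fits (best fit).
            best = min(fits)
            self.cached.remove(best)
            self.in_use += best
            return best
        if self.free_when_no_cache_hit:
            # Strategy 1: free idle blocks before asking the device.
            self._release_cache()
        try:
            self._device_malloc(size)
        except MemoryError:
            # Strategy 2: free idle blocks only on out of memory, then retry.
            self._release_cache()
            self._device_malloc(size)
        return size

    def free(self, block_size):
        """Release a block previously returned by alloc()."""
        self.in_use -= block_size
        if self.free_idle_chunk:
            self.device_frees += 1  # return it to the device immediately
        else:
            self.cached.append(block_size)  # keep it for later reuse
```

In this model, allocating 40 units, freeing them, and then allocating 50 units with a capacity of 100 leaves 90 units reserved under the default settings, because the idle 40-unit block stays cached. With `free_when_no_cache_hit=True` the same sequence leaves 50 units reserved, and with `free_idle_chunk=True` nothing remains reserved once the first block is freed.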