Created by: Superjomn
Pile Allocator
This is a buddy-allocator-based, high-level memory allocator for both CPU and GPU memory. Besides working as a normal memory allocator, it can reduce memory consumption in some special scenarios:
- In inference, when several models are executed sequentially, it helps reuse the memory for temporary variables.
- In training, it can likewise squeeze the memory used by sequential model execution.
An example scenario:
Suppose we have N models, each with an average memory size of W for weights and T for temporary variables, and all of these models run sequentially. By default, they take N*(W+T) memory in total.
With PileAllocator, all the models can share the memory space for temporary variables, so in total they take only N*W+T memory. The effect is remarkable when T is large.
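The savings above can be sketched with a little arithmetic. This is a hypothetical illustration, not the PileAllocator API: when models have differing temporary requirements, the shared space must cover the largest one, so the shared total is N*W plus the maximum T rather than an average.

```python
def default_footprint(sizes):
    """Memory when every model owns both its weights and its temporaries."""
    return sum(w + t for w, t in sizes)

def pile_footprint(sizes):
    """Memory when all models share a single temporary-variable space.

    Each model keeps its own weight buffer; the shared space must be as
    large as the biggest temporary requirement among the models.
    """
    return sum(w for w, _ in sizes) + max(t for _, t in sizes)

# Three models, each with W = 100 MB of weights and T = 400 MB of temporaries.
sizes = [(100, 400)] * 3
print(default_footprint(sizes))  # N*(W+T) = 3*(100+400) = 1500
print(pile_footprint(sizes))    # N*W + T = 3*100 + 400 = 700
```

With T four times larger than W, sharing the temporary space cuts total memory by more than half in this example.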
TODOs:
- TODO: add a performance benchmark
- TODO: reconsider the overall allocator interface; the design of Allocation as the return type seems to add overhead and make the interface less flexible.